For hundreds of years, automation has helped people save time and focus on higher-level tasks. Henry Ford's moving assembly line reduced the time to manufacture a Model T from 12 hours to just 1.5 hours, and Richard Morley's programmable logic controller helped businesses save billions of hours globally by democratizing automation.
But what if you could build an automation that itself creates automations, without human intervention?
That's what we did with our spreadsheet agent. Instead of presenting the user with an open-ended prompt, we built a second agent that generates prompts and gathers the required files on its own, just by watching a screen recording of a person working through a spreadsheet. The next big leap for automation isn't faster execution; it's automation that designs automation.
In this article, we'll walk through how we combined two agents to automate automation.
We'll explore how this second-order automation approach fundamentally changes the scalability of AI-powered workflows.
First-order automations, the ones most teams build today, are incredibly reliant on human-defined work. In the case of our spreadsheet agent, accountants would be required to stop their work and spend hours providing the correct context and specific prompts to the agent.
This creates two major bottlenecks: first, the automation is only as good as the user prompting the agent. Any missing context or poor instructions can throw off the result or cause the agent to fail. Second, every new task requires fresh human input.
While the actual execution flow is automated, workflow creation itself remains a tedious and repetitive task for users, limiting scalability and the time-saving impact of the agent.
We realized that the real time savings wouldn't come from making the spreadsheet agent marginally better (it already performs well; more details below); they would come from removing the need to write instructions at all.
Instead of relying on humans to write prompts and provide context, we introduced a second agent that watched a screen recording of a user working through a spreadsheet process. Leveraging a long-context multimodal model, this agent produced a detailed set of instructions and determined what additional context would be needed, all from the context-rich recording itself. The instruction set and the required user-uploaded context were then passed to our spreadsheet agent to complete the task.
By pairing these two agents up, an "architect" and a "doer," we shifted automation away from creating single tasks to creating entire workflows from scratch. This means our system could take a human-recorded workflow and, after watching it once, create an entirely reusable workflow without additional input.
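As a rough sketch, the two-phase handoff can be expressed as "watch once, run many times." All names here (`create_workflow`, `run_workflow`, the architect's `watch` method, the doer's `execute` method) are illustrative, not our actual API:

```python
def create_workflow(architect, recording_path: str) -> dict:
    """Phase 1, run once per workflow: the architect watches a screen
    recording and returns a reusable spec (instructions plus required context)."""
    return architect.watch(recording_path)

def run_workflow(doer, spec: dict, uploaded_files: dict):
    """Phase 2, run every time: the doer executes the spec with no new prompting."""
    missing = [f for f in spec["required_files"] if f not in uploaded_files]
    if missing:
        raise ValueError(f"missing required files: {missing}")
    return doer.execute(spec["instructions"], uploaded_files)
```

The key property is that all human effort happens once, in phase 1; every subsequent run reuses the spec with no new prompting.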
The spreadsheet agent is the execution engine, the "doer" that runs the workflow. Handed a detailed prompt and the right context, it uses a wide set of tools to search over the workbook and carry out every instruction with high accuracy.
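Under the hood, this kind of "doer" is typically a tool-use loop: the model repeatedly picks a tool, we run it, and the result is fed back until the model declares the task done. The sketch below is a generic illustration; the tool names and the `next_action` interface are hypothetical, not our implementation:

```python
def run_agent(model, tools: dict, prompt: str, max_steps: int = 50):
    """Generic tool-use loop: the model chooses an action, we execute the
    named tool, append the result to the history, and repeat until 'done'."""
    history = [{"role": "user", "content": prompt}]
    for _ in range(max_steps):
        # e.g. {"tool": "search_workbook", "args": {"query": "revenue"}}
        action = model.next_action(history)
        if action["tool"] == "done":
            return action["args"].get("summary")
        result = tools[action["tool"]](**action["args"])
        history.append({"role": "tool", "tool": action["tool"], "result": result})
    raise RuntimeError("agent exceeded step budget")
```

Keeping the loop this thin matches the design note in the architecture diagram: rely on model intelligence rather than elaborate guide rails.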
We chose to build a spreadsheet agent because we see spreadsheet automation as low-hanging fruit in many accounting jobs and workflows. To better illustrate this, assume we have a process to carve a statue of my manager, Alex:
We have five very different steps to automate here. The first three require cutting tools, heavy machinery, and transportation equipment. Step four requires careful, meticulous tool handling, and step five requires packaging equipment and more heavy machinery to ship the statue away. Where do we start?
How I look thinking about where to start with automation
To systematically evaluate which tasks offer the best automation opportunities, we need to consider multiple factors: time to complete, cost to automate, and the net benefit of automation. Let's analyze our statue carving process:
Task | Time to Complete | Cost to Automate | Net Outcome |
---|---|---|---|
The analysis clearly reveals our optimization target. While the first few steps involve significant overhead costs for machinery and setup, the carving step offers the best return on investment — high time savings with moderate automation complexity.
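That comparison can be expressed as a simple scoring function. The figures below are made up purely to illustrate the ranking logic; they are not numbers from our analysis:

```python
def net_benefit(hours_saved_per_run: float, runs_per_year: int,
                hourly_rate: float, automation_cost: float) -> float:
    """Net yearly benefit = value of time saved minus the cost to automate."""
    return hours_saved_per_run * runs_per_year * hourly_rate - automation_cost

# Illustrative only: rank hypothetical tasks by return on automation.
tasks = {
    "transport block": net_benefit(2, 4, 80, 5000),     # high overhead, rare
    "carve statue":    net_benefit(40, 12, 80, 20000),  # big, frequent time sink
    "package & ship":  net_benefit(1, 12, 80, 3000),    # small savings
}
best = max(tasks, key=tasks.get)
```

Under these made-up numbers, carving wins: it is the only task whose yearly time savings outweigh its automation cost, which is the same logic that pointed us at spreadsheet work.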
On its own, the spreadsheet agent we created is incredibly powerful, but it requires a very detailed prompt and exactly the right context for every run. Crafting these by hand costs significant time, trial and error, and tokens.
The architect agent solves this "blueprint" problem. Its role is to watch a human work through a spreadsheet via screen recordings and break the process into clear, specific instructions for the spreadsheet agent to use in future runs. The architect agent also notes when users paste in data from external sources and flags any other context (e.g., the current month) that must be supplied before the agent runs.
All of the architect agent's outputs are compiled into a Process: a set of steps, required files, and additional information that is passed to the spreadsheet agent on every run.
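In code, a Process could be modeled as a small record type. This dataclass is an illustrative shape, not our actual schema:

```python
from dataclasses import dataclass, field

@dataclass
class Process:
    """A reusable workflow emitted by the architect agent."""
    name: str
    steps: list[str]                # ordered instructions for the spreadsheet agent
    required_files: list[str]       # files the user must upload before each run
    extra_context: dict[str, str] = field(default_factory=dict)  # e.g. {"current_month": "July"}

    def ready(self, uploaded: set[str]) -> bool:
        """A run can start only once every required file has been uploaded."""
        return all(f in uploaded for f in self.required_files)
```

Separating the steps from the per-run inputs (files, current month, and so on) is what makes the Process reusable: the steps are fixed at creation time, while the inputs change on every run.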
Both agent flows are provided in the first image, and the second image dives deeper into how the spreadsheet agent is architected.
The high-level spreadsheet automation tool architecture. Left: process map for how processes are generated based on an uploaded video. Right: process map for running processes at runtime.
Spreadsheet agent architecture. The agent is simple and relies as much as possible on model intelligence to achieve tasks with minimal guide rails.
To test our agent, we ran it against SpreadsheetBench, a popular benchmark for gauging spreadsheet agent performance. In this case, we tested on a subset of approximately 50 randomly chosen tasks (each of which has 3 test cases). The results demonstrate strong performance across different categories of spreadsheet tasks.
For tasks with soft restrictions (where some flexibility in approach is acceptable), our agent achieved 49.5% accuracy, compared to OpenAI's 32.4%. On hard restriction tasks (requiring strict adherence to specific methods), our agent achieved 32.5% accuracy, significantly outperforming GPT-4o's 13.38%.
SpreadsheetBench results on ~50 tasks (3 cases each). OpenAI data from "Introducing ChatGPT agent"
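For reference, per-category accuracy on a benchmark like this can be computed by averaging over tasks, where each task's score is the fraction of its three test cases that pass. That scoring rule is our assumption for illustration; SpreadsheetBench's official scoring may differ:

```python
def category_accuracy(results: dict[str, list[bool]]) -> float:
    """Mean over tasks of the fraction of each task's test cases that passed."""
    per_task = [sum(cases) / len(cases) for cases in results.values()]
    return sum(per_task) / len(per_task)

# Illustrative data only, not our benchmark runs (3 test cases per task).
soft = {
    "task_a": [True, True, False],
    "task_b": [True, True, True],
}
```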
Time savings using the spreadsheet agent are substantial. Spreadsheet tasks that took 1–2 hours to complete are now finished in under 10 minutes, a 6- to 12-fold speedup on those tasks. For other tasks, the agent delivered partial completions that still saved time but required third-party intervention to finish.
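The speedup figures follow directly from the timings: a 60-to-120-minute task finished in about 10 minutes is a 6x to 12x improvement.

```python
def speedup(manual_minutes: float, agent_minutes: float) -> float:
    """Fold-increase in completion speed: manual time divided by agent time."""
    return manual_minutes / agent_minutes

low = speedup(60, 10)    # 1-hour task done in 10 minutes -> 6.0
high = speedup(120, 10)  # 2-hour task done in 10 minutes -> 12.0
```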
One of the most exciting results of this project is that the barrier to creating new automations has been dramatically reduced. What used to require weeks of development work by engineering teams can now be accomplished by any accountant with a screen recording and 5 minutes to answer clarifying questions.
While we haven't yet developed a formal benchmark for the architect agent, we can see its impact on workflow creation by non-technical users. Accountants simply upload a few files, press start, and come back to a ready-to-use process. The time savings are significant compared with manually prompting every run.
Our combined agent doesn't just do work; it designs how work gets done. Automating automation unlocked a new layer of scalability and speed, empowering people across the company to build and run their own workflows with ease.