For hundreds of years, automation has helped people save time and focus on higher-level tasks. Henry Ford's moving assembly line reduced the time to manufacture a Model T from 12 hours to just 1.5 hours, and Richard Morley's programmable logic controller helped businesses save billions of hours globally by democratizing automation.
But what if you could build an automation that itself creates automations, without human intervention?
That's what we did with our spreadsheet agent. Instead of presenting the user with an open-ended prompt, we built a second agent that generates prompts and gathers the required files on its own, just by watching a screen recording of a person working through a spreadsheet. The next big leap for automation isn't faster execution; it's automation that designs automation.
In this article, we'll walk through how we combined two agents to automate automation.
We'll explore how this second-order automation approach fundamentally changes the scalability of AI-powered workflows.
First-order automations, the ones most teams build today, are incredibly reliant on human-defined work. In the case of our spreadsheet agent, accountants would be required to stop their work and spend hours providing the correct context and specific prompts to the agent.
This creates two major bottlenecks: first, the automation is only as good as the user prompting the agent. Any missing context or poor instructions can throw off the result or cause the agent to fail. Second, every new task requires fresh human input.
While the actual execution flow is automated, workflow creation itself remains a tedious and repetitive task for users, limiting scalability and the time-saving impact of the agent.
We realized that the real time savings wouldn't come from making the spreadsheet agent marginally better (it already performs well; more details below); they would come from removing the need to write instructions at all.
Instead of relying on humans to write prompts and provide context, we introduced a second agent that watched a screen recording of a user working through a spreadsheet process. Leveraging a long-context multimodal model, this agent produced a detailed set of instructions and determined what additional context would be needed, all from the context-rich recording itself. The instruction set and the required user-uploaded context were then passed to our spreadsheet agent to complete the task.
By pairing these two agents up, an "architect" and a "doer," we shifted automation away from creating single tasks to creating entire workflows from scratch. This means our system could take a human-recorded workflow and, after watching it once, create an entirely reusable workflow without additional input.
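As a rough sketch, the two-phase handoff can be expressed as "watch once, run many times." All names here (`create_workflow`, `run_workflow`, the architect's `watch` method, the doer's `execute` method) are illustrative, not our actual API:

```python
def create_workflow(architect, recording_path: str) -> dict:
    """Phase 1, run once per workflow: the architect watches a screen
    recording and returns a reusable spec (instructions plus required context)."""
    return architect.watch(recording_path)

def run_workflow(doer, spec: dict, uploaded_files: dict):
    """Phase 2, run every time: the doer executes the spec with no new prompting."""
    missing = [f for f in spec["required_files"] if f not in uploaded_files]
    if missing:
        raise ValueError(f"missing required files: {missing}")
    return doer.execute(spec["instructions"], uploaded_files)
```

The key property is that all human effort happens once, in phase 1; every subsequent run reuses the spec with no new prompting.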
The spreadsheet agent is the execution engine, the "doer" that runs the workflow. Handed a detailed prompt and the right context, it uses a wide set of tools to search over the workbook and carry out every instruction with high accuracy.
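Under the hood, this kind of "doer" is typically a tool-use loop: the model repeatedly picks a tool, we run it, and the result is fed back until the model declares the task done. The sketch below is a generic illustration; the tool names and the `next_action` interface are hypothetical, not our implementation:

```python
def run_agent(model, tools: dict, prompt: str, max_steps: int = 50):
    """Generic tool-use loop: the model chooses an action, we execute the
    named tool, append the result to the history, and repeat until 'done'."""
    history = [{"role": "user", "content": prompt}]
    for _ in range(max_steps):
        # e.g. {"tool": "search_workbook", "args": {"query": "revenue"}}
        action = model.next_action(history)
        if action["tool"] == "done":
            return action["args"].get("summary")
        result = tools[action["tool"]](**action["args"])
        history.append({"role": "tool", "tool": action["tool"], "result": result})
    raise RuntimeError("agent exceeded step budget")
```

Keeping the loop this thin matches the design note in the architecture diagram: rely on model intelligence rather than elaborate guide rails.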
We chose to build a spreadsheet agent because we see spreadsheet automation as low-hanging fruit in many accounting jobs and workflows. To better illustrate this, assume we have a process to carve a statue of my manager, Alex:
We have five very different steps to automate here. The first three require cutting tools, heavy machinery, and transportation equipment. Step four requires careful, meticulous tool handling, and step five requires packaging equipment and more heavy machinery to ship the statue away. Where do we start?
How I look thinking about where to start with automation
To systematically evaluate which tasks offer the best automation opportunities, we need to consider multiple factors: time to complete, cost to automate, and the net benefit of automation. Let's analyze our statue carving process:
Task | Time to Complete | Cost to Automate | Net Outcome |
---|---|---|---|
The analysis clearly reveals our optimization target. While the first few steps involve significant overhead costs for machinery and setup, the carving step offers the best return on investment — high time savings with moderate automation complexity.
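That comparison can be expressed as a simple scoring function. The figures below are made up purely to illustrate the ranking logic; they are not numbers from our analysis:

```python
def net_benefit(hours_saved_per_run: float, runs_per_year: int,
                hourly_rate: float, automation_cost: float) -> float:
    """Net yearly benefit = value of time saved minus the cost to automate."""
    return hours_saved_per_run * runs_per_year * hourly_rate - automation_cost

# Illustrative only: rank hypothetical tasks by return on automation.
tasks = {
    "transport block": net_benefit(2, 4, 80, 5000),     # high overhead, rare
    "carve statue":    net_benefit(40, 12, 80, 20000),  # big, frequent time sink
    "package & ship":  net_benefit(1, 12, 80, 3000),    # small savings
}
best = max(tasks, key=tasks.get)
```

Under these made-up numbers, carving wins: it is the only task whose yearly time savings outweigh its automation cost, which is the same logic that pointed us at spreadsheet work.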
On its own, the spreadsheet agent we created is incredibly powerful, but it requires a very detailed prompt and exactly the right context for every run. Crafting these by hand costs significant time, trial and error, and tokens.
The architect agent solves this "blueprint" problem. Its role is to watch a human work through a spreadsheet via screen recordings and break the process into clear, specific instructions for the spreadsheet agent to use in future runs. The architect agent also notes when users paste in data from external sources and flags any other context (e.g., the current month) that must be supplied before the agent runs.
All of the architect agent's outputs are compiled into a Process: a set of steps, required files, and additional information that is passed to the spreadsheet agent on every run.
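In code, a Process could be modeled as a small record type. This dataclass is an illustrative shape, not our actual schema:

```python
from dataclasses import dataclass, field

@dataclass
class Process:
    """A reusable workflow emitted by the architect agent."""
    name: str
    steps: list[str]                # ordered instructions for the spreadsheet agent
    required_files: list[str]       # files the user must upload before each run
    extra_context: dict[str, str] = field(default_factory=dict)  # e.g. {"current_month": "July"}

    def ready(self, uploaded: set[str]) -> bool:
        """A run can start only once every required file has been uploaded."""
        return all(f in uploaded for f in self.required_files)
```

Separating the steps from the per-run inputs (files, current month, and so on) is what makes the Process reusable: the steps are fixed at creation time, while the inputs change on every run.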
Both agent flows are provided in the first image, and the second image dives deeper into how the spreadsheet agent is architected.
The high-level spreadsheet automation tool architecture. Left: process map for how processes are generated based on an uploaded video. Right: process map for running processes at runtime.
Spreadsheet agent architecture. The agent is simple and relies as much as possible on model intelligence to achieve tasks with minimal guide rails.
To test our agent, we ran it against SpreadsheetBench, a popular benchmark for gauging spreadsheet agent performance. In this case, we tested on a subset of approximately 50 randomly chosen tasks (each of which has 3 test cases). The results demonstrate strong performance across different categories of spreadsheet tasks.
For tasks with soft restrictions (where some flexibility in approach is acceptable), our agent achieved 49.5% accuracy, compared to OpenAI's 32.4%. On hard restriction tasks (requiring strict adherence to specific methods), our agent achieved 32.5% accuracy, significantly outperforming GPT-4o's 13.38%.
SpreadsheetBench results on ~50 tasks (3 cases each). OpenAI data from "Introducing ChatGPT agent"
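For reference, per-category accuracy on a benchmark like this can be computed by averaging over tasks, where each task's score is the fraction of its three test cases that pass. That scoring rule is our assumption for illustration; SpreadsheetBench's official scoring may differ:

```python
def category_accuracy(results: dict[str, list[bool]]) -> float:
    """Mean over tasks of the fraction of each task's test cases that passed."""
    per_task = [sum(cases) / len(cases) for cases in results.values()]
    return sum(per_task) / len(per_task)

# Illustrative data only, not our benchmark runs (3 test cases per task).
soft = {
    "task_a": [True, True, False],
    "task_b": [True, True, True],
}
```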
Time savings using the spreadsheet agent are substantial. Spreadsheet tasks that took 1–2 hours to complete are now finished in under 10 minutes, a 6- to 12-fold speedup on those tasks. For other tasks, the agent delivered partial completions that still saved time but required third-party intervention to finish.
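The speedup figures follow directly from the timings: a 60-to-120-minute task finished in about 10 minutes is a 6x to 12x improvement.

```python
def speedup(manual_minutes: float, agent_minutes: float) -> float:
    """Fold-increase in completion speed: manual time divided by agent time."""
    return manual_minutes / agent_minutes

low = speedup(60, 10)    # 1-hour task done in 10 minutes -> 6.0
high = speedup(120, 10)  # 2-hour task done in 10 minutes -> 12.0
```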
One of the most exciting results of this project is that the barrier to creating new automations has been dramatically reduced. What used to require weeks of development work by engineering teams can now be accomplished by any accountant with a screen recording and 5 minutes to answer clarifying questions.
While we haven't yet developed a formal benchmark for the architect agent, we can see its impact on workflow creation by non-technical users. Accountants simply upload a few files, press start, and come back to a ready-to-use process. The time savings are significant compared with manually prompting every run.
Our combined agent doesn't just do work; it designs how work gets done. Automating automation unlocked a new layer of scalability and speed, empowering people across the company to build and run their own workflows with ease.