AI·29 May · 2026·4 min read

The orchestration tax.

We spent a month running the workbench loud, a dozen agents going at once, and the dashboard looked incredible. Almost none of it shipped. Here is what that taught us about the one resource on the bench that refuses to parallelize.

Note №122026.05

For a stretch this spring we ran the workbench loud: a dozen, sometimes twenty agents going at once, each chewing on its own branch. It felt like the most productive month of our lives. The board was full, every tile moving, something always finishing. Then we looked at what had actually landed on main, and it was a fraction of the motion. The gap between the two had a shape, and once we saw it we could not unsee it. Starting an agent costs a sentence. Finishing one costs all of our attention, and there is only one of us holding it at a time.

That gap is an asymmetry we kept failing to price in. Opening an agent is a keystroke. Closing the loop on one means reading what came back, deciding whether it is right, and reconciling it against whatever the other agents changed underneath it. Every one of those steps routes through a single head. For a while we thought of the agents as the system and ourselves as the operator standing outside it. The truer picture is that we are a component inside the system, and the slowest one. Anyone who has written concurrent code already has the intuition. Python lets you spawn all the threads you want and still runs only one at a time, because they all queue for a single lock. On our bench that lock is human judgement, and we hold the only copy.

Amdahl's Law put a number on the disappointment. However much of the pipeline you parallelize, the slice that stays serial caps the whole thing. For us the serial slice is the thinking: understanding the architecture, catching the answer that is subtly wrong, resolving the merge. Adding agents did not speed that up by a second. It just made the queue in front of it longer. We had optimised the one stage that was never the constraint, and the unread work stacked up exactly where we had no way to add capacity.

It also explained why we were so tired. Running a single processor flat out with no slack feels precisely like that. Every time we jumped back to an agent we had left, we paid to reload a context that had gone cold, and we never reloaded it perfectly. Five agents was not one workload done five times. It was five cold restarts plus a low background hum of worry about which one we should be checking. Pushing harder bought us no extra capacity. It bought shallower reviews, and eventually a quiet sort of surrender where we started waving code through because forming a real opinion cost attention we had already spent. The tax comes due either way. The only decision left is whether we pay it on purpose or let it erode our grip on our own codebase.

So we stopped treating it as a willpower problem and started treating it as an architecture problem, because that is what it is. Attention is the scarce serial resource, and we design around it now the way we would design around any bottleneck in production. We use backpressure and keep the number of live agents close to the rate we can genuinely review, which for us is three or four rather than twenty. We sort work into two piles and refuse to mix them. Isolated tasks go to background agents that only need us at the final gate. The judgement-heavy work, the strange bug or the shape of a new module, stays single file, because running several of those at once only thrashes the lock and everything comes out worse. We batch the reviews into one sitting instead of grazing on them all day. We spend the lock only on what the machine cannot verify on its own and let the agents prove the rest with tests and screenshots. Spawning agents turned out to be the easy part, and the easy part was never the job. The job is building the system around the one resource we cannot clone, which is our own attention, and giving it the same respect we give anything else we run in production.

"

The board was full and moving. What reached main was a fraction of the motion.

Field Notes № 12