The news. On June 8, 2026, researchers released SearchSwarm, a deep-research agent that learns to delegate. Instead of prompting a generic model to act like a manager, the authors build a harness that pushes a strong model toward high-quality task decomposition and forces its subagents to return tidy results, then use those runs as supervised fine-tuning data, baking delegation into the base model's weights. The resulting SearchSwarm-30B-A3B reports 68.1 on BrowseComp and 73.3 on BrowseComp-ZH, the best among comparable-scale models.

Picture a veteran general contractor on a job site. They never pick up a hammer. Their entire skill is delegation: look at "build a house," instantly split it into framing, plumbing, and wiring, hand each to the right subcontractor with a tight, one-line scope, and accept only a clean report back — "framing's up, inspected" — not a truckload of sawdust dumped on the office floor. A rookie GC with the same blueprint flails: they forget to split a phase, micromanage one sub while another stalls, and end up buried in detail they should have delegated away. Same blueprint, wildly different outcome — and the difference is a learned skill, not a checklist.

That gap is exactly the one SearchSwarm closes for long-horizon web research. A single agent that tries to answer a hard BrowseComp question by reading every page itself drowns in its own context — by the hundredth search result, the first lead has scrolled out of attention's reach. The known remedy is the orchestrator–workers shape: the orchestrator dispatches subagents that each chase one slice and return a short summary, so the manager's working context stays small and coherent. The catch is that prompting a general model to play this manager is brittle — it was never trained to decompose, so it skips the split, or it lets a subagent dump its whole transcript back and re-floods the very context delegation was supposed to protect.
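The orchestrator–workers shape described above can be sketched in a few lines. This is a toy illustration, not SearchSwarm's implementation: the `Subagent` class, the `orchestrate` function, the 80-character summary cap, and the "first sentence" stand-in for a model call are all hypothetical. The point it shows is the isolation boundary: raw pages stay inside each worker, and only a short summary reaches the orchestrator.

```python
from dataclasses import dataclass, field


@dataclass
class Subagent:
    """Hypothetical worker with a private context; only its summary leaves."""
    task: str
    context: list[str] = field(default_factory=list)

    def run(self, pages: list[str], max_summary_chars: int = 80) -> str:
        # Raw pages are stored in the worker's own context, never the manager's.
        self.context.extend(pages)
        # Stand-in for a real model call: take the first sentence of the first page.
        finding = pages[0].split(".")[0] if pages else "no evidence found"
        # Enforce a short, bounded return value.
        return f"{self.task}: {finding}"[:max_summary_chars]


def orchestrate(question: str, plan: dict[str, list[str]]) -> list[str]:
    """Dispatch one subagent per sub-task and keep only the short summaries."""
    orchestrator_context = [question]
    for task, pages in plan.items():
        orchestrator_context.append(Subagent(task).run(pages))
    return orchestrator_context


# Made-up sub-tasks and page text, purely for illustration.
plan = {
    "birthplace": ["Born in Lyon. Long biography text ..."],
    "award year": ["Won the prize in 1987. More text ..."],
}
ctx = orchestrate("Which laureate born in Lyon won in 1987?", plan)
print(ctx)
```

The failure mode the paragraph describes corresponds to `run` returning `self.context` instead of the capped summary: the orchestrator's list would then grow with every raw page.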

SearchSwarm's move is to stop hoping and start training. The authors wrap a strong model in a delegation harness that nudges it toward good decompositions and constrains every subagent to return a clean, formatted result. The high-quality runs that fall out — full trajectories of "split here, dispatch this, accept that summary" — become supervised fine-tuning data. Fine-tune the base model on them and delegation stops being a fragile prompt and becomes a reflex baked into the weights: the veteran contractor, not the rookie reading the manual. Because the model now keeps each subagent's context isolated by default, a tidy orchestrator context is the trained behavior, not a lucky one.
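As a rough sketch of how harness runs might be turned into supervised fine-tuning data, the snippet below filters trajectories and serializes the kept ones as JSON lines. Everything in it is an assumption made for illustration: the `Trajectory` and `SubagentResult` schema, the filter criteria in `is_clean` (correct answer, at least two sub-tasks, bounded summaries), the 400-character cap, and the prompt/completion record format. The source says only that high-quality runs become SFT data; it does not spell out these rules.

```python
import json
from dataclasses import dataclass

MAX_SUMMARY_CHARS = 400  # hypothetical cap on what a subagent may return


@dataclass
class SubagentResult:
    task: str
    summary: str


@dataclass
class Trajectory:
    question: str
    results: list[SubagentResult]
    answer_correct: bool


def is_clean(traj: Trajectory, min_subtasks: int = 2) -> bool:
    """Assumed quality filter: the run decomposed the task and kept summaries tidy."""
    return (
        traj.answer_correct
        and len(traj.results) >= min_subtasks
        and all(0 < len(r.summary) <= MAX_SUMMARY_CHARS for r in traj.results)
    )


def to_sft_record(traj: Trajectory) -> str:
    """Serialize one kept trajectory as a JSON line (assumed record format)."""
    steps = [f"dispatch: {r.task}\nresult: {r.summary}" for r in traj.results]
    return json.dumps({"prompt": traj.question, "completion": "\n".join(steps)})


# Made-up runs: one tidy, one that skipped the split and dumped a transcript.
runs = [
    Trajectory(
        "Q1",
        [SubagentResult("find birthplace", "Lyon"),
         SubagentResult("find award year", "1987")],
        answer_correct=True,
    ),
    Trajectory("Q2", [SubagentResult("do everything", "x" * 5000)], answer_correct=True),
]
sft_lines = [to_sft_record(t) for t in runs if is_clean(t)]
print(len(sft_lines))
```

Only the first run survives the filter. The second is rejected twice over: it never split the task, and its single subagent returned far more than the cap allows.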

| Where delegation lives | How you get it | Failure mode | Example |
| --- | --- | --- | --- |
| In the prompt / harness, at inference | scaffold a generic model with an orchestrator–workers prompt | brittle: the model was never trained to split work | generic orchestrator–workers agent |
| In a learned routing policy | RL trains an orchestrator to route to frozen expert models | needs reward design; experts stay fixed | Maestro |
| In the base model's weights | SFT on harness-generated delegation trajectories | needs a good teacher harness to generate the traces | SearchSwarm-30B-A3B |

Why does a clean context matter enough to train for it? Walk the budget (token counts here are illustrative; the paper reports the benchmark scores, not these figures). Say a BrowseComp question needs evidence from 40 web pages, and each raw page is ~2,000 tokens. A single agent that reads them all carries 40 × 2,000 = 80,000 tokens of raw page text in its working context, and long before the end, the early evidence has fallen out of reach. SearchSwarm's orchestrator instead splits the hunt into, say, 5 sub-searches, hands each to a subagent, and gets back a 200-token verified summary from each. The orchestrator's context now holds just 5 × 200 = 1,000 tokens of clean findings, an 80× smaller working set, so it can still reason over the first lead when it reaches the last. That preserved coherence is what lets a 3B-active model stay on track across a long horizon and post 68.1 on BrowseComp, the best among comparable-scale models.
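The budget arithmetic can be checked directly. The constants below are the illustrative numbers from the walk-through, not figures from the paper, and the per-subagent load additionally assumes the 40 pages divide evenly across the 5 sub-searches.

```python
# Illustrative numbers from the walk-through; not reported in the paper.
PAGES = 40
TOKENS_PER_PAGE = 2_000
SUB_SEARCHES = 5
SUMMARY_TOKENS = 200

# A lone agent keeps every raw page in its working context.
single_agent_tokens = PAGES * TOKENS_PER_PAGE

# The orchestrator keeps one short summary per sub-search.
orchestrator_tokens = SUB_SEARCHES * SUMMARY_TOKENS

# How much smaller the orchestrator's working set is.
reduction = single_agent_tokens / orchestrator_tokens

# Assumption: pages split evenly, so each subagent reads 8 pages in isolation.
per_subagent_tokens = (PAGES // SUB_SEARCHES) * TOKENS_PER_PAGE

print(single_agent_tokens, orchestrator_tokens, reduction, per_subagent_tokens)
# prints: 80000 1000 80.0 16000
```

Under that even-split assumption, the raw text does not disappear; it moves into five isolated 16,000-token contexts, each discarded once its summary is returned.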

Frequently Asked Questions