StreamMA — Streaming inter-agent reasoning
The news. On June 3, 2026, researchers released StreamMA, a multi-agent system in which each agent streams its reasoning steps to downstream agents as they are generated rather than passing along a finished answer. The key claim is that early reasoning steps are more reliable than late ones, so consuming the partial chain both keeps downstream agents off the error-prone final step and pipelines the work for lower latency. Across eight reasoning benchmarks with two frontier LLMs and three topologies (chain, tree, graph), StreamMA reports an average accuracy gain of +7.3 percentage points, and up to +22.4 points on HMMT 2026.
Picture the newsroom for a second. A reporter is covering a breaking story and files dispatches as the facts come in — the first few are solid and well-sourced, and a last-minute one is a shaky rumour. A cautious editor waits for the reporter's finished draft before laying out the next section, which means the layout stalls until the very end and hangs on whatever that last shaky line said. A streaming editor instead starts building from the early, reliable dispatches the moment they arrive. Streaming inter-agent reasoning is the second editor: agent B reads agent A's reasoning as it's produced, so it leans on the trustworthy early steps and never waits for the whole chain.
Under the hood, the "dispatches" are reasoning steps: the chain-of-thought a model generates, token by token, before it answers. The usual multi-agent pattern is a serial handoff. Agent A passes a single finished answer to agent B, so B sits idle while A works and then conditions on A's final step, the most error-prone one. Streaming changes both halves of that: B overlaps its work with A's instead of running after it, and B draws on A's early steps, which the paper finds are more reliable. The same approach applies whether the agents are wired into a chain, a tree, or a graph, and the authors report a step-level scaling law: more reasoning steps per agent improve accuracy and efficiency together.
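The difference between the two handoffs can be sketched with a Python generator standing in for agent A. This is an illustration only, not StreamMA's implementation: `agent_a`, `serial_handoff`, `streaming_handoff`, and the fixed cutoff `k` are all hypothetical names and choices, and the source does not say how the real system decides how much of the chain a downstream agent reads.

```python
from itertools import islice
from typing import Iterator, List


def agent_a() -> Iterator[str]:
    """Hypothetical upstream agent: yields its reasoning one step at a time."""
    for i in range(1, 7):
        yield f"A-step-{i}"


def serial_handoff(steps: Iterator[str]) -> str:
    """Serial pattern: drain the whole chain, then pass on only the last step."""
    chain = list(steps)   # the downstream agent is idle while this runs
    return chain[-1]      # and then conditions on A's final step


def streaming_handoff(steps: Iterator[str], k: int) -> List[str]:
    """Streaming pattern: take the first k steps as they arrive, then start work.

    The fixed cutoff k is an assumption made for this sketch.
    """
    return list(islice(steps, k))  # stops pulling from A after k steps


final_step = serial_handoff(agent_a())
early_steps = streaming_handoff(agent_a(), k=2)
print(final_step)   # A-step-6
print(early_steps)  # ['A-step-1', 'A-step-2']
```

A generator is lazy, so here A produces only the steps that are pulled from it. In a real pipeline A would keep generating concurrently while B works, which is where the latency saving comes from.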
Where the wall-clock time actually goes
A back-of-envelope walk-through (the wall-clock units are illustrative; the accuracy figures are the paper's). Say agent A's reasoning chain is 6 steps at roughly 1 unit of wall-clock each, and agent B's own work takes 4 units. Under a serial handoff, B cannot start until A's chain finishes at unit 6, then runs 4 more units and finishes at about unit 10, having conditioned on A's risky 6th step. Stream the same chain and B starts after A's first 2 reliable steps (unit 2), runs its 4 units alongside A, and finishes at about unit 6, just as A does. That is the same amount of B-work in about 40% less wall-clock ((10 - 6) / 10), keyed off the dependable early steps rather than the shaky final one. The paper argues that conditioning on the early steps is the mechanism behind the +7.3 pp average accuracy gain: B is no longer misled by the part of A's chain most likely to be wrong.
| Aspect | Serial handoff | Streaming |
|---|---|---|
| When B starts | after A's full chain | as A's early steps arrive |
| What B conditions on | A's final step (error-prone) | A's early steps (more reliable) |
| Wall-clock | longer — B idles, then runs | shorter — pipelined, ~10→~6 units (illustrative) |
| Accuracy (8 benchmarks) | baseline | +7.3 pp avg; up to +22.4 pp on HMMT 2026 (StreamMA paper) |
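The illustrative timing above can be checked with a few lines of arithmetic. This is a minimal sketch under the walk-through's assumptions (one wall-clock unit per step of A, and B needing nothing from A after it starts); the function names are hypothetical and the numbers are the article's illustrative units, not measurements from the paper.

```python
def serial_finish(a_steps: int, b_units: int) -> int:
    """B starts only after A's full chain, so the two durations add."""
    return a_steps + b_units


def streaming_finish(a_steps: int, b_units: int, b_start_after: int) -> int:
    """B starts once b_start_after steps have arrived and runs alongside A.

    The pipeline is done when both agents are done, hence the max.
    """
    return max(a_steps, b_start_after + b_units)


serial = serial_finish(a_steps=6, b_units=4)
streamed = streaming_finish(a_steps=6, b_units=4, b_start_after=2)
saving = (serial - streamed) / serial

print(serial, streamed, f"{saving:.0%}")  # 10 6 40%
```

The `max` makes the limit of pipelining visible: once B starts early enough, A's own chain length sets the floor, so starting B any earlier than unit 2 would not shorten this example further.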
The catch is that streaming only pays off if the early steps really are the reliable ones. When that holds — as the paper finds across its benchmarks — partial consumption beats waiting on two axes at once: lower latency from pipelining and higher accuracy from skipping the shaky final step. It's a reminder that in a multi-agent team the handoff itself is a design choice, not a given — what you pass between agents, and when, can matter as much as how capable each agent is.
Related explainers
- Maestro — RL orchestrator over frozen experts — a different multi-agent wiring: a learned policy that routes each task to a specialist, vs streaming reasoning between peers
- Crafter — Multi-agent refinement harness with a directive critic — another team pattern where what one agent passes to the next (typed edits, not a scalar score) drives the gain
- Claude Opus 4.8 — Parallel-subagent dynamic workflows — the parallelism angle: running subagents at once to cut wall-clock, complementary to pipelining one chain