StreamMA — Streaming inter-agent reasoning

[Figure: Streaming reasoning between agents: B starts on A's reliable early steps. Panel 1, "Serial: wait for the full chain", finishes at wall-clock ~10 units. Panel 2, "Streaming: consume early steps", finishes at wall-clock ~6 units. Horizontal axis: wall-clock units, 0 to 10. Early steps are marked reliable, late steps risky. Takeaway: pipeline the agents; stream, don't wait for the finished answer. Accuracy +7.3 pp on average across 8 reasoning benchmarks, up to +22.4 pp on HMMT 2026, with chain / tree / graph topologies. Wall-clock units are illustrative; accuracy gains are as reported by the StreamMA paper.]
learnaivisually.com/ai-explained/streamma-streaming-inter-agent-reasoning

The news. On June 3, 2026, researchers released StreamMA, a multi-agent system in which each agent streams its reasoning steps to downstream agents as they're generated rather than passing a finished answer. The key claim is that early reasoning steps are more reliable than late ones, so consuming the partial chain both keeps downstream agents off the error-prone final step and pipelines the work for lower latency. Across eight reasoning benchmarks with two frontier LLMs and three topologies (chain, tree, graph), StreamMA reports +7.3 percentage points of average accuracy, up to +22.4 points on HMMT 2026.

Picture the newsroom for a second. A reporter is covering a breaking story and files dispatches as the facts come in — the first few are solid and well-sourced, and a last-minute one is a shaky rumour. A cautious editor waits for the reporter's finished draft before laying out the next section, which means the layout stalls until the very end and hangs on whatever that last shaky line said. A streaming editor instead starts building from the early, reliable dispatches the moment they arrive. Streaming inter-agent reasoning is the second editor: agent B reads agent A's reasoning as it's produced, so it leans on the trustworthy early steps and never waits for the whole chain.

Under the hood, the "dispatches" are reasoning steps: the chain-of-thought a model generates, token by token, before answering. The usual multi-agent pattern is a serial handoff: A hands a single finished answer to the next agent, so B sits idle and then conditions on A's final step, the most error-prone one. With streaming, B overlaps its work with A's instead of running after it, and it draws on A's early steps, which the paper finds are more reliable. The same approach carries over to all three ways the paper wires agents together (chain, tree, and graph), and the authors note a step-level scaling law: more reasoning steps per agent improve accuracy and efficiency together.
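The difference between the two handoffs can be sketched with a plain Python generator. This is a toy illustration, not StreamMA's implementation: the function names, the six placeholder steps, and the cutoff `k` (how many early steps B reads before starting) are all assumptions made for the example.

```python
from typing import Iterator, List


def agent_a_steps(n_steps: int = 6) -> Iterator[str]:
    """Hypothetical upstream agent A: yields its reasoning steps one at a time."""
    for i in range(1, n_steps + 1):
        yield f"A-step-{i}"


def serial_handoff() -> List[str]:
    """B waits for A's whole chain, then conditions only on the final step."""
    chain = list(agent_a_steps())  # exhausts the generator: B is idle until A is done
    return [chain[-1]]


def streaming_handoff(k: int = 2) -> List[str]:
    """B reads steps as they are produced and starts once k early steps have arrived.

    The cutoff k is an illustrative assumption, not a value from the paper.
    """
    seen: List[str] = []
    for step in agent_a_steps():
        seen.append(step)
        if len(seen) == k:
            break  # B can begin its own work here; A keeps going in a real pipeline
    return seen
```

Calling `serial_handoff()` returns only A's last step, while `streaming_handoff(2)` returns A's first two. In a real system the two agents would run concurrently; the generator only stands in for the order in which steps become available.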

Where the wall-clock time actually goes

A back-of-envelope walk-through (illustrative wall-clock units; the accuracy figures are the paper's). Say agent A's reasoning chain is 6 steps (≈6 units of wall-clock, one unit per step) and agent B's own work takes 4 units. Under a serial handoff, B can't start until A's chain is fully done at unit 6, then runs 4 more, finishing at ~10 units, having conditioned on A's risky 6th step. Stream the same chain and B starts after just A's first 2 reliable steps (unit 2), runs its 4 units alongside A, and finishes at ~6 units (2 + 4). That is the same amount of B-work in ~40% less wall-clock, and it is keyed off the dependable early steps rather than the shaky final one. That second effect, conditioning on early steps, is the mechanism the paper argues drives the +7.3 pp average accuracy gain: B is no longer misled by the part of A's chain most likely to be wrong.
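The arithmetic in this walk-through can be checked in a few lines of Python. The numbers are the illustrative ones from the text (6 steps for A, 4 units for B, B starting after 2 steps); the function names and the one-unit-per-step assumption are made up for the sketch and are not taken from the paper.

```python
def serial_finish(a_steps: int, b_work: int) -> int:
    """B starts only after all of A's steps, assuming one wall-clock unit per step."""
    return a_steps + b_work


def streaming_finish(start_after: int, b_work: int) -> int:
    """B starts once `start_after` of A's steps have arrived and runs alongside A."""
    return start_after + b_work


A_STEPS = 6      # length of A's reasoning chain (illustrative)
B_WORK = 4       # units of B's own work (illustrative)
START_AFTER = 2  # early steps B waits for before starting (illustrative)

serial = serial_finish(A_STEPS, B_WORK)            # 6 + 4 = 10 units
streaming = streaming_finish(START_AFTER, B_WORK)  # 2 + 4 = 6 units
saving = (serial - streaming) / serial             # (10 - 6) / 10 = 0.4

print(f"serial: {serial} units, streaming: {streaming} units, saving: {saving:.0%}")
```

Changing `START_AFTER` shows the trade-off directly: the later B starts, the smaller the latency saving, and at `START_AFTER = 6` the streaming case collapses back to the serial one.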

| Aspect | Serial handoff | Streaming |
| --- | --- | --- |
| When B starts | after A's full chain | as A's early steps arrive |
| What B conditions on | A's final step (error-prone) | A's early steps (more reliable) |
| Wall-clock | longer: B idles, then runs | shorter: pipelined, ~10 → ~6 units (illustrative) |
| Accuracy (8 benchmarks) | baseline | +7.3 pp avg; up to +22.4 pp on HMMT 2026 (StreamMA paper) |

The catch is that streaming only pays off if the early steps really are the reliable ones. When that holds — as the paper finds across its benchmarks — partial consumption beats waiting on two axes at once: lower latency from pipelining and higher accuracy from skipping the shaky final step. It's a reminder that in a multi-agent team the handoff itself is a design choice, not a given — what you pass between agents, and when, can matter as much as how capable each agent is.


Frequently Asked Questions