AsyncFC paper — Symbolic futures in the decode stream

AsyncFC hero animation — a single Gantt timeline with three rows (decode, tool 1, tool 2) and a sweeping now-cursor. When the decoder emits a tool-call token, a symbolic-future badge appears above the decode row; the tool bar begins running in parallel, and the badge flips to a checkmark when the bar finishes. Decode never stops. The animation closes with an 11 s vs 6.5 s wall-clock comparison against the synchronous alternative.
learnaivisually.com/ai-explained/asyncfc-symbolic-futures

The news. On May 14, 2026, a research paper introduced AsyncFC — a futures-based async function-calling framework that overlaps LLM decoding with tool execution. The authors show that current LLMs already handle symbolic future placeholders without any retraining: the harness inserts a typed referent into the context, dispatches the tool in the background, and substitutes the real value before the next forward pass that depends on it. End-to-end task time drops while task accuracy holds. Read the paper →

Picture the coffee counter. The slow path is standing in front of the cashier waiting for your custom drink to be made before you order anything else. You stare at the espresso machine for two minutes. The line backs up. Nothing else happens. That is exactly what a synchronous tool-call loop looks like inside an LLM agent: the model emits a tool call, the harness blocks, the tool runs, the result comes back, the model resumes. While the tool is running the decoder is idle. The most expensive accelerator in your stack is doing nothing.

The fast path is walking away with a buzzer. You place your order, the cashier hands you a small puck that will vibrate when your drink is ready, and you move on — you can order something for a colleague, you can find a table, you can plan what you'll do once the drink arrives. The buzzer is the symbolic future. It's not the drink. It stands in for the drink. You can hold it, talk about it ("I'll grab the coffee in a minute"), even commit to actions that depend on it ("once the coffee's here we'll head out"). The actual drink only matters when you reach the moment that physically requires it.

AsyncFC does exactly this for the decoder. When the model emits a tool call, the harness immediately yields a typed placeholder token — written something like ⟨fut1⟩ — back into the decoding stream and dispatches the real tool call asynchronously. The model keeps decoding. It can issue more tool calls, reason over the future by name, plan what it will do once the result arrives. When the tool resolves, the harness substitutes the real value for the placeholder before the next forward pass that actually depends on reading it. No weight update, no fine-tune, no special token vocabulary. The model just treats the placeholder as a regular token it can plan around.
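A minimal sketch of what such a harness loop could look like, written with Python's asyncio. Everything here is an assumption for illustration: decode_step (an async callable that returns either a plain token or a tool-call dict), run_tool, and the field names are stand-ins, not the paper's implementation. One simplification: the sketch splices a result in as soon as its tool resolves, rather than detecting exactly which forward pass first depends on it.

```python
import asyncio
from itertools import count

# Hypothetical harness loop: dispatch tools in the background, keep decoding
# past typed placeholders like ⟨fut1⟩, splice real values back in later.

async def run_tool(call: dict) -> str:
    """Stand-in for a real tool (search API, code sandbox, MCP server)."""
    await asyncio.sleep(call["latency_s"])
    return f"<result of {call['name']}>"

async def agent_loop(decode_step, max_tokens: int = 64) -> list:
    context, pending, ids = [], {}, count(1)     # pending: placeholder -> Task
    for _ in range(max_tokens):
        # Splice in any futures that resolved while we kept decoding.
        for fut_id, task in list(pending.items()):
            if task.done():
                context[context.index(fut_id)] = task.result()
                del pending[fut_id]

        token = await decode_step(context)       # one forward pass; never blocks on a tool
        if isinstance(token, dict) and token.get("type") == "tool_call":
            fut_id = f"⟨fut{next(ids)}⟩"
            pending[fut_id] = asyncio.create_task(run_tool(token))  # async dispatch
            context.append(fut_id)               # placeholder enters the decode stream
        elif token == "<eos>":
            # The final answer is the step that truly needs every value:
            # block here, and only here, on whatever is still unresolved.
            for fut_id, task in pending.items():
                context[context.index(fut_id)] = await task
            break
        else:
            context.append(token)
    return context
```

The property that matters is the ordering: the dispatch happens in the background, decoding never waits on it, and the loop blocks only at the point where a still-unresolved value is actually required.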

The catch — and this is where AsyncFC earns its name — is that the model must be able to reason over a not-yet-resolved future without crashing. The paper's empirical claim is that current LLMs already do this. The placeholder is typed (it's known to be, say, a search-results list or a numeric answer), and that type is enough for the model to keep generating plans that condition on the future without trying to materialize it. The savings show up at the agent layer: when several tool calls are independent of each other, the agent emits the whole chain of dispatches up front and decodes through them while the eligible tools run in parallel. The dependency structure of the task — which future is needed when — sets the only true critical path.
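One way to picture the typed placeholder, as a hedged sketch rather than anything from the paper: the context carries a name and a result type, never the value, and that pair is what the model plans around.

```python
from dataclasses import dataclass

# Hypothetical rendering of a typed symbolic future. The model sees only the
# name and the declared type; the harness owns the eventual value.

@dataclass
class SymbolicFuture:
    name: str            # e.g. "fut1"
    result_type: str     # e.g. "list[SearchResult]" or "float"

    def as_token_text(self) -> str:
        # What the decoder reads in place of the not-yet-resolved value.
        return f"⟨{self.name}: {self.result_type}⟩"

# Independent dispatches can be emitted up front and decoded through while
# the eligible tools run in parallel.
plan = [
    SymbolicFuture("fut1", "list[SearchResult]"),
    SymbolicFuture("fut2", "float"),
]
print(" ".join(f.as_token_text() for f in plan))
# ⟨fut1: list[SearchResult]⟩ ⟨fut2: float⟩
```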

Where the wall-clock time actually goes

The cost-and-latency profile of an agent — covered in the Agent Engineering track's Cost & Latency module — is dominated by two big buckets: token decode and tool execution. As a typical industry rule of thumb (not a paper claim), a single decode step runs in the tens of milliseconds while a tool call (a search API, a code-execution sandbox, an MCP server) runs in the hundreds of milliseconds to several seconds. A synchronous loop adds the two buckets. AsyncFC overlaps them.

Picture a toy 12-token response that emits two tool calls at positions 4 and 8 — say, a search and a calculation. Hold token decode at 500 ms each and tool execution at 2.5 s each for illustration. The sync loop walks like this: 4 tokens (2 s), tool 1 (2.5 s), 4 tokens (2 s), tool 2 (2.5 s), 4 tokens (2 s) — 11 seconds total, with the GPU idle for 5 of them. AsyncFC walks like this: 12 tokens continuously (6 s), with tool 1 running in parallel from t=2 s to t=4.5 s and tool 2 running from t=4 s to t=6.5 s — 6.5 seconds total, with the GPU never stopping. Same model, same tools, same final answer; the wall-clock difference is the time saved.
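The arithmetic is easy to check. A few lines under the same assumed numbers (500 ms per token, 2.5 s per tool, calls after tokens 4 and 8) reproduce both totals:

```python
# All numbers are the illustrative assumptions above, not measurements.
TOKEN_S, TOOL_S = 0.5, 2.5
CALL_POSITIONS = [4, 8]          # tool calls emitted after tokens 4 and 8
TOTAL_TOKENS = 12

# Synchronous loop: decode time and tool time are strictly additive.
sync_total = TOTAL_TOKENS * TOKEN_S + len(CALL_POSITIONS) * TOOL_S   # 11.0 s
gpu_idle   = len(CALL_POSITIONS) * TOOL_S                            # 5.0 s of idle decoder

# AsyncFC: decode never pauses; each tool starts when its call token lands
# and ends TOOL_S later; wall clock is whichever finishes last.
decode_end  = TOTAL_TOKENS * TOKEN_S                                 # 6.0 s
tool_ends   = [p * TOKEN_S + TOOL_S for p in CALL_POSITIONS]         # [4.5, 6.5]
async_total = max(decode_end, *tool_ends)                            # 6.5 s

print(sync_total, gpu_idle, async_total)                             # 11.0 5.0 6.5
```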

This pattern is not unique to AsyncFC. Anthropic's Model Context Protocol shipped a structurally similar idea on May 15, 2026 with SEP-2663: MCP servers can return a Task handle instead of a blocking result, and the client polls it via tasks/get. AsyncFC is the model-side counterpart — what the decoder does while it's waiting. Both pieces want the same thing: stop letting tool latency dictate agent latency.

The boundary of what AsyncFC can speed up is the true data dependency between tool calls. If every tool call's input depends on the previous tool's output, you can't overlap them — the buzzer for drink #2 can't fire until you know what drink #1 was. But in practice agentic workflows are full of parallelizable tool dispatches: search this, search that, look up the user, fetch the schema. Those are the calls AsyncFC compresses, leaving a critical path only as long as the slowest tool.
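To make that boundary concrete, here is a toy asyncio comparison with made-up tools and latencies: independent dispatches finish in roughly the time of the slowest one, while a chain whose every input is the previous output can only run end to end.

```python
import asyncio
import time

# Illustrative only: search() stands in for any tool, and the latencies are invented.

async def search(query: str, latency_s: float = 1.0) -> str:
    await asyncio.sleep(latency_s)
    return f"results for {query!r}"

async def independent() -> float:
    t0 = time.perf_counter()
    # No call needs another's output: dispatch all at once.
    await asyncio.gather(search("this"), search("that"), search("user profile"))
    return time.perf_counter() - t0      # ~1 s: critical path = slowest tool

async def dependent() -> float:
    t0 = time.perf_counter()
    # Each input is the previous output: nothing can overlap.
    a = await search("drink #1")
    b = await search(a)
    await search(b)
    return time.perf_counter() - t0      # ~3 s: latencies add up

print(asyncio.run(independent()), asyncio.run(dependent()))
```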

Goes deeper in: Agent Engineering → Cost & Latency → Parallelizing Tool Calls

Frequently Asked Questions