MCP 2026-07-28 RC — stateless transport

[Figure: an agent's MCP client sends a queue of tools/call requests through a load balancer (LB) to three MCP servers, S1 to S3. With session-tagged requests, one server is pinned and two sit idle; with self-addressed requests, the fleet is balanced. Fleet utilization for the same workload: 92% on S1 when pinned (sticky) versus 49% each across 3 servers when stateless.]
learnaivisually.com/ai-explained/mcp-2026-07-28-stateless-transport

The news. On May 22, 2026, the MCP project landed PR #2750 — the blog announcement for the 2026-07-28 specification release candidate. The post leads with the stateless transport rework as the headline change, with a before/after HTTP example showing a self-contained tools/call request. Extensions, MCP Apps, and Tasks follow as the new capability story; the authorization changes are summarized by the failure modes they fix rather than enumerated SEP-by-SEP. All twenty-two scoped SEPs are linked from the announcement.

Picture the post office with many windows. The slow path is the sticky clerk: you hand your letter to clerk #3, and clerk #3 jots the details in a notebook only she keeps in her drawer. If you come back to check on your shipment, you have to wait at her window — none of the other clerks can tell you anything. If clerk #3 is busy, or goes on break, or quits, the trail of your shipment goes with her. The line at her window grows; the other windows are quiet. That is exactly what sticky-routed MCP looks like today. The agent's tool-use loop opens a session, the load balancer pins that session to one server, and every follow-up call has to land on that same server. One server gets the traffic; the others sit idle.

The fast path is the self-addressed envelope. You write the destination, the sender, and a tracking ID on the front of every letter, and the post office stops needing any one clerk to remember anything about your shipment. Any open window will do. That is the 2026-07-28 framing: each tools/call carries the protocol version it expects, the client capabilities it declared, any routing keys the server fleet needs, and the auth context — all in the request itself. The server reads the envelope and acts. No drawer notebook. No "come back to me." A second request half a second later can land on a different server entirely and produce identical behavior.
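The envelope idea can be sketched in a few lines of Python. This is a minimal illustration, not the RC's wire format: the routing header name, the placement of client capabilities under `_meta`, the `build_tools_call` helper, and the toy `add` tool are all assumptions made for the example.

```python
import json

PROTOCOL_VERSION = "2026-07-28"


def build_tools_call(request_id, tool, arguments, *, capabilities, routing_key, token):
    """Return (headers, body) for a call that needs no prior handshake."""
    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {token}",        # auth context travels with the call
        "MCP-Protocol-Version": PROTOCOL_VERSION,  # version the client expects
        "X-Example-Routing-Key": routing_key,      # hypothetical routing hint
    }
    body = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {
            "name": tool,
            "arguments": arguments,
            # Hypothetical placement of the declared client capabilities.
            "_meta": {"clientCapabilities": capabilities},
        },
    }
    return headers, json.dumps(body)


class Server:
    """A server instance that keeps no per-client memory between calls."""

    def __init__(self, name):
        self.name = name

    def handle(self, headers, raw_body):
        body = json.loads(raw_body)
        if headers.get("MCP-Protocol-Version") != PROTOCOL_VERSION:
            return {"jsonrpc": "2.0", "id": body["id"],
                    "error": {"code": -32600, "message": "unsupported protocol version"}}
        args = body["params"]["arguments"]
        total = args["a"] + args["b"]  # toy "add" tool
        return {"jsonrpc": "2.0", "id": body["id"],
                "result": {"content": [{"type": "text", "text": str(total)}]}}


headers, body = build_tools_call(
    1, "add", {"a": 2, "b": 3},
    capabilities={"sampling": {}}, routing_key="tenant-17", token="example-token",
)
first = Server("S1").handle(headers, body)
second = Server("S3").handle(headers, body)
print(first == second)
```

Because neither server consults anything outside the request, the two responses are identical regardless of which instance the load balancer picks.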

There is a real subtlety worth saying out loud. A few MCP interactions genuinely do need cross-request memory: long-lived subscriptions, sampling sessions, OAuth tokens that have to outlive a single call. The new design does not pretend those don't exist. It externalizes them. In the post office picture, this is a central tracking database that every window can look up; in a fleet, it is a shared store (a Redis-equivalent, a database, an object store) that any server queries when it needs to hydrate that bit of cross-request state. The transport is still stateless, because the request itself is self-contained, and the shared store is the implementation pattern that makes the small slice of stateful behavior work across a fleet. The two ideas are easy to mix up and worth keeping straight: the protocol's change is at the transport layer; the shared store is one way servers can choose to persist what little state has to outlive a request.
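A minimal sketch of that shared-store pattern, assuming an in-memory dictionary as a stand-in for Redis or a database. The `SharedStore` and `StatelessServer` classes, the `subscribe`/`poll` method names, and the key layout are hypothetical and are not taken from the specification.

```python
class SharedStore:
    """Stand-in for Redis, a database, or an object store (in-memory here)."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[key] = value

    def get(self, key, default=None):
        return self._data.get(key, default)


class StatelessServer:
    """Holds a reference to the store, but no subscription state of its own."""

    def __init__(self, name, store):
        self.name = name
        self.store = store

    def subscribe(self, subscription_id, resource_uri):
        # Persist the long-lived state outside the server process.
        self.store.put(f"sub:{subscription_id}", {"uri": resource_uri, "cursor": 0})
        return subscription_id

    def poll(self, subscription_id):
        # Hydrate state from the store, advance it, and write it back.
        state = self.store.get(f"sub:{subscription_id}")
        if state is None:
            raise KeyError(f"unknown subscription {subscription_id}")
        updated = {**state, "cursor": state["cursor"] + 1}
        self.store.put(f"sub:{subscription_id}", updated)
        return {"uri": updated["uri"], "cursor": updated["cursor"], "served_by": self.name}


store = SharedStore()
s1, s2, s3 = (StatelessServer(name, store) for name in ("S1", "S2", "S3"))

s1.subscribe("sub-1", "file:///logs/app.log")  # created via S1
first_poll = s2.poll("sub-1")                  # continued on S2
second_poll = s3.poll("sub-1")                 # and again on S3
print(first_poll, second_poll)
```

The subscription survives hopping between servers because the cursor lives in the store, not in any one server's memory. A real store would also need atomic updates, which this sketch leaves out.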

The capacity argument writes itself. Consider 300 concurrent agent sessions, each holding open MCP traffic and averaging ~2 calls per second (about 600 calls/sec in total), hitting a fleet of 3 servers. Sticky routing assigns each session to one server at session open. Distribution is rarely uniform: three or four "power user" sessions can pin one server's load near saturation while the others sit far below it. Numerically: a typical sticky-imbalance run might leave S1 at ~92% utilization while S2 and S3 sit at ~8% and ~47% (illustrative). Under stateless transport with the same workload, the load balancer can spray every call independently. The same 600 calls/sec land on three servers at ~49% each (illustrative), so the busiest server's utilization falls from ~92% to ~49%, roughly 1.9× lower, before any vertical scaling.
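The arithmetic can be checked with a short script. The per-server capacity and the sticky per-server loads below are hypothetical numbers, chosen only so that they total 600 calls/sec and reproduce the illustrative ~92% and ~49% figures; they are not measurements.

```python
SESSIONS = 300
CALLS_PER_SESSION = 2                      # average calls per second per session
TOTAL_CPS = SESSIONS * CALLS_PER_SESSION   # 600 calls/sec across the fleet

CAPACITY_CPS = 408                         # hypothetical per-server capacity


def utilization(loads_cps, capacity_cps):
    """Fraction of capacity used by each server."""
    return [load / capacity_cps for load in loads_cps]


# Sticky routing: hypothetical uneven split of the same 600 calls/sec.
sticky_loads = [375, 33, 192]
# Stateless routing: every call is balanced independently, so the split is even.
stateless_loads = [TOTAL_CPS / 3] * 3

sticky = utilization(sticky_loads, CAPACITY_CPS)
stateless = utilization(stateless_loads, CAPACITY_CPS)
peak_ratio = max(sticky) / max(stateless)

print("sticky:   ", [f"{u:.0%}" for u in sticky])
print("stateless:", [f"{u:.0%}" for u in stateless])
print(f"peak utilization ratio: {peak_ratio:.1f}x")
```

The total work is the same in both cases; what changes is the peak. Since the fleet saturates when its busiest server does, lowering the peak is what frees up headroom.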

Where the rework earns its keep

Sticky routing's failure modes are well-known in the agent harness world: one hot server, blue/green deploys that have to drain sessions for minutes, crash recovery that can't transparently re-route. The 2026-07-28 RC closes all three at the transport level. Self-contained requests do not pin to anything, so a deploy that rolls a server out of rotation finishes in seconds — pending requests just hit the next server. A server that crashes drops its in-flight requests, and the client retries against the fleet — the next call lands somewhere else and proceeds. The only state that needs to survive the crash is whatever the workload chose to put in the shared store, which is the small minority of interactions.
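The crash-recovery behavior can be sketched as a client-side retry loop. The `FleetServer` class, the `ServerDown` exception, and the round-robin order are assumptions for illustration; a real client would usually retry through the load balancer and add backoff.

```python
import itertools


class ServerDown(Exception):
    """Raised when a server cannot take the request (crashed or draining)."""


class FleetServer:
    def __init__(self, name, healthy=True):
        self.name = name
        self.healthy = healthy

    def handle(self, request):
        if not self.healthy:
            raise ServerDown(self.name)
        return {"id": request["id"], "served_by": self.name}


def call_with_retry(fleet, request, max_attempts=3):
    """Try servers in round-robin order until one answers."""
    last_error = None
    for server in itertools.islice(itertools.cycle(fleet), max_attempts):
        try:
            # The request is self-contained, so any server can take it.
            return server.handle(request)
        except ServerDown as err:
            last_error = err
    raise RuntimeError(f"all {max_attempts} attempts failed") from last_error


fleet = [FleetServer("S1", healthy=False), FleetServer("S2"), FleetServer("S3")]
result = call_with_retry(fleet, {"id": 7, "method": "tools/call"})
print(result)
```

Blind retries like this are only safe when the call is idempotent or the server can deduplicate by request ID; that concern is outside what the sketch covers.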

The shape of what the RC actually changes is concrete. The table below contrasts the legacy and new transport.

| Aspect | Sticky-routed transport (legacy) | Stateless transport (2026-07-28 RC) |
| --- | --- | --- |
| Session lifetime | Bound to one server for the session's life | No per-session server binding |
| Routing key | Session ID hashed to a specific instance | None: any instance, any request |
| First request | Handshake that creates server-local state | Self-contained, no implicit setup |
| Cross-request state | In server memory | In a shared store, only when needed (subscriptions, sampling, auth) |
| Horizontal scale-out | Awkward: uneven load by session hash | Native: load balancer sprays calls |
| Server restart | Drops the session; client must rebuild | Drops in-flight; retry hits any other server |

A related design point is worth knowing. The Tasks extension (SEP-2663) ships a complementary idea one layer up: it gives the client a long-lived taskId it can poll across reconnects. SEP-2663 needed the transport rework to be fully useful — a taskId polled across reconnects only works if the next tasks/get doesn't have to land on the same server that issued the handle. Stateless transport is what makes that work: the taskId is the only cross-request key the client carries, the server fleet hydrates the task's state from the shared store, and the polling call goes to whichever server is least busy.
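A rough sketch of that polling flow, assuming a plain dictionary as the shared task store. The `Worker` type, the `pick_least_busy` and `tasks_get` helpers, the response shape, and the status strings are hypothetical and are not taken from SEP-2663.

```python
from dataclasses import dataclass


@dataclass
class Worker:
    name: str
    in_flight: int  # requests currently being served by this instance


def pick_least_busy(workers):
    """Route the poll to whichever server has the fewest in-flight requests."""
    return min(workers, key=lambda w: w.in_flight)


def tasks_get(worker, task_store, task_id):
    """Answer a poll using only the taskId and the shared store."""
    task = task_store.get(task_id)
    if task is None:
        return {"taskId": task_id, "error": "unknown task"}
    return {"taskId": task_id, "status": task["status"], "served_by": worker.name}


# Shared store: in-memory stand-in for a database reachable by every server.
task_store = {"task-42": {"status": "working"}}

workers = [Worker("S1", in_flight=9), Worker("S2", in_flight=2), Worker("S3", in_flight=5)]
first_poll = tasks_get(pick_least_busy(workers), task_store, "task-42")

# Later: load has shifted and the task has finished.
workers[1].in_flight = 8
workers[2].in_flight = 1
task_store["task-42"]["status"] = "completed"
second_poll = tasks_get(pick_least_busy(workers), task_store, "task-42")

print(first_poll, second_poll)
```

The two polls are answered by different servers, and neither needs to be the one that issued the handle.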

The boundary of what the RC changes is the transport itself, not the protocol semantics. Tools still return tool results; resources still return resource contents; the wire format of a method call is the same JSON-RPC envelope. What changes is what a server is allowed to assume: nothing about prior calls on the same connection. That single discipline is enough to make every harness operator's life easier and to make the parallel-tool-call patterns the Cost & Latency module recommends actually achievable in a fleet.


Frequently Asked Questions