MAI-Code-1-Flash — Adaptive solution length
The news. On June 2, 2026, at Build 2026, Microsoft introduced its first in-house frontier models: MAI-Thinking-1 (a 35B-active sparse MoE reasoner with a 256K context, which Microsoft says was trained from scratch on licensed data with no distillation from third-party models) and MAI-Code-1-Flash, a small, inference-efficient coding model built end-to-end by Microsoft and rolling out to GitHub Copilot users in VS Code. MAI-Code-1-Flash reportedly leads Claude Haiku 4.5 by 16 points on SWE-Bench Pro (51.2% vs 35.2%) while using up to 60% fewer tokens, a saving Microsoft credits to adaptive solution-length control.
Picture the test-taker for a second. Two students sit the same exam. The first was told to spend exactly ten minutes per question — so she burns the full ten on "2 + 2," sits there second-guessing a settled answer, and runs short on the proof at the end. The second reads each question, sizes up the effort, and moves on the moment she's sure — thirty seconds on the arithmetic, the full ten on the proof. Same paper, same score, far less time. Adaptive solution-length control is the second student: the model spends its reasoning where difficulty actually demands it, instead of paying a flat tax on every task.
Under the hood, the "minutes" are reasoning tokens. A thinking model generates its chain-of-thought one token at a time before answering, and every one of those tokens is a decode step you pay for in latency and dollars. A fixed budget sets one length for all prompts; adaptive control instead decides how long to keep thinking and, crucially, when to stop. Microsoft hasn't disclosed the exact controller (whether the length is predicted up front, trained into the model, or cut off by a learned stop signal mid-chain), so treat the mechanism as unknown. What's reported is the outcome: a stronger SWE-Bench Pro score than Claude Haiku 4.5 with up to 60% fewer tokens.
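The difference between the two policies can be sketched as a decode loop with and without a stop check. This is a toy illustration only: Microsoft has not disclosed its controller, so `toy_stream`, `run_chain`, and the `SETTLED` marker are hypothetical names invented here, and a real stop signal would be far less clean than a single marker token.

```python
import itertools
from typing import Callable, Iterator, List, Optional

SETTLED = "<settled>"  # hypothetical marker: "the answer is now settled"


def toy_stream(settled_at: int) -> Iterator[str]:
    """Endless fake reasoning tokens; token number `settled_at` is SETTLED."""
    for i in itertools.count(1):
        yield SETTLED if i == settled_at else "think"


def run_chain(
    stream: Iterator[str],
    max_tokens: int,
    should_stop: Optional[Callable[[List[str]], bool]] = None,
) -> List[str]:
    """Consume up to `max_tokens` tokens, ending early if `should_stop` fires."""
    chain: List[str] = []
    for token in itertools.islice(stream, max_tokens):
        chain.append(token)  # each appended token stands for one paid decode step
        if should_stop is not None and should_stop(chain):
            break
    return chain


# Fixed budget: no stop check, so the chain always runs to the cap.
fixed = run_chain(toy_stream(settled_at=200), max_tokens=2000)

# Adaptive (assumed form): same cap, but stop once the answer is settled.
adaptive = run_chain(
    toy_stream(settled_at=200),
    max_tokens=2000,
    should_stop=lambda chain: chain[-1] == SETTLED,
)

print(len(fixed), len(adaptive))  # 2000 200
```

The cap stays in place in both runs; the adaptive version simply gets a second exit. Everything hard about the real problem is hidden inside `should_stop`.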
Where the tokens actually go
A back-of-envelope walk-through (illustrative numbers; the 60% figure is Microsoft's). Take three Copilot tasks: an easy one-line fix, a medium multi-step bug, and a hard cross-file refactor. A fixed budget of ~2,000 reasoning tokens charges all three the same amount, ~6,000 tokens total, even though the easy fix had its answer after ~200. Adaptive control stops each chain at its answer, so the total is ~200 + ~650 + ~1,650 ≈ 2,500 tokens for the same result. That's ~58% fewer tokens in this toy mix, right in line with the "up to 60% fewer" Microsoft reports. The hard task barely changes (~2,000 down to ~1,650); the savings come almost entirely from not over-thinking the easy and medium ones.
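The arithmetic above is easy to reproduce. The sketch below uses only the illustrative numbers from the walk-through; the variable names are made up for this example and none of the per-task figures come from Microsoft.

```python
FIXED_BUDGET = 2_000  # reasoning tokens per task under the fixed policy

# Illustrative tokens-until-answer per task (the toy numbers from the text).
tokens_to_answer = {
    "easy one-line fix": 200,
    "medium multi-step bug": 650,
    "hard cross-file refactor": 1_650,
}

fixed_total = FIXED_BUDGET * len(tokens_to_answer)  # every task pays the full budget
adaptive_total = sum(tokens_to_answer.values())     # every task stops at its answer
overall_saving = 1 - adaptive_total / fixed_total

# Tokens saved per task, to see where the savings actually come from.
saved = {task: FIXED_BUDGET - n for task, n in tokens_to_answer.items()}
total_saved = sum(saved.values())

print(f"fixed={fixed_total}, adaptive={adaptive_total}, saving={overall_saving:.1%}")
for task, s in saved.items():
    print(f"{task}: saved {s} tokens ({s / total_saved:.0%} of all savings)")
```

In this toy mix the easy and medium tasks account for 3,150 of the 3,500 tokens saved, which is 90%; the hard refactor contributes only 350.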
Three ways to set the reasoning length
| Strategy | Easy task | Hard task | Main risk |
|---|---|---|---|
| Fixed-max budget | thinks far past the answer | fits — has room | over-thinking: burns tokens it doesn't need |
| Fixed-min budget | fits — short is fine | cut off too early | underthinking: commits to wrong answers |
| Adaptive control | short chain | long chain | needs a reliable stop signal |
The catch lives in that last cell. A generous fixed budget is wasteful but safe; adaptive length is only as good as its sense of when it's done. Stop too early on a hard task and you get underthinking: a confident wrong answer that's worse than a slow right one. That's why it matters that the headline number comes from a coding model: in software, a test or verifier can often tell the model whether it's actually done, giving the stop signal something concrete to lean on. The win is real and specific (fewer reasoning tokens for the same accuracy), and it rides entirely on getting that stop right.
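To make the verifier idea concrete, here is a minimal sketch in which a small test suite acts as the stop signal. It is an assumption about how such a loop could look, not a description of MAI-Code-1-Flash: `passes_tests`, `solve_with_verifier`, the candidate functions, and the per-attempt token costs are all hypothetical.

```python
from typing import Callable, Iterable, Optional, Tuple

Candidate = Callable[[int, int], int]


def passes_tests(candidate: Candidate) -> bool:
    """Verifier: a tiny test suite for a function that should add two ints."""
    cases = [((2, 2), 4), ((-1, 1), 0), ((10, 5), 15)]
    return all(candidate(*args) == expected for args, expected in cases)


def solve_with_verifier(
    attempts: Iterable[Tuple[int, Candidate]], max_tokens: int
) -> Tuple[Optional[Candidate], int]:
    """Try attempts in order; stop at the first that passes, or when the budget runs out.

    Each attempt is (reasoning_token_cost, candidate). Returns (winner or None, tokens spent).
    """
    spent = 0
    for cost, candidate in attempts:
        if spent + cost > max_tokens:
            break  # cut off: not enough budget left for another attempt
        spent += cost
        if passes_tests(candidate):
            return candidate, spent  # concrete "done" signal from the verifier
    return None, spent


# Hypothetical attempts with made-up token costs.
attempts = [
    (200, lambda a, b: a * b),  # gets 2 and 2 right, fails the other cases
    (450, lambda a, b: a - b),  # fails
    (300, lambda a, b: a + b),  # passes every case
]

winner, spent = solve_with_verifier(attempts, max_tokens=2000)
print(winner is not None, spent)  # True 950

cut_off, spent_small = solve_with_verifier(attempts, max_tokens=500)
print(cut_off is not None, spent_small)  # False 200
```

The first candidate is the underthinking case in miniature: it looks right on one input, and only the verifier shows it is wrong. With a 500-token cap the loop never reaches the working attempt, which is the "cut off too early" row of the table.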
Related explainers
- Compute Where It Counts — Per-token compute controller — the other axis of adaptive compute: how much work each token gets, vs how many tokens the chain runs
- LongTraceRL — Rubric reward (process supervision) — how reasoning chains get trained, where a good stop signal would come from
- Gemini 3.5 Flash — Agent-first model design — a related angle: building a model for the agent loop rather than retrofitting chat