MAI-Code-1-Flash — Adaptive solution length

[Figure: Adaptive solution length: think as long as the task needs, not a fixed budget. Three tasks (Easy: one-line fix; Medium: multi-step bug; Hard: cross-file refactor) each mark where the answer is reached and the tokens wasted by thinking past it. Stopping at the answer cuts the total from ~6,000 to ~2,500 reasoning tokens. MAI-Code-1-Flash: up to 60% fewer tokens, +16 pts on SWE-Bench Pro vs Haiku 4.5. Token counts are illustrative; the 60% and +16 pts figures are reported by Microsoft.]
learnaivisually.com/ai-explained/mai-code-1-flash-adaptive-solution-length

The news. On June 2, 2026, at Build 2026, Microsoft introduced its first in-house frontier models: MAI-Thinking-1 (a 35B-active sparse MoE reasoner with a 256K context, which Microsoft says was trained from scratch on licensed data with no distillation from third-party models) and MAI-Code-1-Flash, a small, inference-efficient coding model built end-to-end by Microsoft and rolling out to GitHub Copilot users in VS Code. MAI-Code-1-Flash reportedly leads Claude Haiku 4.5 by 16 points on SWE-Bench Pro (51.2% vs 35.2%) while using up to 60% fewer tokens, a saving Microsoft credits to adaptive solution-length control.

Picture the test-taker for a second. Two students sit the same exam. The first was told to spend exactly ten minutes per question — so she burns the full ten on "2 + 2," sits there second-guessing a settled answer, and runs short on the proof at the end. The second reads each question, sizes up the effort, and moves on the moment she's sure — thirty seconds on the arithmetic, the full ten on the proof. Same paper, same score, far less time. Adaptive solution-length control is the second student: the model spends its reasoning where difficulty actually demands it, instead of paying a flat tax on every task.

Under the hood, the "minutes" are reasoning tokens. A thinking model generates its chain-of-thought one token at a time before answering, and every one of those tokens is a decode step you pay for in latency and dollars. A fixed budget sets one length for all prompts; adaptive control instead decides how long to keep thinking and, crucially, when to stop. Microsoft hasn't disclosed the exact controller (whether the length is learned during training, predicted up front, or governed by a learned stop signal mid-chain), so treat the mechanism as undisclosed. What's reported is the outcome: the same benchmark scores at a fraction of the tokens.
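To make the difference concrete, here is a minimal sketch of the two loops. It is a toy simulation, not Microsoft's controller (which is undisclosed): `toy_chain`, `is_answer`, `think_fixed`, and `think_adaptive` are hypothetical names, and the "model" is just a stream that emits an `ANSWER` token at a known position and then keeps going.

```python
import itertools
from typing import Callable, Iterator

def toy_chain(answer_at: int) -> Iterator[str]:
    """Hypothetical reasoning stream: reaches its answer at token `answer_at`,
    then keeps emitting tokens forever (the over-thinking tail)."""
    for i in itertools.count(1):
        yield "ANSWER" if i == answer_at else "think"

def is_answer(token: str) -> bool:
    """Stand-in stop signal. A real system would need a learned or verified one."""
    return token == "ANSWER"

def think_fixed(steps: Iterator[str], budget: int) -> int:
    """Fixed budget: always pay for `budget` decode steps."""
    return sum(1 for _ in itertools.islice(steps, budget))

def think_adaptive(steps: Iterator[str],
                   is_done: Callable[[str], bool],
                   max_budget: int) -> int:
    """Adaptive: stop at the first token the stop signal accepts,
    with `max_budget` as a hard ceiling."""
    used = 0
    for token in itertools.islice(steps, max_budget):
        used += 1
        if is_done(token):
            break
    return used

print(think_fixed(toy_chain(200), 2000))                 # easy task, fixed budget
print(think_adaptive(toy_chain(200), is_answer, 2000))   # easy task, adaptive
```

The whole difference is the `break`. Everything interesting about a real controller lives inside `is_done`, which this sketch simply assumes is reliable.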

Where the tokens actually go

A back-of-envelope walk-through (illustrative numbers; the 60% figure is Microsoft's). Take three Copilot tasks: an easy one-line fix, a medium multi-step bug, and a hard cross-file refactor. A fixed budget of ~2,000 reasoning tokens charges all three the same, for ~6,000 tokens total, even though the easy fix had its answer after ~200. Adaptive control stops each chain at its answer (~200 + ~650 + ~1,650 ≈ 2,500 tokens) for the same result. That's ~58% fewer tokens in this toy mix, in line with the "up to 60% fewer" Microsoft reports. The hard task barely changes (~2,000 down to ~1,650); about 90% of the savings come from not over-thinking the easy and medium ones.
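The same arithmetic as a few lines of Python, using only the illustrative token counts from the walk-through above (the variable names are ours, and none of these numbers are measurements):

```python
# Illustrative figures from the walk-through, not measured values.
FIXED_BUDGET = 2_000
answer_at = {"easy": 200, "medium": 650, "hard": 1_650}

fixed_total = FIXED_BUDGET * len(answer_at)   # every task pays the full budget
adaptive_total = sum(answer_at.values())      # every task stops at its answer
savings = 1 - adaptive_total / fixed_total

# Where the savings come from, task by task.
saved = {task: FIXED_BUDGET - used for task, used in answer_at.items()}
easy_medium_share = (saved["easy"] + saved["medium"]) / sum(saved.values())

print(f"fixed: {fixed_total}, adaptive: {adaptive_total}")      # fixed: 6000, adaptive: 2500
print(f"tokens saved: {savings:.1%}")                           # tokens saved: 58.3%
print(f"share from easy + medium: {easy_medium_share:.0%}")     # share from easy + medium: 90%
```

Change the mix to three hard tasks and the savings drop to 17.5%, which is why an "up to" figure depends heavily on how many easy tasks are in the workload.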

Three ways to set the reasoning length

| Strategy | Easy task | Hard task | Main risk |
|---|---|---|---|
| Fixed-max budget | thinks far past the answer | fits (has room) | over-thinking: burns tokens it doesn't need |
| Fixed-min budget | fits (short is fine) | cut off too early | underthinking: commits to wrong answers |
| Adaptive control | short chain | long chain | needs a reliable stop signal |

The catch lives in that last cell. A generous fixed budget is wasteful but safe; adaptive length is only as good as its sense of when it's done. Stop even slightly too early on a hard task and you get underthinking: a confident wrong answer that's worse than a slow right one. That's why the headline number is a coding model's: in software, a test or verifier can often tell the model whether it's actually done, giving the stop signal something concrete to lean on. The win is real and specific (fewer reasoning tokens for the same accuracy), and it rides entirely on getting that stop right.
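One way to picture a verifier-backed stop signal is a loop that keeps producing candidate fixes and stops at the first one that passes the tests. This is a hedged sketch of the general idea only: Microsoft has not said MAI-Code-1-Flash works this way, and `solve_with_verifier`, `passes_tests`, the candidate functions, and the per-candidate token costs are all hypothetical.

```python
from typing import Callable, Iterable, Optional, Tuple

Candidate = Callable[[int, int], int]

def buggy_add(a: int, b: int) -> int:
    return a - b   # first attempt: wrong operator

def fixed_add(a: int, b: int) -> int:
    return a + b   # second attempt: correct

def passes_tests(fn: Candidate) -> bool:
    """Toy verifier: a tiny test suite standing in for a real one."""
    return fn(2, 2) == 4 and fn(-1, 1) == 0

def solve_with_verifier(
    attempts: Iterable[Tuple[Candidate, int]],
    verify: Callable[[Candidate], bool],
    max_tokens: int,
) -> Tuple[Optional[Candidate], int]:
    """Each attempt is (candidate, reasoning tokens spent producing it).
    Stop at the first candidate the verifier accepts, or when the next
    attempt would exceed the ceiling."""
    spent = 0
    for candidate, cost in attempts:
        if spent + cost > max_tokens:
            break                      # out of budget: give up without an answer
        spent += cost
        if verify(candidate):
            return candidate, spent    # verified: stop thinking here
    return None, spent

attempts = [(buggy_add, 300), (fixed_add, 400), (fixed_add, 900)]
winner, spent = solve_with_verifier(attempts, passes_tests, max_tokens=2_000)
print(winner.__name__ if winner else None, spent)   # fixed_add 700
```

The third attempt is never paid for, because the verifier already confirmed the second. With a weaker verifier (say, only the `fn(-1, 1)` check dropped to a no-op) the loop would stop just as confidently on a wrong answer, which is the underthinking risk described above.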

