The news. On June 4, 2026, the MLEvolve paper (arXiv 2606.06473, from Shanghai AI Laboratory and East China Normal University) introduced an LLM-based, self-evolving multi-agent framework for end-to-end machine-learning algorithm discovery. On MLE-Bench it reaches a state-of-the-art average medal rate under a 12-hour budget, half the standard runtime, and outperforms AlphaEvolve on mathematical algorithm optimization.
Picture a mountain you are trying to climb, where every distinct route to the top is a different candidate algorithm. Nobody can try every route, because there are far too many, so you send out scouts: each one heads up a promising line and reports back how far it got. That report is a rollout, and growing a tree of routes scored by rollouts is the core idea of Monte Carlo Tree Search (MCTS), the search-and-score loop behind many game-playing agents.
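To make "scored by rollouts" concrete, here is a minimal sketch of the UCT selection rule commonly used in MCTS. It is a textbook formulation, not MLEvolve's actual selection rule; the function names, the dictionary layout, and the default constant `c = 1.4` are assumptions chosen for illustration.

```python
import math

def uct_score(total_reward: float, visits: int, parent_visits: int, c: float = 1.4) -> float:
    """Average reward plus an exploration bonus for rarely tried children."""
    if visits == 0:
        return math.inf  # an untried route is always scouted first
    exploit = total_reward / visits
    explore = c * math.sqrt(math.log(parent_visits) / visits)
    return exploit + explore

def select_child(children: list[dict], parent_visits: int, c: float = 1.4) -> int:
    """Return the index of the child route with the highest UCT score."""
    scores = [uct_score(ch["reward"], ch["visits"], parent_visits, c) for ch in children]
    return max(range(len(children)), key=scores.__getitem__)

# Hypothetical statistics: one well-explored route, one barely tried route.
routes = [{"reward": 9.0, "visits": 10}, {"reward": 0.5, "visits": 1}]
greedy_pick = select_child(routes, parent_visits=11, c=0.0)   # pure exploitation
curious_pick = select_child(routes, parent_visits=11, c=1.4)  # with exploration bonus
```

With `c = 0` the rule picks the route with the best average so far; with a positive `c` the barely tried route wins because its uncertainty bonus is large.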
The trouble with a plain tree is that every scout explores alone. A clever shortcut that one scout discovers high up on the left face never makes it onto anyone else's map, so a scout on the right face burns its own steps re-finding the very same thing, and three other routes independently dead-end at the same cliff. MLEvolve's Progressive Monte Carlo Graph Search (MCGS) fixes this with graph reference edges: when a branch finds a useful sub-result, it gets pinned to one shared map where every other branch can read it. The search is now a graph, not a tree: discoveries flow sideways between routes instead of being re-derived on each.
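One way to picture a reference edge is as an extra link on a search node, separate from its parent-child edges. The sketch below is a toy data structure under that assumption; the `Node` class, its fields, and `visible_findings` are hypothetical names, not taken from the MLEvolve paper.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    name: str
    findings: dict[str, str] = field(default_factory=dict)  # sub-results found on this branch
    children: list["Node"] = field(default_factory=list)    # ordinary tree edges
    refs: list["Node"] = field(default_factory=list)        # reference edges to other branches

    def visible_findings(self) -> dict[str, str]:
        """Own findings plus anything reachable over one reference edge."""
        merged: dict[str, str] = {}
        for other in self.refs:
            merged.update(other.findings)
        merged.update(self.findings)  # local results take precedence
        return merged

root = Node("root")
left, right = Node("left-face"), Node("right-face")
root.children += [left, right]

left.findings["normalization"] = "scale features before fitting"
right.refs.append(left)  # the reference edge: right can now read left's discovery
```

After the last line, `right.visible_findings()` includes the normalization trick even though `right` never derived it, while `root` (which has no reference edge) does not see it.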
The second half of the name, "progressive", is about when to spend your scouts. Early on, MCGS deliberately scouts widely, using an entropy-style schedule that keeps the search exploring many faces of the mountain. As evidence accumulates, the schedule tightens onto the most promising route and pours the remaining budget into climbing it. That shift from exploration to exploitation is the same budget-allocation problem every planning agent faces; MCGS just schedules it explicitly instead of using one fixed knob. MLEvolve also keeps the planner that picks routes separate from the workers that write the code, an orchestrator-and-workers split that lets each part specialize.
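An explore-then-exploit schedule can be sketched with a softmax whose temperature is annealed over the budget. This is a generic stand-in, not the schedule the paper uses; the linear decay, the temperature bounds, and the branch scores are all assumptions for illustration.

```python
import math

def temperature(step: int, budget: int, t_start: float = 2.0, t_end: float = 0.1) -> float:
    """Linearly anneal a softmax temperature from t_start to t_end over the budget."""
    frac = min(max(step / budget, 0.0), 1.0)
    return t_start + (t_end - t_start) * frac

def branch_probs(scores: list[float], temp: float) -> list[float]:
    """Softmax over branch scores: high temp is near-uniform, low temp is near-greedy."""
    top = max(scores)  # subtract the max for numerical stability
    weights = [math.exp((s - top) / temp) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]

def entropy(probs: list[float]) -> float:
    """Shannon entropy in nats; higher means the search is spread more widely."""
    return -sum(p * math.log(p) for p in probs if p > 0)

scores = [0.5, 0.6, 0.9]  # hypothetical branch scores
early = branch_probs(scores, temperature(0, 1000))
late = branch_probs(scores, temperature(1000, 1000))
```

Early in the run the three branches are sampled almost evenly; by the end nearly all of the probability mass sits on the best-scoring branch, so the entropy of the sampling distribution falls as the budget is spent.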
Put numbers on the shortcut-sharing to see why it pays (all numbers here are illustrative). Say a run has a search budget of 1,000 rollouts spread across 50 sibling branches, so each branch gets about 20. In a plain tree, suppose 10 of those branches each independently re-derive the same normalization trick, and each burns its full allotment of about 20 rollouts on it: that is roughly 200 rollouts, a fifth of the whole budget, spent discovering one fact ten times. With a graph edge, the first branch to find the trick still pays its ~20 rollouts, but it pins the result for the other 49 to read for free. The nine branches that would have re-derived it keep their ~180 rollouts for fresh exploration on routes nobody has tried: the same budget, aimed at new ground.
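The arithmetic can be checked in a few lines. The function name and parameters below are hypothetical, and the model assumes each re-deriving branch spends its whole per-branch allotment on the trick. Note that the first discoverer still pays its share even with sharing, so the freed amount is 180 of the roughly 200 rollouts.

```python
def rediscovery_cost(budget: int, branches: int, rediscoverers: int, shared: bool) -> int:
    """Rollouts spent deriving one fact, with or without cross-branch sharing."""
    per_branch = budget // branches
    # With sharing, only the first branch to find the fact pays for it.
    paying = 1 if shared and rediscoverers > 0 else rediscoverers
    return paying * per_branch

tree = rediscovery_cost(1_000, 50, 10, shared=False)
graph = rediscovery_cost(1_000, 50, 10, shared=True)
print(tree, graph, tree - graph)  # prints: 200 20 180
```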
| Approach | How it searches | Cross-branch sharing | Explore vs exploit |
|---|---|---|---|
| Monte Carlo Tree Search | a tree of candidates, scored by rollouts | none: each branch is on its own | one fixed balance (e.g. a constant exploration weight) |
| AlphaEvolve (evolutionary) | mutate & select a population of programs | only via the surviving population | set by mutation/selection pressure |
| Progressive MCGS (MLEvolve, arXiv) | a graph of candidates, scored by rollouts | reference edges share sub-results | a schedule: explore wide → exploit best |
None of this is free: a graph needs the bookkeeping to decide which sub-results are worth pinning and which branches should read them, and a schedule needs tuning so it neither commits too early nor wanders too long. But once that machinery works, the search stops paying the same toll over and over. That is the mechanism behind MLEvolve's reported results: state-of-the-art on MLE-Bench in roughly half the usual runtime, and better than AlphaEvolve on mathematical algorithm optimization, a different domain entirely.
Related explainers
- Maestro — RL orchestrator over frozen experts — another take on a planner that routes work to specialized workers
- EFC — feedback quality predicts agent success — why what a search loop learns from matters more than raw compute
- RecMem — subconscious agent memory — the experience-reuse angle, MLEvolve's other half (Retrospective Memory)