What is Monte Carlo Graph Search?

It is a search method that extends Monte Carlo Tree Search — which grows a tree of candidate solutions and scores branches with cheap rollouts — by turning the tree into a graph. Reference edges let a useful sub-result found on one branch be reused by other branches, so the search shares discoveries instead of re-deriving them. MLEvolve's Progressive version also schedules the search to start broad (explore) and then narrow onto the best branch (exploit).

Why does MLEvolve beat AlphaEvolve?

Because its search wastes less compute. By sharing sub-results across branches with graph edges and scheduling exploration before exploitation, MLEvolve aims its rollouts at unexplored algorithms rather than re-discovering the same tricks. The paper reports state-of-the-art results on MLE-Bench under a 12-hour budget, about half the standard runtime, and reports that MLEvolve outperforms AlphaEvolve on mathematical algorithm optimization, which suggests the search generalizes across domains.

How does it relate to Monte Carlo Tree Search?

MCGS is MCTS with a memory of what other branches found. MCTS explores each branch independently with a single fixed explore/exploit constant; MCGS adds graph reference edges for cross-branch sharing and an entropy-style schedule that shifts the search from exploration to exploitation over time. Everything else — growing candidates and scoring them with rollouts — is the same.

MLEvolve: self-evolving agents beat AlphaEvolve — Progressive Monte Carlo Graph Search

Jargon

Monte Carlo Tree Search (MCTS): A classic search method: grow a tree of candidate moves, and estimate how good each branch is by running cheap random rollouts to the end. Famous from game-playing agents; each branch is scored on its own.
Monte Carlo Graph Search (MCGS): MLEvolve's twist on MCTS. It adds reference edges so the search is a graph, not a tree — a useful result found on one branch can be reused by other branches instead of being re-discovered.
Rollout: A single trial run that takes a candidate as far as it will go and scores the outcome. Many rollouts give a noisy estimate of how promising a branch is — the "Monte Carlo" part.
Exploration vs exploitation: The core search tradeoff: explore new, uncertain branches, or exploit the best one found so far. Lean too far either way and you waste your search budget.
MLE-Bench: A benchmark of real machine-learning engineering tasks (Kaggle-style competitions). Agents are scored on medal rate and how often they submit a valid solution; MLEvolve reports state-of-the-art under a 12-hour budget.
AlphaEvolve: A prior evolutionary code-search system for discovering algorithms. MLEvolve reports beating it on mathematical algorithm optimization, which the paper presents as cross-domain generalization.
Retrospective Memory: MLEvolve's other half (out of scope here): a knowledge base plus a dynamic global memory that lets the agent retrieve and reuse experience across tasks, on top of the search described below.

The news. On June 4, 2026, the MLEvolve paper (arXiv 2606.06473, from Shanghai AI Laboratory and East China Normal University) introduced an LLM-based, self-evolving multi-agent framework for end-to-end machine-learning algorithm discovery. On MLE-Bench it reaches state-of-the-art average medal rate under a 12-hour budget, half the standard runtime, and outperforms AlphaEvolve on mathematical algorithm optimization.

Picture a mountain you are trying to climb, where every distinct route to the top is a different candidate algorithm. Nobody can try every route — there are far too many — so you send out scouts: each one heads up a promising line and reports back how far it got. That report is a rollout, and growing a tree of routes scored by rollouts is exactly Monte Carlo Tree Search, the search-and-score loop behind a lot of game-playing agents.
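The scout-and-report loop above can be sketched in a few lines. This is a toy illustration, not MLEvolve's code: the branch names, the hidden per-route success probabilities, and the function names are all hypothetical, and a real rollout would run and score a candidate program rather than flip a biased coin.

```python
import random
from typing import Dict


def rollout(success_prob: float, rng: random.Random) -> float:
    """One scout: 1.0 if it reaches the top, 0.0 otherwise (toy stand-in)."""
    return 1.0 if rng.random() < success_prob else 0.0


def estimate_branch(success_prob: float, n_rollouts: int, rng: random.Random) -> float:
    """Average many noisy rollouts into a single score for one branch."""
    total = sum(rollout(success_prob, rng) for _ in range(n_rollouts))
    return total / n_rollouts


def score_branches(branches: Dict[str, float], n_rollouts: int, seed: int = 0) -> Dict[str, float]:
    """Score every branch with the same number of rollouts.

    `branches` maps a hypothetical route name to its hidden success
    probability, which the search never sees directly.
    """
    rng = random.Random(seed)
    return {name: estimate_branch(p, n_rollouts, rng) for name, p in branches.items()}


if __name__ == "__main__":
    scores = score_branches({"left face": 0.8, "right face": 0.3}, n_rollouts=2000)
    for name, score in scores.items():
        print(f"{name}: estimated value {score:.2f}")
```

With only a handful of rollouts the estimates are noisy; with more, they settle near the hidden values, which is the "Monte Carlo" part of the name.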

The trouble with a plain tree is that every scout explores alone. A clever shortcut that one scout discovers high up on the left face never makes it onto anyone else's map — so a scout on the right face burns its own steps re-finding the very same thing, and three other routes independently dead-end at the same cliff. MLEvolve's Progressive Monte Carlo Graph Search fixes this with graph reference edges: when a branch finds a useful sub-result, it gets pinned to one shared map where every other branch can read it. The search is now a graph, not a tree — discoveries flow sideways between routes instead of being re-derived on each.
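One way to picture the shared map is as a small store that branches consult before deriving something themselves. The sketch below is an assumption-laden illustration: `Node`, `SharedMap`, and `obtain` are hypothetical names, and the paper's actual data structures and pinning policy are not described here.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class Node:
    name: str
    children: List["Node"] = field(default_factory=list)  # ordinary tree edges
    references: List[str] = field(default_factory=list)   # reference edges: names of nodes whose results are reused


class SharedMap:
    """Hypothetical store of pinned sub-results, readable by every branch."""

    def __init__(self) -> None:
        self._pinned: Dict[str, Tuple[str, object]] = {}  # key -> (finder name, value)
        self.derivations = 0  # how many times any branch paid to derive something

    def obtain(self, node: Node, key: str, derive: Callable[[], object]) -> object:
        """Reuse a pinned sub-result if one exists, otherwise derive and pin it."""
        if key in self._pinned:
            finder, value = self._pinned[key]
            node.references.append(finder)  # record the sideways reference edge
            return value
        value = derive()
        self.derivations += 1
        self._pinned[key] = (node.name, value)
        return value


if __name__ == "__main__":
    root, left, right = Node("root"), Node("left"), Node("right")
    root.children += [left, right]
    shared = SharedMap()
    shared.obtain(left, "normalize", lambda: "scale inputs to unit variance")
    shared.obtain(right, "normalize", lambda: "scale inputs to unit variance")
    print(shared.derivations, right.references)
```

In this toy run the trick is derived once, by the left branch, and the right branch ends up with a reference edge pointing at it instead of paying for its own derivation.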

The second half of "progressive" is when to spend your scouts. Early on, MCGS deliberately scouts widely — an entropy-style schedule that keeps the search exploring many faces of the mountain. As evidence accumulates, the schedule tightens onto the most promising route and pours the remaining budget into climbing it. That shift from exploration to exploitation is the same budget-allocation problem every planning agent faces — MCGS just schedules it explicitly instead of using one fixed knob. MLEvolve also keeps the planner that picks routes separate from the workers that write the code, an orchestrator-and-workers split that lets each part specialize.
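A scheduled explore-to-exploit shift can be sketched by annealing the exploration weight in a standard UCB-style selection rule. This is a deliberately simpler stand-in, not the paper's entropy-style schedule: the linear decay, the start and end weights, and all function names are assumptions made for illustration.

```python
import math
from typing import Dict, Tuple


def exploration_weight(step: int, total_steps: int,
                       c_start: float = 2.0, c_end: float = 0.1) -> float:
    """Linearly anneal the exploration weight over the search budget (assumed schedule)."""
    frac = min(max(step / total_steps, 0.0), 1.0)
    return c_start + (c_end - c_start) * frac


def ucb_score(mean_value: float, visits: int, parent_visits: int, c: float) -> float:
    """UCB1-style score: observed value plus a bonus for rarely visited branches."""
    if visits == 0:
        return math.inf  # always try an unvisited branch first
    return mean_value + c * math.sqrt(math.log(parent_visits) / visits)


def pick_branch(stats: Dict[str, Tuple[float, int]], step: int, total_steps: int) -> str:
    """Pick the next branch to roll out; `stats` maps name -> (mean value, visits)."""
    c = exploration_weight(step, total_steps)
    parent_visits = max(1, sum(visits for _, visits in stats.values()))
    return max(stats, key=lambda name: ucb_score(stats[name][0], stats[name][1], parent_visits, c))


if __name__ == "__main__":
    stats = {"best so far": (0.7, 40), "barely tried": (0.5, 2)}
    print("early:", pick_branch(stats, step=0, total_steps=100))
    print("late: ", pick_branch(stats, step=100, total_steps=100))
```

With the same statistics, the early setting sends the next rollout to the barely tried branch, while the late setting commits to the best branch found so far. A plain MCTS with one fixed constant would make the same choice at both points.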

Put numbers on the shortcut-sharing to see why it pays (all numbers here are illustrative). Say a run has a search budget of 1,000 rollouts spread across 50 sibling branches, so each branch gets about 20. In a plain tree, suppose 10 of those branches each burn their whole share independently re-deriving the same normalization trick: that is roughly 200 rollouts, a fifth of the whole budget, spent re-discovering one fact. With a graph edge, only the first branch to find the trick pays for it (about 20 rollouts) and pins it for the other 49 to read for free. The nine duplicate derivations, roughly 180 rollouts, become fresh exploration on routes nobody has tried: the same budget, aimed at new ground.
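The arithmetic can be checked with a short calculation. It uses the same illustrative numbers and adds one explicit assumption: each re-deriving branch spends its entire share on the trick. Because the first discovery still has to be paid for once, the rollouts actually freed by sharing come to 180 of the 200. The function name and its return keys are hypothetical.

```python
from typing import Dict


def sharing_savings(budget: int, n_branches: int, n_rederiving: int) -> Dict[str, float]:
    """Compare the cost of one repeated sub-result in a tree versus a graph.

    Assumes each re-deriving branch spends its whole per-branch share on
    the same sub-result (an illustrative simplification).
    """
    per_branch = budget // n_branches
    tree_cost = n_rederiving * per_branch  # every branch pays separately
    graph_cost = per_branch                # only the first finder pays
    return {
        "per_branch": per_branch,
        "tree_cost": tree_cost,
        "tree_share_of_budget": tree_cost / budget,
        "graph_cost": graph_cost,
        "rollouts_freed": tree_cost - graph_cost,
    }


if __name__ == "__main__":
    for key, value in sharing_savings(budget=1000, n_branches=50, n_rederiving=10).items():
        print(f"{key}: {value}")
```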

Approach	How it searches	Cross-branch sharing	Explore vs exploit
Monte Carlo Tree Search	a tree of candidates, scored by rollouts	none: each branch is on its own	one fixed balance (e.g. a constant exploration weight)
AlphaEvolve (evolutionary)	mutate & select a population of programs	only via the surviving population	set by mutation/selection pressure
Progressive MCGS (MLEvolve, arXiv 2606.06473)	a graph of candidates, scored by rollouts	reference edges share sub-results	a schedule: explore wide → exploit best

None of this is free: a graph needs the bookkeeping to decide which sub-results are worth pinning and which branches should read them, and a schedule needs tuning so it neither commits too early nor wanders too long. But once that machinery works, the search stops paying the same toll over and over — which is how MLEvolve gets to state-of-the-art on MLE-Bench in roughly half the usual runtime and edges past AlphaEvolve on a different domain entirely.




