Grant Sanderson on Why Math's Hardest Work Resists the Benchmark
AI is racing through math because math is verifiable and grindable, but asking the right question, coining the right definition, and drawing the improbable connection resist every benchmark, so the mathematician's job shifts toward curation.
The Spiky, Fractal Frontier
AI capability isn't a rising tide; it's a jagged frontier. Math sits on one of the tallest spikes, and zooming in on that spike reveals still more spikes.
There's a spiky frontier to AI, and math is just right there in one of the spikes. But there's a fractal nature to that spikiness.
Lightning Bolts, Mountains, and Raw Hustle
A hard proof can arrive in three shapes: a lightning bolt between two known fields, a whole new mountain of theory, or a brute-force slog. Only the first is easy for humans to digest.
If the character of it is mountain building, you have to put in a lot more time to understand that new mountain that was built, because it's a new thread, not just a lightning bolt between them.
Theorems Are Cheap; Definitions Are Priceless
Proving theorems is the entry tier of math, coining conjectures is rarer, and inventing the right definitions is the summit, which is exactly the part you can't turn into a benchmark.
Good mathematicians prove theorems, great mathematicians come up with conjectures, and the greatest mathematicians come up with definitions.
Galois and the Hundred-Year Verification Loop
Group theory took roughly a century to be recognized as valuable, and the reward signal of the day, the academy, rejected the work of the teenager who invented it.
So again, thinking about verifiable reward, the verifier function that is the academy at that time is rejecting what he wrote.
Verifiable Isn't Enough, It Has to Be Grindable
Math and code race ahead not just because answers are checkable, but because you can spin up thousands of deterministic parallel attempts. Computer use is checkable yet ungrindable, so it crawls.
It's not just verifiability; it has to be grindable.
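To make the distinction concrete, here is a minimal sketch of what "verifiable and grindable" could look like on a toy problem. Everything in it is an assumption made for illustration, not something from the conversation: the factoring task, the number N, and the names verify, attempt, and grind. The check is cheap and deterministic, and each attempt is independent and seeded, so thousands can run side by side and be replayed exactly.

```python
import random
from concurrent.futures import ThreadPoolExecutor

# Toy problem (an assumption for illustration): find a nontrivial factor of N.
N = 8051  # 83 * 97


def verify(candidate: int) -> bool:
    """Verifiable: a cheap, deterministic check of a proposed answer."""
    return 1 < candidate < N and N % candidate == 0


def attempt(seed: int) -> int:
    """One independent guess. Seeding makes every attempt reproducible."""
    return random.Random(seed).randrange(2, N)


def grind(num_attempts: int, workers: int = 8) -> list[int]:
    """Grindable: fan out many attempts in parallel, keep only what verifies."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        candidates = pool.map(attempt, range(num_attempts))
        return sorted({c for c in candidates if verify(c)})


if __name__ == "__main__":
    print(grind(20_000))
```

Computer use would break this loop at attempt: each try touches a slow, stateful environment, so it cannot be cheaply reset and replayed thousands of times even when the final result is checkable.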
Autoregression Is a Strange Way to Think
A model with superhuman breadth still misses the connection between two fields it has mastered, because the very connection worth making is, by construction, an unlikely next token.
But the connection where all the substance is going to come from is, by its nature, a very unlikely one.
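A rough sketch of why an unlikely step is hard to reach by picking likely continuations. The probabilities below are invented purely for illustration, and the names next_step, greedy_choice, path_probability, and expected_samples are hypothetical; this is not how any particular model is implemented.

```python
import math

# Hypothetical next-step probabilities, invented purely for illustration.
next_step = {
    "standard technique": 0.90,
    "known variant": 0.09,
    "cross-field link": 0.01,
}


def greedy_choice(dist: dict[str, float]) -> str:
    """Pick the single most likely continuation."""
    return max(dist, key=dist.get)


def path_probability(step_probs: list[float]) -> float:
    """Probability of a whole path: the product of its per-step probabilities."""
    return math.prod(step_probs)


def expected_samples(p: float) -> float:
    """Average number of independent samples until an event of probability p occurs."""
    return 1.0 / p


if __name__ == "__main__":
    print(greedy_choice(next_step))
    print(expected_samples(next_step["cross-field link"]))
    print(path_probability([0.01, 0.01, 0.01]))
```

Under these made-up numbers, always taking the likeliest step never reaches the cross-field link, and a path that needs several such steps in a row becomes multiplicatively rarer with each one.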
Don't Let All Your LLMs Be Einstein
Since autoregression collapses toward one path, the leverage lies in injecting diversity above the model: fanning out agents with deliberately opposed goals and biases, one proving, one disproving.
You want to make sure you don't accidentally have all your LLMs be Einstein, because you might halt progress on quantum mechanics.
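One way the fan-out might be sketched, assuming nothing about any real agent framework: the Agent class, the two goal prompts, fan_out, and the echo_model stand-in are all hypothetical. The only detail taken from the conversation is the pairing of one agent that tries to prove with one that tries to disprove.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class Agent:
    name: str
    goal: str  # the bias this agent is deliberately given


# Hypothetical roles; the source names only "one proving, one disproving".
AGENTS = (
    Agent("prover", "Assume the statement is true and look for a proof."),
    Agent("disprover", "Assume the statement is false and look for a counterexample."),
)


def fan_out(
    statement: str,
    model: Callable[[str], str],
    agents: Sequence[Agent] = AGENTS,
) -> dict[str, str]:
    """Send the same statement to every agent, each under its own goal."""
    return {a.name: model(f"{a.goal}\nStatement: {statement}") for a in agents}


def echo_model(prompt: str) -> str:
    """Stand-in for a real model call: returns the prompt it was given."""
    return prompt


if __name__ == "__main__":
    for name, reply in fan_out("Every even number above 2 is a sum of two primes.", echo_model).items():
        print(name, "->", reply)
```

The diversity lives in the goals handed to each agent rather than in the model itself, which is the sense of injecting it "above the model".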
From Theorem-Prover to Museum Curator
Sanderson once thought AI would prove theorems and leave humans to explain them. Now he expects AI to explain better too, leaving humans the role of curator, deciding what's worth understanding at all.
One interesting take that I've heard about what mathematicians will end up being is that it's actually more analogous to art museum curators than anything else.