Timothée Lacroix on why chucking weights over the wall isn't enough
Mistral's co-founder and CTO explains why the company wrapped a full stack (services, an inference platform, and its own data centers) around open-weight models, betting that enterprises want control and customization more than being first to the frontier.
You Can't Just Chuck Weights Over the Wall
Mistral learned fast that releasing open weights alone doesn't deliver enterprise value, so in 2.5 years it descended the whole stack — from models down to its own data centers.
And so quickly we realized that just chucking weights over the wall wouldn't achieve that.
Everyone Is Compressing the Same Web Into the Same Weights
Lacroix frames closed pretraining as waste — every lab spends compute compressing the same public web data into nearly identical models the world can't reuse, while open weights let one artifact seed a whole community.
I think there is a lot of wasted resources because everyone is taking the same raw data, which is the data that's available on the web, and doing their best to compress it into a fixed amount of weights.
Not Every Step Needs the Big Brain
In an agentic workflow most steps repeat and must run fast and cheap, so Mistral shrinks the model to the decision it actually has to make, cutting size and energy without losing the task.
It is that when you think about agentic systems or automating workflows, not all of the intelligence in all of the steps has to be this big, very powerful thing.
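One way to picture the idea is a router that sends each workflow step to the smallest model able to make that step's decision. The sketch below is illustrative only: the tier names, the relative costs, and the `needs_open_ended_reasoning` flag are assumptions made for the example, not anything Mistral has described.

```python
from dataclasses import dataclass

# Hypothetical model tiers; names and relative costs are made up for illustration.
MODEL_TIERS = {
    "small": {"relative_cost": 1},
    "large": {"relative_cost": 20},
}


@dataclass
class Step:
    name: str
    # Assumption: the workflow designer labels which steps need open-ended reasoning.
    needs_open_ended_reasoning: bool


def pick_model(step: Step) -> str:
    """Route a step to the smallest tier that can make its decision."""
    return "large" if step.needs_open_ended_reasoning else "small"


def plan(steps):
    """Return (step name, tier) pairs for the whole workflow."""
    return [(s.name, pick_model(s)) for s in steps]


def relative_cost(assignments):
    """Sum the illustrative per-step cost of a plan."""
    return sum(MODEL_TIERS[tier]["relative_cost"] for _, tier in assignments)


workflow = [
    Step("classify_ticket", False),
    Step("extract_fields", False),
    Step("draft_resolution", True),
    Step("format_reply", False),
]

assignments = plan(workflow)
routed_cost = relative_cost(assignments)
all_large_cost = len(workflow) * MODEL_TIERS["large"]["relative_cost"]
print(assignments)
print(f"routed: {routed_cost}, all-large: {all_large_cost}")
```

Under these made-up costs, only the one open-ended step pays for the large model, so the routed plan costs far less than running every step on the biggest tier.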
Six Months Behind Is a Feature, Not a Bug
Open frontier models may trail the closed leaders by roughly six months, but air-gapped enterprises happily take that delay in exchange for full control, customization, and ownership of the runtime.
And so I truly believe that we can provide models that are frontier in their capabilities, and maybe we'll be six months late. But a lot of the customers that are running with us are fine with a six-month delay.
Land One Hard Use Case, Then Let It Compound
Mistral targets a single iconic, genuinely hard use case per enterprise, because the connectors, sandboxes, and access controls it builds to solve that one become reusable plumbing that makes every later use case easier.
When we engage with an enterprise, we often try to target an iconic use case, something that's really hard and that really provides value.
Two Kinds of Expertise, One Open Model
In the Nemotron Coalition, Mistral brings pretraining and multimodality know-how while NVIDIA brings large-scale data-center experience, and the shared output is a new open-source frontier model anyone can build on.
The benefit for everyone involved, really, is that we will have a new open source frontier model that everyone can build on.
The Silicon Delivers — and Mistral Says Where It Doesn't
Co-designing with NVIDIA's newest chips gives Mistral real, measured gains — the GB200 alone brought at least a 2.5x training speedup on sparse mixture-of-experts models — and Lacroix is just as specific about where NVFP4 inference still breaks down.
Yeah, I mean, definitely with the GB200, which we've been using since June of 2025, I believe, we quickly saw a 2.5x improvement, at least, like, out of the box when training large, sparse mixture-of-experts models especially.
We Guard What Agents Read, Not Where They Write
The problem Lacroix says keeps him up is agent permissions — the industry carefully controls what an agent can read but rarely governs where it writes its results, even though the output can leak everything that went into it.
And typically one of the challenges is that we often think about what an agent is going to be able to read. We more rarely address where it's going to write the results.
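The gap can be made concrete with a small write-side check. The sketch below assumes a simple sensitivity-label scheme in which an agent's output inherits the highest label of anything it read, and a write is allowed only if the destination is cleared for that label. The labels, the `AgentRun` class, and the source names are hypothetical, chosen for illustration rather than taken from Mistral's platform.

```python
from dataclasses import dataclass, field

# Hypothetical sensitivity labels, ordered from least to most restricted.
LEVELS = {"public": 0, "internal": 1, "restricted": 2}


@dataclass
class AgentRun:
    """Tracks what one agent run has read so its writes can be governed."""
    read_labels: set = field(default_factory=set)

    def record_read(self, source: str, label: str) -> None:
        # The source name is kept only for readability in this sketch.
        if label not in LEVELS:
            raise ValueError(f"unknown label: {label}")
        self.read_labels.add(label)

    def output_label(self) -> str:
        # The output is treated as sensitive as the most sensitive input.
        return max(self.read_labels, key=LEVELS.__getitem__, default="public")

    def can_write(self, sink_clearance: str) -> bool:
        # Allow the write only if the destination is cleared for the output.
        return LEVELS[sink_clearance] >= LEVELS[self.output_label()]


run = AgentRun()
run.record_read("company_wiki", "internal")
run.record_read("hr_database", "restricted")

print(run.output_label())
print(run.can_write("internal"))
print(run.can_write("restricted"))
```

Here the read checks alone would have passed, but the write to an `internal` destination is refused because the result may carry `restricted` data.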