Databricks' Zaharia and Xin on why data, not the model, is the moat
Databricks' co-founders lay out their central bet: owning the data and the agent substrate, rather than the frontier model, is the durable moat in the AI era.
"Slap Some AGI on Top" — But Data First
Databricks' whole bet is that once your data sits in the right place, today's generic agents are already good enough to rewrite most software on top of it.
I think many of the traditional software will be sort of rewritten with this new paradigm which is just get the data to be there and then let's slap some AGI on top. Magic will come out.
Agent Cloud: One API Over Every Harness
Omnigent maps Claude Code, Codex, Cursor and raw SDKs to a single session interface, so teams can swap models and harnesses without rewriting their tooling.
Claude Code, running in a terminal, Codex, you know, Phi, OpenAI SDK, all that stuff. We map them all to that same interface.
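The idea of mapping several harnesses onto one session interface can be sketched with a plain adapter pattern. This is a minimal illustration only: `Session`, `HarnessAdapter`, `TerminalAgentAdapter`, `SdkAdapter` and `send` are hypothetical names invented here, and the adapters return canned strings rather than calling any real agent or SDK.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field


@dataclass
class Session:
    """Hypothetical uniform session record (not an actual product type)."""
    harness: str
    transcript: list = field(default_factory=list)


class HarnessAdapter(ABC):
    """Hypothetical common interface that every harness is mapped to."""
    name: str

    @abstractmethod
    def run(self, prompt: str) -> str:
        """Send one prompt to the underlying harness and return its reply."""


class TerminalAgentAdapter(HarnessAdapter):
    name = "terminal-agent"

    def run(self, prompt: str) -> str:
        # Stand-in for driving a terminal-based coding agent.
        return f"[terminal-agent] {prompt}"


class SdkAdapter(HarnessAdapter):
    name = "raw-sdk"

    def run(self, prompt: str) -> str:
        # Stand-in for calling a model SDK directly.
        return f"[raw-sdk] {prompt}"


def send(session: Session, adapters: dict, prompt: str) -> str:
    """Route a prompt through whichever harness the session currently uses."""
    reply = adapters[session.harness].run(prompt)
    session.transcript.append(reply)
    return reply
```

Because callers only touch `send` and `Session`, swapping the harness is a one-field change on the session and the surrounding tooling stays the same.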
The Dark Ages of Programming
The push for persistent cloud sandboxes came from a co-founder literally checking his coding-agent sessions at red lights because closing his laptop would kill them.
I was tethering my laptop to my phone. Keeping on the side, whenever I hit a red light, I started looking at what's going on on my laptop. And I just felt that was ridiculous.
Judge the Session, Not the Call
Blunt allow/deny rules can't catch an agent that reads secrets and then exfiltrates them, so Databricks tracks per-session state and raises the bar as risk accumulates.
the thing we decided we need is stateful or what we call contextual policies, where you keep track of the state of that session.
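A contextual policy of this kind can be sketched as a decision function that carries per-session state between calls. The sketch below is an assumption-laden illustration, not the policy engine described in the interview: `SessionState`, `decide`, the action names and the `/secrets/` prefix are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SessionState:
    """Hypothetical per-session state tracked across tool calls."""
    read_secret: bool = False  # set once the session touches sensitive data


SECRET_PREFIX = "/secrets/"  # hypothetical marker for sensitive paths


def decide(state: SessionState, action: str, target: str) -> str:
    """Return 'allow' or 'deny' for one call, updating the session state."""
    if action == "read_file":
        if target.startswith(SECRET_PREFIX):
            state.read_secret = True  # risk accumulates for later calls
        return "allow"
    if action == "http_post":
        # The same outbound call is judged differently depending on history.
        return "deny" if state.read_secret else "allow"
    return "allow"
```

A stateless allow/deny rule would have to either block every outbound request or none; here the identical `http_post` is allowed in a fresh session and denied once the session has read a secret.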
L-TAP: Unify Storage, Not the Query Engine
Instead of chasing a single engine for both OLTP and OLAP, which they treat as impractical, Databricks shares one open storage layer and transcodes row data into Parquet on idle CPUs, so analytics can read it with no pipeline.
We think you can get 99% of what you need by unifying the storage. And just have a single storage layer.
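The core of transcoding is pivoting row-oriented records into column-oriented arrays, which is the layout a columnar format such as Parquet stores. The sketch below shows only that pivot in plain Python; the function name `transcode` is hypothetical, and actual Parquet encoding, compression and file writing are deliberately left out.

```python
def transcode(rows):
    """Pivot a list of row dicts into a dict of equal-length column lists.

    Keys missing from a row are filled with None so columns stay aligned.
    """
    columns = {}
    for i, row in enumerate(rows):
        for key, value in row.items():
            # A column first seen at row i is back-filled with i Nones.
            columns.setdefault(key, [None] * i).append(value)
        for values in columns.values():
            if len(values) == i:  # this row had no value for the column
                values.append(None)
    return columns


# Transactional writes arrive row by row; analytics scans one column.
rows = [{"id": 1, "amount": 10}, {"id": 2, "amount": 20}]
cols = transcode(rows)
total = sum(cols["amount"])
```

Once the data is columnar, an aggregate such as `sum(cols["amount"])` touches a single array instead of every row, which is the reason analytics engines prefer that layout.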
A Factory That Builds the Database
Rather than hand-picking algorithms from papers, Databricks trained a model on a decade of query traces that predicts how any plan will perform and routes to the best one at runtime.
the factory takes the decade of traces we have. I think they count as a quadrillion data points in the trace table.
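Routing on predicted performance can be illustrated with a deliberately tiny stand-in for the learned model: average the latency observed in past traces per plan and input size, then pick the candidate with the lowest prediction. Everything here is hypothetical, including the names `fit` and `route`, the trace tuple shape and the plan names; the real model described in the interview is not specified in this summary.

```python
from collections import defaultdict


def fit(traces):
    """Build a toy predictor from (plan, size_bucket, latency_ms) traces.

    Returns a dict mapping (plan, size_bucket) to mean observed latency.
    """
    sums = defaultdict(lambda: [0.0, 0])
    for plan, bucket, latency in traces:
        entry = sums[(plan, bucket)]
        entry[0] += latency
        entry[1] += 1
    return {key: total / count for key, (total, count) in sums.items()}


def route(model, candidate_plans, bucket):
    """Pick the candidate with the lowest predicted latency.

    Plans never seen for this bucket get an infinite prediction and rank last.
    """
    return min(candidate_plans,
               key=lambda plan: model.get((plan, bucket), float("inf")))


traces = [
    ("hash_join", "large", 120.0),
    ("hash_join", "large", 80.0),
    ("nested_loop", "large", 900.0),
    ("nested_loop", "small", 2.0),
    ("hash_join", "small", 5.0),
]
model = fit(traces)
```

With these made-up traces, `route(model, ["hash_join", "nested_loop"], "large")` selects `hash_join`, while the same call with `"small"` selects `nested_loop`: the choice is made per query at runtime rather than fixed in advance.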
Specialize, Don't Chase the Frontier
After buying Mosaic, Databricks chose not to train frontier models and instead ships small specialized ones, such as a document parser that is roughly 100x cheaper than frontier models and performs better.
the same model generates training environments and trains itself and beats like Opus and GPT 5.5 and stuff at a task.
Start Open, Start Large, Start Upstream
Databricks beat Snowflake by starting with open formats and bulk ingestion upstream, then flowing downstream into fast serving, which is the easier direction to travel.
start open and start large. Like in some sense, we started upstream of them.