The news. On June 30, 2026, Google Research introduced TabFM, a foundation model for tabular classification and regression that works on previously unseen tables in a single forward pass. It ships on Hugging Face and GitHub, and Google says it will land in BigQuery behind an AI.PREDICT SQL command. It was measured on TabArena across dataset sizes from 700 to 150,000 samples.

Picture a fill-in doctor who starts a shift by reading the ward's whole patient log. Each past row is a solved case — symptoms in one set of columns, the confirmed diagnosis in another. At the bottom sits a new patient with the diagnosis column blank. The doctor does not go back to medical school for this ward; they predict the blank by reading the log they were just handed. That is the whole idea behind TabFM: as Google puts it, the model "takes the entire dataset — comprising both the historical training examples and the target testing rows — as a single unified prompt." This is in-context learning — the same trick where a language model answers better after you paste a few examples into the prompt — applied to a spreadsheet.
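
To make the "single unified prompt" idea concrete, here is a minimal sketch of the data layout only. The helper `build_context`, the `MASK` placeholder, and the toy ward log are all hypothetical names invented for illustration; this is not TabFM's actual input format or API.

```python
# Hypothetical illustration of "the whole dataset as one prompt".
# Nothing here is TabFM's real interface; it only shows the layout.
MASK = None  # stands in for "diagnosis column is blank"

def build_context(train_rows, train_labels, test_rows):
    """Stack solved rows and unsolved rows into one table-shaped prompt."""
    if len(train_rows) != len(train_labels):
        raise ValueError("each training row needs exactly one label")
    # Solved cases keep their label; new cases carry the MASK placeholder.
    context = [(list(f), label) for f, label in zip(train_rows, train_labels)]
    context += [(list(f), MASK) for f in test_rows]
    return context

ward_log = [[38.9, 110], [36.7, 72], [39.4, 118]]  # temperature, heart rate
diagnoses = ["flu", "healthy", "flu"]
new_patients = [[39.1, 115]]

prompt = build_context(ward_log, diagnoses, new_patients)
# Positions the model would be asked to fill in.
to_predict = [i for i, (_, label) in enumerate(prompt) if label is MASK]
```

The point of the sketch is that the solved rows and the blank row travel together in one input, so "training data" and "query" are just different positions in the same prompt.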

That reframing matters because it flips the usual tabular workflow. Normally a new table means a new model: you fit an XGBoost model or train a small neural net on that specific spreadsheet, then throw it away for the next one. TabFM does no per-table training at all — one pretrained model reads the table and predicts, so the "training run" disappears into a single forward pass.

| Aspect | Train a model per table (XGBoost, bespoke net) | TabFM, in-context [Google] |
| --- | --- | --- |
| A new table | fit a fresh model with its own training run | one forward pass, no training |
| Learns from | gradient updates on that one table | the table itself, read as a prompt |
| Reuse | one model serves one table | one model, previously unseen tables |

But a table is not a sentence, and that raises a real question: in what order do you read it? A log has no natural first or last patient, and two directions carry meaning at once: rows are examples, columns are features. So TabFM attends both ways in alternating passes. Row attention compares the new case against similar past ones (which patients look like this one?), and column attention weighs which fields move together (which vitals track the diagnosis?). Each pass works the way self-attention lets a token look across a whole sequence; here it runs across examples and across features rather than across words in left-to-right order.
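
The alternating pattern can be sketched in a few lines of plain Python. This is a toy under stated assumptions, not TabFM's architecture: the attention uses identity projections (no learned weights), every cell is already a small vector, and the order "columns first, then rows" and the function names are choices made for this example.

```python
import math

def self_attention(vectors):
    """Dot-product self-attention with identity projections (no learned weights)."""
    d = len(vectors[0])
    out = []
    for q in vectors:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in vectors]
        top = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        out.append([sum(w * v[j] for w, v in zip(weights, vectors)) / total
                    for j in range(d)])
    return out

def column_attention(grid):
    """Within each row, cells (features) attend to one another."""
    return [self_attention(row) for row in grid]

def row_attention(grid):
    """Within each column, rows (examples) attend to one another."""
    n_rows, n_cols = len(grid), len(grid[0])
    cols = [self_attention([grid[r][c] for r in range(n_rows)]) for c in range(n_cols)]
    return [[cols[c][r] for c in range(n_cols)] for r in range(n_rows)]

def alternating_block(grid, n_layers=2):
    """Alternate the two directions; the ordering here is an assumption."""
    for _ in range(n_layers):
        grid = column_attention(grid)
        grid = row_attention(grid)
    return grid

# 3 example rows x 2 feature columns, each cell a 2-dimensional embedding.
grid = [[[1.0, 0.0], [0.0, 1.0]],
        [[0.5, 0.5], [1.0, 1.0]],
        [[0.0, 1.0], [1.0, 0.0]]]
mixed = alternating_block(grid)
```

Because this sketch adds no positional information, shuffling the rows of the input only shuffles the rows of the output, which matches the observation that a patient log has no natural first or last row.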

The catch is compute. If you flatten every cell into one long prompt, attention has to compare every cell with every other, and that bill explodes. TabFM's fix is row compression: it first boils each row down to a single dense vector, then runs the main Transformer over that short sequence of row-summaries instead of the full grid.
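
As a rough picture of what "one dense vector per row" means for sequence length, the sketch below collapses each row by averaging its cell vectors. Mean pooling is a stand-in chosen for simplicity; the source does not say how TabFM's compressor works, and `compress_rows` is a hypothetical name.

```python
def compress_rows(grid):
    """Collapse each row's cell vectors into one vector.

    Averaging is only a placeholder for a learned compressor.
    """
    compressed = []
    for row in grid:
        d = len(row[0])
        compressed.append([sum(cell[j] for cell in row) / len(row) for j in range(d)])
    return compressed

# 4 rows x 3 columns, each cell a 2-dimensional embedding.
grid = [[[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]],
        [[0.0, 0.0], [0.0, 0.0], [3.0, 3.0]],
        [[2.0, 2.0], [2.0, 2.0], [2.0, 2.0]],
        [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]]

flat_length = sum(len(row) for row in grid)   # cells the flattened prompt would hold
row_summaries = compress_rows(grid)
compressed_length = len(row_summaries)        # items the main Transformer would see
```

Here the flattened prompt would hold 12 cells, while the compressed sequence holds 4 row summaries, one per example.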

Watch the numbers on a modest table: 1,000 example rows and 100 columns (illustrative). Flatten it and you get 1,000 × 100 = 100,000 cells in one prompt; attention scales with the square, so that is roughly 100,000² = 10 billion pairwise comparisons. Compress each row to one vector first and the expensive sequence the Transformer reasons over is just the 1,000 rows — about 1,000² = 1 million comparisons, a ~10,000× cut at that stage (illustrative). Same table, a fraction of the compute — which is what makes "read the whole dataset as a prompt" affordable in the first place.
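
The arithmetic above is easy to check directly. The snippet counts pairwise comparisons as the square of the sequence length, which is the same simplification the paragraph uses; it ignores constant factors and whatever the compression step itself costs.

```python
def attention_pairs(sequence_length):
    """Pairwise comparisons in full self-attention: every item against every item."""
    return sequence_length ** 2

n_rows, n_cols = 1_000, 100  # the illustrative table from the text

flat_pairs = attention_pairs(n_rows * n_cols)   # every cell is its own token
compressed_pairs = attention_pairs(n_rows)      # one token per row
reduction = flat_pairs // compressed_pairs

print(f"{flat_pairs:,} vs {compressed_pairs:,} ({reduction:,}x fewer)")
# prints: 10,000,000,000 vs 1,000,000 (10,000x fewer)
```

Under this counting the reduction is always the number of columns squared, so wider tables gain more from compressing rows first.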

One more piece explains how a model can be good at a table it has never seen: it never trained on your data, but it trained on the shape of data. TabFM pretrains on hundreds of millions of synthetic tables generated by structural causal models — little cause-and-effect recipes that spin up realistic feature relationships. Meeting millions of these, the model learns the general grammar of "features predict a target," which is why a fresh spreadsheet is just one more example of a pattern it already understands. On the TabArena benchmark — 38 classification and 13 regression datasets, sizes from 700 to 150,000 samples, with a 32-way ensemble for its strongest configuration — that pretrained grammar is what lets a single forward pass compete without a single gradient step on the target table.
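
A structural causal model can be as small as a few equations where each variable is computed from its causes plus noise. The generator below is a toy sketch of that idea, not Google's data pipeline: the graph (x1 causes x2, both cause x3, and x2 and x3 cause the label), the weight ranges, and the function name `sample_scm_table` are all assumptions made for illustration.

```python
import math
import random

def sample_scm_table(n_rows, seed):
    """Draw one synthetic table from a small, randomly weighted causal recipe."""
    rng = random.Random(seed)
    # A fresh set of edge weights per seed gives a different "recipe" per table.
    w12 = rng.uniform(-2.0, 2.0)
    w13, w23 = rng.uniform(-2.0, 2.0), rng.uniform(-2.0, 2.0)
    wy2, wy3 = rng.uniform(-2.0, 2.0), rng.uniform(-2.0, 2.0)
    rows, labels = [], []
    for _ in range(n_rows):
        x1 = rng.gauss(0.0, 1.0)                                  # root cause
        x2 = w12 * x1 + rng.gauss(0.0, 0.5)                       # x1 -> x2
        x3 = math.tanh(w13 * x1 + w23 * x2) + rng.gauss(0.0, 0.1) # x1, x2 -> x3
        y = 1 if wy2 * x2 + wy3 * x3 + rng.gauss(0.0, 0.1) > 0 else 0
        rows.append([x1, x2, x3])
        labels.append(y)
    return rows, labels

# Each seed yields a table with different feature relationships.
tables = [sample_scm_table(n_rows=200, seed=s) for s in range(5)]
```

Every table drawn this way has different numbers but the same underlying structure of features driving a target, which is the regularity a pretrained model could pick up across many such tables.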

