Understand LLMs by seeing them work
Interactive visual simulations that make transformers, attention, and KV cache click. No GPU required. Free.
Live Preview: Self-Attention
Tokens: The · cat · sat · on · the · mat

Attention weights (each row shows how strongly that token attends to every token; rows sum to 1):

         The    cat    sat    on     the    mat
  The    0.35   0.07   0.05   0.10   0.35   0.08
  cat    0.06   0.26   0.47   0.06   0.07   0.08
  sat    0.05   0.52   0.21   0.06   0.05   0.11
  on     0.07   0.05   0.08   0.20   0.08   0.52
  the    0.33   0.07   0.05   0.10   0.38   0.07
  mat    0.04   0.07   0.10   0.42   0.05   0.32
Hover over tokens to explore attention patterns
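Each row of weights like those above comes from scaled dot-product attention: the token's query is dotted with every key, scaled by the square root of the head dimension, and pushed through a softmax. A minimal pure-Python sketch, using made-up 4-dimensional query/key vectors (real models use learned, much larger ones):

```python
import math

def softmax(scores):
    """Numerically stable softmax: shift by the max, exponentiate, normalize."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_weights(query, keys):
    """One row of the attention matrix: softmax(q . k_i / sqrt(d)) over all keys."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    return softmax(scores)

# Toy vectors, chosen for illustration only.
q = [1.0, 0.0, 1.0, 0.0]
keys = [
    [1.0, 0.0, 1.0, 0.0],  # aligned with q -> high weight
    [0.0, 1.0, 0.0, 1.0],  # orthogonal to q -> low weight
    [0.5, 0.5, 0.5, 0.5],
]
weights = attention_weights(q, keys)
print([round(w, 3) for w in weights])  # three weights that sum to 1
```

Because the softmax normalizes each row, every row of the matrix above sums to 1, which is why each token's attention reads as a probability distribution over the sequence.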
LLM Internals — 9 Interactive Modules
Module 1: Tokenization
How raw text is split into tokens before entering a model.
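The splitting described above can be sketched as a greedy longest-match over a subword vocabulary; this is a simplified stand-in for the learned merge rules of BPE-style tokenizers, with a made-up vocabulary:

```python
def tokenize(text, vocab):
    """Greedy longest-match subword tokenization (simplified BPE stand-in)."""
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest vocabulary entry that matches at position i.
        for j in range(len(text), i, -1):
            piece = text[i:j]
            if piece in vocab:
                tokens.append(piece)
                i = j
                break
        else:
            tokens.append(text[i])  # unknown character: fall back to one char
            i += 1
    return tokens

# Tiny illustrative vocabulary; real tokenizers learn merges from a corpus.
vocab = {"The", " cat", " sat", " s", "at", " on", " the", " mat"}
print(tokenize("The cat sat on the mat", vocab))
# -> ['The', ' cat', ' sat', ' on', ' the', ' mat']
```

Note the leading spaces inside tokens: production tokenizers commonly fold whitespace into the following word piece, which is why decoded token strings often start with a space.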
Module 2: Embeddings
Mapping tokens to high-dimensional vectors that encode meaning.
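The mapping above is, mechanically, just a learned lookup table: one vector per vocabulary entry, indexed by token id. A toy sketch with random (untrained) vectors and an invented 4-dimensional size:

```python
import random

random.seed(0)
VOCAB = ["The", "cat", "sat", "on", "the", "mat"]
DIM = 4  # real models use hundreds to thousands of dimensions

# The embedding "layer" is a table with one row per token id;
# training nudges these rows so similar tokens end up with similar vectors.
embedding_table = [[random.uniform(-1.0, 1.0) for _ in range(DIM)] for _ in VOCAB]

def embed(token):
    return embedding_table[VOCAB.index(token)]

v = embed("cat")
print(len(v))  # 4-dimensional vector in this toy setup
```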
Module 3: Self-Attention
Queries, keys, and values — how tokens attend to each other.
Module 4: Transformer Block
Attention + feed-forward layers stacked into a full block.
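The "stacking" amounts to two residual sublayers. A wiring-only sketch (pre-norm variant, with stub sublayers standing in for the learned attention and feed-forward functions):

```python
def transformer_block(x, attention, feed_forward, layer_norm):
    """One pre-norm transformer block: each sublayer reads a normalized copy
    of its input and adds its output back via a residual connection."""
    x = [xi + ai for xi, ai in zip(x, attention(layer_norm(x)))]
    x = [xi + fi for xi, fi in zip(x, feed_forward(layer_norm(x)))]
    return x

# Stubs, just to show the data flow (real sublayers are learned).
identity = lambda x: x
halve = lambda x: [0.5 * v for v in x]
print(transformer_block([1.0, 2.0], halve, halve, identity))  # -> [2.25, 4.5]
```

The residual additions are what let dozens of these blocks stack without the signal degrading: each block only has to learn a correction to its input, not a whole new representation.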
Module 5: Text Generation
Autoregressive decoding: sampling the next token step by step.
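That step-by-step loop can be sketched as: run the model on the sequence so far, sample one token from the resulting distribution, append it, repeat. The `model` below is a hypothetical stub standing in for a real forward pass:

```python
import math
import random

def sample_next(logits, temperature=1.0):
    """Sample one token id from softmax(logits / temperature)."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    return random.choices(range(len(probs)), weights=probs, k=1)[0]

def generate(model, prompt_ids, n_tokens):
    """Autoregressive decoding: feed the growing sequence back in, one token at a time."""
    ids = list(prompt_ids)
    for _ in range(n_tokens):
        logits = model(ids)  # next-token scores over the vocabulary
        ids.append(sample_next(logits))
    return ids

# Stub "model" over a 4-token vocabulary that strongly prefers token 2.
model = lambda ids: [0.1, 0.2, 5.0, 0.1]
print(generate(model, [0, 1], 3))
```

Lower temperatures sharpen the distribution toward the top token; higher ones flatten it, trading coherence for variety.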
Module 6: KV Cache
Caching past key-value pairs to speed up inference.
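The idea above in miniature: at each decoding step only the new token's key and value are computed; attention then runs the fresh query against everything accumulated in the cache, instead of recomputing keys and values for the whole prefix. A pure-Python sketch with invented 2-dimensional vectors:

```python
import math

class KVCache:
    """Append-only store of per-token key/value vectors for one attention layer."""
    def __init__(self):
        self.keys = []
        self.values = []

    def append(self, k, v):
        self.keys.append(k)
        self.values.append(v)

def attend_with_cache(query, cache, new_k, new_v):
    """One decoding step: cache the new token's k/v, then attend over all cached tokens."""
    cache.append(new_k, new_v)
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in cache.keys]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    w = [e / total for e in exps]
    # Weighted sum of cached values.
    return [sum(wi * v[j] for wi, v in zip(w, cache.values)) for j in range(len(new_v))]

cache = KVCache()
out1 = attend_with_cache([1.0, 0.0], cache, [1.0, 0.0], [2.0, 0.0])
out2 = attend_with_cache([1.0, 0.0], cache, [0.0, 1.0], [0.0, 2.0])
print(len(cache.keys))  # 2 cached tokens after two steps
```

Without the cache, step t would redo the key/value projections for all t tokens, making generation quadratic in sequence length; with it, each step does work proportional to one new token plus one attention pass over the cache.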
Module 7: Quantization
Reducing weight precision to shrink model size and memory.
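A concrete instance of the precision reduction above: symmetric per-tensor int8 quantization, which maps each float weight to an integer in [-127, 127] plus one shared scale. A minimal sketch:

```python
def quantize_int8(weights):
    """Symmetric per-tensor int8 quantization: floats -> ints in [-127, 127] + a scale."""
    scale = max(abs(w) for w in weights) / 127.0 or 1.0  # guard against all-zero tensors
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantize(q, scale):
    return [qi * scale for qi in q]

w = [0.5, -1.27, 0.03, 1.0]
q, scale = quantize_int8(w)
restored = dequantize(q, scale)
max_err = max(abs(a - b) for a, b in zip(w, restored))
print(q)  # -> [50, -127, 3, 100]: one byte each instead of four (fp32)
```

The rounding error per weight is bounded by half the scale, which is why quantization usually costs little accuracy while cutting memory (and memory bandwidth) by 4x versus fp32.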
Module 8: Batching
Processing multiple sequences in parallel for throughput.
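To process sequences of different lengths in one parallel pass, they are typically padded into a rectangular batch, with a mask so attention ignores the padding. A sketch over token ids (the pad id of 0 is an assumption for illustration):

```python
PAD_ID = 0

def pad_batch(sequences):
    """Right-pad variable-length token-id sequences into one rectangular batch,
    plus a mask marking which positions hold real tokens (1) vs padding (0)."""
    width = max(len(s) for s in sequences)
    batch = [s + [PAD_ID] * (width - len(s)) for s in sequences]
    mask = [[1] * len(s) + [0] * (width - len(s)) for s in sequences]
    return batch, mask

batch, mask = pad_batch([[5, 7, 9], [3], [4, 4]])
print(batch)  # -> [[5, 7, 9], [3, 0, 0], [4, 4, 0]]
print(mask)   # -> [[1, 1, 1], [1, 0, 0], [1, 1, 0]]
```

The padding positions waste compute, which is one reason serving systems also use continuous batching and paged KV memory for long, ragged workloads.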
Module 9: Paged Attention
Memory-efficient KV cache management for long sequences.
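The core bookkeeping behind that memory efficiency: store the KV cache in fixed-size blocks and give each sequence a block table mapping its logical positions to physical blocks, so memory is grabbed on demand instead of pre-reserved for a maximum length. A simplified sketch (the block size of 4 is illustrative; real systems use larger blocks):

```python
BLOCK_SIZE = 4  # tokens per KV block

class PagedKV:
    """KV cache split into fixed-size physical blocks, addressed through a
    per-sequence block table, so memory is allocated only as tokens arrive."""
    def __init__(self, num_blocks):
        self.free = list(range(num_blocks))  # pool of physical block ids
        self.block_table = {}                # seq_id -> list of block ids
        self.lengths = {}                    # seq_id -> tokens stored

    def append_token(self, seq_id):
        n = self.lengths.get(seq_id, 0)
        if n % BLOCK_SIZE == 0:  # current block is full: grab a fresh one
            self.block_table.setdefault(seq_id, []).append(self.free.pop())
        self.lengths[seq_id] = n + 1

    def free_seq(self, seq_id):
        """Return a finished sequence's blocks to the pool for reuse."""
        self.free.extend(self.block_table.pop(seq_id, []))
        self.lengths.pop(seq_id, None)

kv = PagedKV(num_blocks=8)
for _ in range(6):
    kv.append_token("seq0")  # 6 tokens -> ceil(6/4) = 2 blocks
print(len(kv.block_table["seq0"]))  # -> 2
```

Because blocks are recycled the moment a sequence finishes, fragmentation stays near zero and many more concurrent long sequences fit in the same KV memory.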
100% browser-based · 9 modules · Free (all foundational modules)