LLM Internals
9 interactive modules from tokenization to PagedAttention. All free.
Module 1
Tokenization
How text becomes tokens — BPE, subword splitting, byte-level encoding.
Start module →
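Preview: a minimal sketch of a single BPE merge step over a toy word list. The three-word corpus and the one-merge loop are illustrative assumptions, not the module's actual code.

```python
# Count adjacent symbol pairs in a toy corpus, then merge the most
# frequent pair into a single token: one step of BPE training.
from collections import Counter

corpus = [list("lower"), list("lowest"), list("newer")]  # toy word list

pair_counts = Counter()
for word in corpus:
    for a, b in zip(word, word[1:]):
        pair_counts[(a, b)] += 1

best = max(pair_counts, key=pair_counts.get)  # most frequent adjacent pair

merged_corpus = []
for word in corpus:
    out, i = [], 0
    while i < len(word):
        if i + 1 < len(word) and (word[i], word[i + 1]) == best:
            out.append(word[i] + word[i + 1])  # fuse the pair into one symbol
            i += 2
        else:
            out.append(word[i])
            i += 1
    merged_corpus.append(out)

print(best, merged_corpus)  # ('w', 'e') gets merged in all three words
```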
Module 2
Embeddings
Tokens to vectors — embedding lookup, positional encoding, cosine similarity.
Start module →
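Preview: embedding lookup, sinusoidal positional encoding, and cosine similarity in a few lines of NumPy. Toy sizes and random weights are assumptions for illustration.

```python
import numpy as np

vocab_size, d_model, seq_len = 100, 8, 4
rng = np.random.default_rng(0)
embedding_table = rng.normal(size=(vocab_size, d_model))

token_ids = np.array([5, 42, 7, 42])          # pretend tokenizer output
token_emb = embedding_table[token_ids]        # lookup: (seq_len, d_model)

# Sinusoidal positional encoding, as in the original transformer paper.
pos = np.arange(seq_len)[:, None]
i = np.arange(d_model)[None, :]
angle = pos / np.power(10000, (2 * (i // 2)) / d_model)
pos_enc = np.where(i % 2 == 0, np.sin(angle), np.cos(angle))

x = token_emb + pos_enc                       # what the first block sees

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# Same token id at positions 1 and 3, but different vectors after encoding.
print(cosine(x[1], x[3]))
```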
Module 3
Self-Attention
Q, K, V — the attention mechanism that powers transformers.
Start module →
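Preview: single-head causal self-attention in NumPy. Toy sizes and random weights again; real models add multiple heads and an output projection.

```python
import numpy as np

rng = np.random.default_rng(0)
seq_len, d_model = 4, 8
x = rng.normal(size=(seq_len, d_model))

W_q = rng.normal(size=(d_model, d_model))
W_k = rng.normal(size=(d_model, d_model))
W_v = rng.normal(size=(d_model, d_model))

Q, K, V = x @ W_q, x @ W_k, x @ W_v

scores = Q @ K.T / np.sqrt(d_model)           # scaled dot-product
mask = np.triu(np.ones((seq_len, seq_len)), k=1).astype(bool)
scores[mask] = -np.inf                        # causal mask: no peeking ahead

weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights /= weights.sum(axis=-1, keepdims=True)  # softmax over each row

out = weights @ V                             # weighted sum of values
print(out.shape)                              # (4, 8)
```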
Module 4
Transformer Block
Attention + FFN + LayerNorm + residual — one block at a time.
Start module →
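Preview: one pre-norm transformer block in NumPy, with a residual connection around attention and around the feed-forward network. Toy sizes, random weights, and the causal mask omitted for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)
seq_len, d_model, d_ff = 4, 8, 32
x = rng.normal(size=(seq_len, d_model))

def layer_norm(h, eps=1e-5):
    return (h - h.mean(-1, keepdims=True)) / np.sqrt(h.var(-1, keepdims=True) + eps)

def attention(h):
    # Fresh random projections each call: fine for a shape-level sketch.
    W_q, W_k, W_v = (rng.normal(size=(d_model, d_model)) for _ in range(3))
    Q, K, V = h @ W_q, h @ W_k, h @ W_v
    s = Q @ K.T / np.sqrt(d_model)
    w = np.exp(s - s.max(-1, keepdims=True))
    return (w / w.sum(-1, keepdims=True)) @ V

W1 = rng.normal(size=(d_model, d_ff))
W2 = rng.normal(size=(d_ff, d_model))

def ffn(h):
    return np.maximum(h @ W1, 0) @ W2         # ReLU feed-forward

# Pre-norm block: residual around attention, residual around the FFN.
x = x + attention(layer_norm(x))
x = x + ffn(layer_norm(x))
print(x.shape)                                # (4, 8)
```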
Module 5
Text Generation
Autoregressive decoding — temperature, top-k, top-p sampling.
Start module →
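Preview: temperature scaling, then top-k and top-p filtering, applied to a toy logit vector before sampling. All values here are made up.

```python
import numpy as np

rng = np.random.default_rng(0)
logits = rng.normal(size=50)                  # pretend next-token logits

temperature, top_k, top_p = 0.8, 10, 0.9

logits = logits / temperature                 # <1 sharpens, >1 flattens

# Top-k: keep only the k highest logits.
kth = np.sort(logits)[-top_k]
logits = np.where(logits < kth, -np.inf, logits)

probs = np.exp(logits - logits.max())
probs /= probs.sum()

# Top-p (nucleus): keep the smallest set of tokens whose mass exceeds top_p.
order = np.argsort(probs)[::-1]
cum = np.cumsum(probs[order])
cutoff = order[cum <= top_p]
keep = np.concatenate([cutoff, order[[len(cutoff)]]])  # include boundary token
mask = np.zeros_like(probs, dtype=bool)
mask[keep] = True
probs = np.where(mask, probs, 0.0)
probs /= probs.sum()

next_token = rng.choice(len(probs), p=probs)
print(next_token)
```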
Module 6
KV Cache
Why KV caching is essential — prefill vs decode, grouped-query attention (GQA).
Start module →
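Preview: a minimal cache sketch of the prefill/decode split. The prompt's keys and values are projected once; each decode step appends a single row instead of recomputing everything. GQA is left to the module itself.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model = 8
W_q = rng.normal(size=(d_model, d_model))
W_k = rng.normal(size=(d_model, d_model))
W_v = rng.normal(size=(d_model, d_model))

prompt = rng.normal(size=(5, d_model))        # pretend prompt embeddings

# Prefill: project every prompt position once, in parallel.
k_cache = prompt @ W_k                        # (5, d_model)
v_cache = prompt @ W_v

# Decode: each step projects only the newest token and appends to the cache.
for _ in range(3):
    new_x = rng.normal(size=(1, d_model))     # pretend newest-token embedding
    k_cache = np.vstack([k_cache, new_x @ W_k])
    v_cache = np.vstack([v_cache, new_x @ W_v])
    scores = (new_x @ W_q) @ k_cache.T / np.sqrt(d_model)
    w = np.exp(scores - scores.max())
    out = (w / w.sum()) @ v_cache             # attends over all cached history

print(k_cache.shape)                          # (8, 8): 5 prefill + 3 decode rows
```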
Module 7
Quantization
Shrink LLMs — FP32/FP16/INT8/INT4, GPTQ, AWQ, QLoRA, GGUF.
Start module →
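Preview: symmetric per-tensor INT8 quantization of a toy weight tile, the simplest scheme the module covers. GPTQ, AWQ, and friends refine how the scales and rounding are chosen.

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.normal(size=(4, 4)).astype(np.float32)  # pretend FP32 weight tile

scale = np.abs(w).max() / 127.0               # per-tensor symmetric scale
w_int8 = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
w_dequant = w_int8.astype(np.float32) * scale # what inference actually sees

print("max abs error:", np.abs(w - w_dequant).max())
```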
Module 8
Batching
Static vs continuous batching, and the memory-throughput tradeoff.
Start module →
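Preview: a toy simulation of continuous batching. A freed slot is refilled from the queue immediately, instead of waiting for the whole batch to drain as static batching would.

```python
import random

random.seed(0)
waiting = [f"req{i}" for i in range(8)]       # queued requests
remaining = {}                                # req -> decode steps left
batch_size = 4

step = 0
while waiting or remaining:
    # Fill free slots from the queue (the "continuous" part).
    while waiting and len(remaining) < batch_size:
        req = waiting.pop(0)
        remaining[req] = random.randint(2, 6) # pretend output length
    # One decode step for every active sequence.
    for req in list(remaining):
        remaining[req] -= 1
        if remaining[req] == 0:
            del remaining[req]                # slot frees up immediately
    step += 1

print("total decode steps:", step)
```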
Module 9
Paged Attention
How vLLM solves KV cache fragmentation — block tables, prefix sharing.
Start module →
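Preview: a toy block table in the spirit of vLLM's PagedAttention. The KV cache is a pool of fixed-size blocks, and a forked sequence can share a full prefix block instead of copying it. Block and pool sizes are made-up values.

```python
BLOCK_SIZE = 4                                # tokens per block (toy value)
free_blocks = list(range(16))                 # pool of physical blocks
block_tables = {}                             # seq_id -> list of physical blocks
token_counts = {}                             # seq_id -> tokens stored so far

def append_token(seq_id):
    """Reserve cache space for one more token, allocating a block when full."""
    table = block_tables.setdefault(seq_id, [])
    tokens = token_counts.get(seq_id, 0)
    if tokens % BLOCK_SIZE == 0:              # current block full (or none yet)
        table.append(free_blocks.pop(0))
    token_counts[seq_id] = tokens + 1

for _ in range(6):
    append_token("A")                         # seq A: 6 tokens -> 2 blocks

# Prefix sharing: seq B forks from A and reuses A's first (full) block
# instead of copying it; any further blocks will be private to B.
block_tables["B"] = [block_tables["A"][0]]
token_counts["B"] = BLOCK_SIZE

print(block_tables)                           # {'A': [0, 1], 'B': [0]}
```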