Tracks

Five tracks that take you from GPU hardware to running an agent fleet in production. Start anywhere: each track stands on its own.

Not sure where to start? See where you stand on the AI Knowledge Map →

Track 1
GPU & CUDA
9 modules

How GPUs execute parallel workloads — from threads and warps to tensor cores and FlashAttention.

  • Why GPUs & the CUDA software stack
  • Execution model (threads, warps, SMs)
  • Memory hierarchy (registers → HBM)
  • Roofline model
  • Memory access patterns & coalescing
  • Tiling & matrix multiply
  • Tensor cores & mixed precision
  • Operator fusion & FlashAttention
  • Triton & torch.compile
Track 2
LLM Internals
9 modules

From tokenization to PagedAttention — how large language models process text and generate output.

  • Tokenization (BPE)
  • Embeddings & positional encoding
  • Self-attention (Q, K, V)
  • Transformer block
  • Text generation & sampling
  • KV cache
  • Quantization (GPTQ, AWQ, GGUF)
  • Batching (static vs continuous)
  • PagedAttention
Track 3
LLM Serving
7 modules

How vLLM, SGLang, and TensorRT-LLM actually serve LLMs — scheduler, memory, and serving-engine internals.

  • Inference engine internals (vLLM)
  • Speculative decoding
  • Prefill/decode disaggregation
  • Serving metrics & SLOs (TTFT, TPOT, P99)
  • CUDA Graphs
  • Multi-LoRA serving
  • Prefix caching & RadixAttention
Track 4
AI Agents
9 modules

Foundations of AI agents, taught through interactive visual simulations: the agent loop, tools, workflows, retrieval, context engineering, planning, evals, and security.

  • The agent loop & state
  • Tool use & function calling (incl. MCP, Skills)
  • Workflow patterns & subagent topology
  • Retrieval & RAG
  • Context engineering
  • Planning & reflection
  • Evals & diagnostics
  • Security & the lethal trifecta
  • Capstone: three designs, same task
Track 5
Agent Engineering
9 modules

Production agent engineering — durable harnesses, observability, layered guardrails, deployment, incident response, and running an agent fleet against an SLO.

  • Production harness architecture
  • Observability for agents
  • Layered guardrails
  • Cost & latency engineering
  • Production evals & shadow mode
  • Deployment & rollout
  • Incident handling
  • Agent teams (multi-agent orchestration)
  • Capstone: reliability operations