AI Explained
Plain explanations of trending AI concepts, with live visualizations.
OpenAI and Broadcom's Jalapeño, a custom inference ASIC — Inference ASIC vs GPU — What does it mean?
OpenAI and Broadcom's Jalapeño runs LLM inference only, trading a GPU's flexibility for a shorter, faster path from memory to compute.
UFP4 fixes FP4 pretraining's shrinkage bias — E2M1 shrinkage bias — What does it mean?
E2M1's lopsided 4-bit bins round values toward zero — a shrinkage bias UFP4 fixes with a Hadamard transform + stochastic rounding.
NVIDIA Blackwell sweeps MLPerf Training 6.0 — Strong scaling — What does it mean?
Strong scaling asks if 2× the GPUs really halves training time. MLPerf 6.0: 8,192 Blackwell GPUs trained DeepSeek-V3 671B to target in 2.02 min.
INT8 finally beats FP8 on consumer GPUs — Fused INT8 GEMM kernel — What does it mean?
A fused Triton kernel keeps INT8 matmuls on the tensor cores end to end, so W8A8 finally beats FP8 on a consumer GPU — no dequant round trip.
NVIDIA RTX Spark superchip — Unified CPU–GPU memory — What does it mean?
RTX Spark wires a Grace CPU and a Blackwell GPU to one 128GB pool over NVLink-C2C, so the GPU skips the PCIe host–device copy.
NVIDIA AI Factories — Tokens-per-megawatt as a serving metric — What does it mean?
NVIDIA's 'AI Factories' framing reorganizes datacenter economics around tokens per megawatt — bundling compute, memory, interconnect, and orchestration into one billable knob and claiming ~50× tokens/MW for Blackwell Ultra GB300 NVL72 vs Hopper.
NVIDIA Jetson Thor — Edge Blackwell vs datacenter Blackwell — What does it mean?
Jetson Thor is NVIDIA's Blackwell-architecture edge AI module — 2,070 FP4 TFLOPS in a 40–130W envelope, reportedly 7.5× compute and 3.5× per-watt vs Jetson Orin (Computex 2026).
I/O-optimal approximate attention — Near-linear I/O vs FlashAttention — What does it mean?
A new paper derives approximate-attention algorithms whose I/O between SRAM and HBM scales near-linearly in sequence length n — vs FlashAttention's quadratic n² — with matching I/O lower bounds proving the result is near-optimal.
NVIDIA Vera Rubin NVL72 — Rack-scale NVLink domain — What does it mean?
Vera Rubin NVL72 wires all 72 Rubin GPUs in a rack into one sixth-gen NVLink domain — collectives no longer cross PCIe-over-network between 8-GPU islands.
LongLive-2.0 — NVFP4 W4A4 across training and inference — What does it mean?
NVIDIA's LongLive-2.0 runs training AND inference of a 5B long-video model in NVFP4 — W4A4 matmul plus a 4-bit KV cache — for 2.15× training and 1.84× inference speedup.



