AI Explained

Plain explanations of trending AI concepts, with live visualizations.

GPU 2026-06-26

OpenAI and Broadcom's Jalapeño, a custom inference ASIC — Inference ASIC vs GPU — What does it mean?

OpenAI and Broadcom's Jalapeño runs LLM inference only, trading a GPU's flexibility for a shorter, faster path from memory to compute.

GPU 2026-06-21

UFP4 fixes FP4 pretraining's shrinkage bias — E2M1 shrinkage bias — What does it mean?

E2M1's lopsided 4-bit bins round values toward zero — a shrinkage bias UFP4 fixes with a Hadamard transform + stochastic rounding.
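To make the stochastic-rounding half of that fix concrete, here is a minimal Python sketch. It assumes the standard E2M1 magnitude grid (0, 0.5, 1, 1.5, 2, 3, 4, 6), omits the Hadamard transform and any block scaling entirely, and is not UFP4's implementation. The function names are hypothetical, and ties in the nearest-rounding helper go upward as a simplification.

```python
import bisect
import random

# Non-negative magnitudes representable in E2M1 (the sign is handled separately).
E2M1_GRID = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
E2M1_MAX = E2M1_GRID[-1]


def neighbors(mag):
    """Return the grid points just below and above a magnitude in [0, 6]."""
    hi_idx = bisect.bisect_left(E2M1_GRID, mag)
    if E2M1_GRID[hi_idx] == mag:
        return mag, mag  # already representable
    return E2M1_GRID[hi_idx - 1], E2M1_GRID[hi_idx]


def round_nearest(x):
    """Deterministic rounding to the closer grid point (ties go up here)."""
    sign = -1.0 if x < 0 else 1.0
    mag = min(abs(x), E2M1_MAX)  # saturate at the largest magnitude
    lo, hi = neighbors(mag)
    return sign * (lo if mag - lo < hi - mag else hi)


def round_stochastic(x, rng):
    """Round up with probability proportional to the distance from the lower
    neighbor, so the expected result equals x for any |x| <= 6."""
    sign = -1.0 if x < 0 else 1.0
    mag = min(abs(x), E2M1_MAX)
    lo, hi = neighbors(mag)
    if lo == hi:
        return sign * lo
    p_up = (mag - lo) / (hi - lo)
    return sign * (hi if rng.random() < p_up else lo)


if __name__ == "__main__":
    rng = random.Random(0)
    x = 2.4  # sits in the wide [2, 3] bin
    samples = [round_stochastic(x, rng) for _ in range(100_000)]
    print("nearest:", round_nearest(x))
    print("stochastic mean:", sum(samples) / len(samples))
```

Nearest rounding always maps 2.4 to 2.0, while the stochastic version returns 2.0 or 3.0 in proportions that average back to roughly 2.4. That per-value unbiasedness is the property the teaser relies on.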

GPU 2026-06-19

NVIDIA Blackwell sweeps MLPerf Training 6.0 — Strong scaling — What does it mean?

Strong scaling asks if 2× the GPUs really halves training time. MLPerf 6.0: 8,192 Blackwell GPUs trained DeepSeek-V3 671B to target in 2.02 min.
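The question in that teaser comes down to one ratio: measured speedup divided by the ideal speedup from adding GPUs. The Python sketch below computes it. The function name and the timings in the example are hypothetical and are not MLPerf results; only the definition of strong-scaling efficiency (fixed problem size, more devices) is standard.

```python
def strong_scaling_efficiency(base_gpus, base_minutes, gpus, minutes):
    """Fraction of the ideal speedup actually achieved at a fixed problem size.

    1.0 means doubling the GPUs exactly halved the time; values below 1.0
    mean communication or other overhead absorbed part of the gain.
    """
    if min(base_gpus, gpus) <= 0 or min(base_minutes, minutes) <= 0:
        raise ValueError("GPU counts and times must be positive")
    measured_speedup = base_minutes / minutes
    ideal_speedup = gpus / base_gpus
    return measured_speedup / ideal_speedup


if __name__ == "__main__":
    # Hypothetical timings for illustration only.
    print(strong_scaling_efficiency(8, 100.0, 16, 50.0))  # perfect scaling
    print(strong_scaling_efficiency(8, 100.0, 16, 62.5))  # 80% efficient
```

A single data point such as one time-to-train at one GPU count cannot show strong scaling by itself; the ratio needs a baseline run of the same workload at a smaller scale.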

GPU 2026-06-15

INT8 finally beats FP8 on consumer GPUs — Fused INT8 GEMM kernel — What does it mean?

A fused Triton kernel keeps INT8 matmuls on the tensor cores end to end, so W8A8 finally beats FP8 on a consumer GPU — no dequant round trip.

GPU 2026-06-03

NVIDIA RTX Spark superchip — Unified CPU–GPU memory — What does it mean?

RTX Spark wires a Grace CPU and a Blackwell GPU to one 128GB pool over NVLink-C2C, so the GPU skips the PCIe host–device copy.

GPU 2026-05-29

NVIDIA AI Factories — Tokens-per-megawatt as a serving metric — What does it mean?

NVIDIA's 'AI Factories' framing reorganizes datacenter economics around tokens per megawatt — bundling compute, memory, interconnect, and orchestration into one billable knob and claiming ~50× tokens/MW for Blackwell Ultra GB300 NVL72 vs Hopper.

GPU 2026-05-27

NVIDIA Jetson Thor — Edge Blackwell vs datacenter Blackwell — What does it mean?

Jetson Thor is NVIDIA's Blackwell-architecture edge AI module — 2,070 FP4 TFLOPS in a 40–130W envelope, reportedly 7.5× compute and 3.5× per-watt vs Jetson Orin (Computex 2026).

GPU 2026-05-26

I/O-optimal approximate attention — Near-linear I/O vs FlashAttention — What does it mean?

A new paper derives approximate-attention algorithms whose I/O between SRAM and HBM scales near-linearly in sequence length n — vs FlashAttention's quadratic n² — with matching I/O lower bounds proving the result is near-optimal.

GPU 2026-05-22

NVIDIA Vera Rubin NVL72 — Rack-scale NVLink domain — What does it mean?

Vera Rubin NVL72 wires all 72 Rubin GPUs in a rack into one sixth-gen NVLink domain — collectives no longer cross PCIe-over-network between 8-GPU islands.

GPU 2026-05-20

LongLive-2.0 — NVFP4 W4A4 across training and inference — What does it mean?

NVIDIA's LongLive-2.0 runs both training and inference of a 5B long-video model in NVFP4 — W4A4 matmuls plus a 4-bit KV cache — for a 2.15× training and 1.84× inference speedup.