AI Explained

Plain explanations of trending AI concepts, with live visualizations.

GPU

I/O-optimal approximate attention — Near-linear I/O vs FlashAttention — What does it mean?

A new paper derives approximate-attention algorithms whose I/O between SRAM and HBM scales near-linearly in sequence length n — vs FlashAttention's quadratic n² — with matching I/O lower bounds proving the result is near-optimal.
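
To make the gap concrete, the sketch below compares two growth rates. It assumes the HBM-access bound of Θ(n²d²/M) reported in the FlashAttention paper (d is the head dimension, M is the SRAM size in elements) and uses n·d, the cost of reading the inputs once, as a stand-in for a near-linear algorithm. Constants and the new paper's exact bound are not modeled, and the function names are illustrative.

```python
def flash_attention_io(n: int, d: int, sram: int) -> float:
    """Asymptotic HBM accesses for FlashAttention, Theta(n^2 * d^2 / M), constants dropped."""
    return (n ** 2) * (d ** 2) / sram


def linear_io(n: int, d: int) -> float:
    """Stand-in for near-linear I/O: proportional to reading the n x d inputs once."""
    return float(n * d)


if __name__ == "__main__":
    d, sram = 64, 100_000  # assumed head dimension and SRAM capacity (in elements)
    for n in (1_024, 8_192, 65_536):
        ratio = flash_attention_io(n, d, sram) / linear_io(n, d)
        print(f"n={n:>6}  quadratic/linear I/O ratio ~ {ratio:,.1f}")
```

Doubling n quadruples the first quantity but only doubles the second, so the ratio between them grows linearly with sequence length.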

LLM

Complete-muE paper — Two-bridge muTransfer for MoE — What does it mean?

Complete-muE extends muTransfer to MoE via two bridges — active-width scaling (dense → activated width) and activated-expert scaling (across MoE shapes) — so one dense sweep transfers near-optimally to any expert count and top-k.

LLM

Cursor Composer 2.5 — Targeted textual feedback RL — What does it mean?

Cursor Composer 2.5 ships targeted textual feedback RL — a constructed short hint at a specific span in a long agent rollout becomes a teacher distribution, and on-policy distillation KL replaces the diffuse end-of-rollout scalar with a localized credit signal.

LLM

VPO paper — Vector-reward advantage vs GRPO scalar collapse — What does it mean?

VPO replaces GRPO's scalar advantage estimator with a vector-valued one — so the post-trained policy keeps a diverse solution distribution that pays off at pass@k and best@k as the search budget grows.
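
One way to see the collapse: when several reward components are summed into one scalar before group normalization, rollouts that succeed in different ways can receive identical advantages. The sketch below contrasts a GRPO-style group-normalized scalar advantage with a per-component normalization. The per-component version is only an illustrative assumption, not VPO's actual estimator, and the reward components are made up.

```python
from statistics import mean, pstdev


def normalize(values, eps=1e-8):
    """Group-normalize: subtract the group mean, divide by the group std."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / (sigma + eps) for v in values]


def scalar_advantages(reward_vectors):
    """GRPO-style: collapse each reward vector to its sum, then normalize."""
    return normalize([sum(r) for r in reward_vectors])


def vector_advantages(reward_vectors):
    """Illustrative only: normalize each reward component across the group."""
    columns = [normalize(col) for col in zip(*reward_vectors)]
    return [list(row) for row in zip(*columns)]


if __name__ == "__main__":
    # Hypothetical group of 3 rollouts scored on (correctness, novelty).
    group = [(1.0, 0.0), (0.0, 1.0), (0.0, 0.0)]
    print(scalar_advantages(group))  # first two rollouts get the same value
    print(vector_advantages(group))  # first two rollouts stay distinguishable
```

In the scalar case the first two rollouts are indistinguishable to the optimizer, even though they earned reward for different reasons.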

Agent

OpenSCAD Pantheon benchmark — Human-in-the-loop vs autonomous coding agents — What does it mean?

ModelRift pitted 6 agentic coding tools against the same OpenSCAD Pantheon task — Antigravity 2.0 in autonomous mode won at 4.5/5 in ~12 min while ModelRift's human-in-the-loop tier hit 3.8/5 in ~10 min.

Agent

MCP 2026-07-28 RC — stateless transport — What does it mean?

MCP's 2026-07-28 RC reworks transport so every tools/call request carries its own routing data — any server in the fleet can serve it, no sticky session pin required.
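
The contrast with sticky sessions can be sketched as follows. The envelope below is hypothetical: the `context` field and the `Server` handler are invented for illustration and are not the MCP wire format. The point is only that a handler which reads everything it needs from the request itself gives the same answer on any server.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Server:
    """A fleet member that keeps no per-client session state."""
    name: str
    tools: dict = field(default_factory=lambda: {"add": lambda a, b: a + b})

    def handle(self, request: dict) -> dict:
        # Everything needed to serve the call travels in the request itself
        # (hypothetical layout, not the real MCP schema).
        tool = self.tools[request["params"]["name"]]
        result = tool(**request["params"]["arguments"])
        return {"id": request["id"], "result": result,
                "tenant": request["context"]["tenant"]}


if __name__ == "__main__":
    fleet = [Server("a"), Server("b"), Server("c")]
    request = {
        "id": 1,
        "method": "tools/call",
        "params": {"name": "add", "arguments": {"a": 2, "b": 3}},
        "context": {"tenant": "demo"},  # assumed self-describing routing data
    }
    # A load balancer may pick any server; no session pin is needed.
    print(random.choice(fleet).handle(request))
```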

Agent

Maestro paper — RL orchestrator over frozen experts — What does it mean?

Maestro trains a 4B RL policy that picks (expert, skill) per task from a frozen model pool — 70.1% across 10 multimodal benchmarks, beating GPT-5 (69.3%), and routes to new experts without retraining.

Agent

Boiling the Frog paper — Multi-turn norm erosion vs single-prompt agent safety — What does it mean?

Boiling the Frog walks agents from benign edits to risk-bearing actions across multiple turns — averaging 44.4% attack success on 9 frontier agents that would refuse the same final message asked alone.

LLM

Gated DeltaNet-2 paper — Decoupled channel-wise erase/write gates — What does it mean?

Gated DeltaNet-2 replaces the single scalar gate of prior linear-attention models with two independent per-channel vectors — one for erasing old state, one for writing new state — so each dimension of the recurrent memory can decay at its own rate.
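
The difference between one shared gate and two per-channel gates can be shown on a toy elementwise memory. This is a deliberately simplified recurrence, not the paper's update rule; the function names, the tied form of the scalar gate, and the gate values are all assumptions for illustration.

```python
def scalar_gate_step(state, value, gate):
    """One scalar gate: every channel decays and writes at the same rate."""
    return [gate * s + (1.0 - gate) * v for s, v in zip(state, value)]


def decoupled_gate_step(state, value, erase, write):
    """Per-channel gates: erase[i] controls forgetting, write[i] controls input."""
    return [(1.0 - e) * s + w * v
            for s, v, e, w in zip(state, value, erase, write)]


if __name__ == "__main__":
    state = [1.0, 1.0]
    value = [0.0, 0.0]
    # Channel 0 forgets quickly, channel 1 holds on to its content.
    erase = [0.9, 0.1]
    write = [0.5, 0.5]
    for _ in range(3):
        state = decoupled_gate_step(state, value, erase, write)
    print(state)  # channel 0 has decayed far more than channel 1
```

With a single scalar gate both channels would follow the same decay curve; separate erase and write vectors let each channel keep or drop its content independently of how strongly new input is written.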

Agent

Camouflage Injection paper — Camouflage Detection Gap — What does it mean?

Injection payloads rewritten in a document's own domain vocabulary slip past current detectors: Llama 3.1 8B's detection rate drops from 93.8% to 9.7%, and Llama Guard 3 catches none.

LLM

ACC paper — Tool-output unmasking — What does it mean?

ACC reformats agent trajectories as long-context QA pairs by unmasking tool outputs: Qwen3-30B-A3B gains +18.1 on MRCR, matching Qwen3-235B-A22B at roughly one-eighth the parameter count.

GPU

NVIDIA Vera Rubin NVL72 — Rack-scale NVLink domain — What does it mean?

Vera Rubin NVL72 wires all 72 Rubin GPUs in a rack into one sixth-gen NVLink domain — collectives no longer cross PCIe-over-network between 8-GPU islands.