AI Explained

Plain explanations of trending AI concepts, with live visualizations.

LLM

VibeThinker-3B hits 94.3 on AIME 2026 — Diversity-driven RL — What does it mean?

Diversity-driven RL keeps a model's solution strategies wide instead of collapsing onto a few — how a 3B model reaches 94.3 on AIME 2026.

Agent

Microsoft FastContext: a repo-explorer subagent cuts coding-agent tokens 60% — Explorer-subagent context offloading — What does it mean?

FastContext trains a separate read-only explorer subagent that finds code and returns citations, cutting a coding agent's tokens up to 60%.

LLM

AdaSR teaches LLMs to reason mid-stream — Streaming reasoning — What does it mean?

AdaSR trains LLMs to reason while input still streams in, then deliberate once it lands — split-phase RL credit under a latency-aware reward.

Agent

SIMMER: 56% of frontier-LLM plans hide latent failures — Latent failures in planning — What does it mean?

A latent failure is a plan that runs to the end without erroring and still silently fails the goal — SIMMER finds up to 56% of LLM plans hide one.

GPU

INT8 finally beats FP8 on consumer GPUs — Fused INT8 GEMM kernel — What does it mean?

A fused Triton kernel keeps INT8 matmuls on the tensor cores end to end, so W8A8 finally beats FP8 on a consumer GPU — no dequant round trip.

Agent

CacheRL trains tool-calling agents via cached rollouts at 100× less compute — Cached rollouts for agent RL — What does it mean?

CacheRL replaces live tool execution during RL rollouts with a three-tier fuzzy cache — 92% process accuracy vs GPT-5's 94% at ~100× less compute.

Agent

HarnessBridge — Learned agent harness vs hand-engineered — What does it mean?

HarnessBridge replaces the hand-built agent harness with a learnable module — two projections that distill state and vet each action.

Agent

EvoArena + EvoMem — Patch-based agent memory — What does it mean?

EvoMem keeps agent memory as a changelog of structured patches — what changed and when — so the agent can reason about how its world evolved.

Agent

NVIDIA Blackwell leads AgentPerf, the first agentic-AI infra benchmark — Trajectory-replay benchmarking — What does it mean?

AgentPerf grades serving systems by replaying real multi-step agent runs, not single prompts — and Blackwell's GB300 leads on agents per megawatt.

Agent

WeaveBench: best computer-use agent clears just 41% — Trajectory-aware vs outcome-only grading — What does it mean?

WeaveBench finds the best computer-use agent clears 41.2% — and a trajectory-aware judge shows outcome-only grading flatters the rest.

LLM

VIA-SD speeds up speculative decoding 10–20% — Tiered confidence-gated verification — What does it mean?

VIA-SD adds a confidence-gated middle tier to speculative decoding — close calls go to a slim sub-network of the same model, not a full re-run.

Agent

SpatialClaw lifts agent spatial reasoning to 59.9% — Code-as-action vs structured tool-calls — What does it mean?

SpatialClaw makes a VLM agent's actions executable Python cells on a stateful kernel — observe-then-act beats rigid tool-calls, +11.2 pts to 59.9%.