AI Explained

Plain explanations of trending AI concepts, with live visualizations.

Diversity-driven RL keeps a model's solution strategies wide instead of collapsing onto a few — how a 3B model reaches 94.3 on AIME 2026.

FastContext trains a separate read-only explorer subagent that finds code and returns citations, cutting a coding agent's token usage by up to 60%.

AdaSR trains LLMs to reason while input still streams in, then deliberate once it lands — split-phase RL credit under a latency-aware reward.

A latent failure is a plan that runs to the end without erroring and still silently fails the goal — SIMMER finds up to 56% of LLM plans hide one.

A fused Triton kernel keeps INT8 matmuls on the tensor cores end to end, so W8A8 finally beats FP8 on a consumer GPU — no dequant round trip.

CacheRL replaces live tool execution during RL rollouts with a three-tier fuzzy cache — 92% process accuracy vs GPT-5's 94% at ~100× less compute.

HarnessBridge replaces the hand-built agent harness with a learnable module — two projections that distill state and vet each action.

EvoMem keeps agent memory as a changelog of structured patches — what changed and when — so the agent can reason about how its world evolved.

AgentPerf grades serving systems by replaying real multi-step agent runs, not single prompts — and Blackwell's GB300 leads on agents per megawatt.

WeaveBench finds the best computer-use agent clears 41.2% — and a trajectory-aware judge shows outcome-only grading flatters the rest.

VIA-SD adds a confidence-gated middle tier to speculative decoding — close calls go to a slim sub-network of the same model, not a full re-run.

SpatialClaw makes a VLM agent's actions executable Python cells on a stateful kernel — observe-then-act beats rigid tool-calls, +11.2 pts to 59.9%.