AI Explained

Plain explanations of trending AI concepts, with live visualizations.

HarnessBridge replaces the hand-built agent harness with a learnable module — two projections that distill state and vet each action.

EvoMem keeps agent memory as a changelog of structured patches — what changed and when — so the agent can reason about how its world evolved.

AgentPerf grades serving systems by replaying real multi-step agent runs, not single prompts — and Blackwell's GB300 leads on agents per megawatt.

WeaveBench finds the best computer-use agent clears 41.2% — and a trajectory-aware judge shows outcome-only grading flatters the rest.

SpatialClaw makes a VLM agent's actions executable Python cells on a stateful kernel: observe-then-act beats rigid tool calls, +11.2 pts to 59.9%.

A survey reframes building an agent's training world as engineering — and its sharpest split is hand-coded vs model-generated environments.

Workflow-GYM drops computer-use agents into real pro software and grades the whole multi-stage job end to end — SOTA clears only ~30%.

Role-Agent trains an agent by making one LLM play both the agent and the world it acts in — no external environment, no separate reward model.

SearchSwarm bakes task decomposition and subagent delegation into a 30B model's weights via SFT — not prompts — and tops BrowseComp.

Fable 5 ships a frontier model to everyone by routing under 5% of sensitive requests to a more conservative model instead of weakening it.

Self-evolving agents can degrade as they learn from their own runs. Three design choices decide whether they keep improving or collapse.

Progressive Monte Carlo Graph Search lets MLEvolve share discoveries across branches — SOTA on MLE-Bench in half the usual budget.