AI Explained

Plain explanations of trending AI concepts, with live visualizations.

Agent

OSWorld2.0 benchmark: best computer-use agent finishes just 20.6% of tasks — Long-horizon computer-use failure modes — What does it mean?

OSWorld2.0 runs computer-use agents through 108 long real-world tasks. The best finishes just 20.6%, undone by four failure modes.

LLM

MultiHashFormer drops the vocab-sized embedding table — Hash-signature token representation — What does it mean?

A new LM names each token by a short multi-hash signature instead of a vocab-sized embedding row, decoupling parameters from vocabulary size.
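To make the idea concrete, here is a minimal sketch of a generic hash-based token representation. It is not MultiHashFormer's actual architecture; the constants `NUM_HASHES`, `NUM_BUCKETS`, and `DIM`, and the choice to sum bucket rows, are assumptions made for illustration. The point it shows is that the parameter count depends on the number of buckets, not on how many distinct tokens exist.

```python
import hashlib
import random

# Hypothetical sizes chosen for illustration only.
NUM_HASHES = 3    # length of each token's signature
NUM_BUCKETS = 64  # rows in each small hash table
DIM = 8           # embedding width

rng = random.Random(0)
# One small table per hash function, instead of one row per vocabulary entry.
tables = [
    [[rng.gauss(0.0, 1.0) for _ in range(DIM)] for _ in range(NUM_BUCKETS)]
    for _ in range(NUM_HASHES)
]


def signature(token: str) -> tuple:
    """Map a token to NUM_HASHES bucket indices using salted BLAKE2b hashes."""
    data = token.encode("utf-8")
    return tuple(
        int.from_bytes(
            hashlib.blake2b(data, digest_size=8, salt=bytes([i])).digest(), "big"
        ) % NUM_BUCKETS
        for i in range(NUM_HASHES)
    )


def embed(token: str) -> list:
    """Build the token vector by summing the bucket rows its signature names."""
    vec = [0.0] * DIM
    for table, bucket in zip(tables, signature(token)):
        for d in range(DIM):
            vec[d] += table[bucket][d]
    return vec


def parameter_count() -> int:
    """Total learned values: fixed, no matter how many tokens are embedded."""
    return NUM_HASHES * NUM_BUCKETS * DIM
```

Any string, including one never seen before, gets a signature and therefore a vector, while `parameter_count()` stays fixed. Two tokens can collide in one table, but they are unlikely to collide in all of them.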

Agent

Agents-A1 matches trillion-param agents at 35B — Scaling the horizon, not the parameters — What does it mean?

Agents-A1, a 35B agent, matches trillion-param systems by training on long ~45K-token task runs — scaling the horizon, not the parameter count.

Agent

Agents struggle to know when to stop — Agentic abstention — What does it mean?

Agentic abstention is an agent's ability to recognize when to stop acting under uncertainty: a skill agents often mis-time, and one that CONVOLVE adds without retraining.

Agent

Ornith-1.0 ships open MIT-licensed coding models — Self-scaffolding RL — What does it mean?

Ornith-1.0's coding models learn to write their own RL training scaffold per task, then solve against it — with safety pushed outside the model.

LLM

Cluster-Route-Escalate cascade serves LLMs at 97-99% accuracy for less cost — Cost-aware LLM cascade — What does it mean?

A cascade routes each query to the cheapest capable model and escalates only weak answers to a stronger one: 97-99% accuracy at lower cost.
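The escalation step can be sketched as follows. This is a simplified illustration, not the Cluster-Route-Escalate method itself: it omits the clustering and routing stages, and the `Tier` class, the self-reported confidence score, and the `threshold` value are all hypothetical stand-ins.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Tier:
    """A hypothetical model tier: a name, a per-query cost, and an answer function."""
    name: str
    cost: float
    # Returns (answer_text, confidence in [0, 1]); the confidence signal is an assumption.
    answer: Callable[[str], Tuple[str, float]]


def cascade(query: str, tiers: List[Tier], threshold: float = 0.8) -> Tuple[str, str, float]:
    """Try tiers from cheapest to most expensive, escalating on low confidence.

    Returns (answer, name of the tier that answered, total cost spent).
    """
    if not tiers:
        raise ValueError("cascade needs at least one tier")
    ordered = sorted(tiers, key=lambda t: t.cost)
    spent = 0.0
    for tier in ordered:
        text, confidence = tier.answer(query)
        spent += tier.cost
        # Accept a confident answer, or whatever the last (strongest) tier says.
        if confidence >= threshold or tier is ordered[-1]:
            return text, tier.name, spent
```

Easy queries stop at the cheap tier and pay only its cost; hard ones pay for both tiers, so the savings depend on how often the cheap tier is confident enough.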

LLM

SGLang v0.5.14 — LPLB expert-parallel load balancing — What does it mean?

SGLang v0.5.14's LPLB solves a tiny linear program each step to even MoE token load across GPUs, so the busiest GPU stops gating throughput.

LLM

ViQ: text-aligned visual tokens, quantized at any image resolution — Text-aligned quantized visual tokens vs continuous patches — What does it mean?

ViQ turns images into discrete, text-aligned visual tokens — like a fixed vocabulary of labeled stamps — so a multimodal LLM reads pictures the way it reads words, at any resolution.

LLM

RL data scheduler hits target perplexity with 44% fewer pretraining steps — RL-learned data mixture vs fixed pretraining blend — What does it mean?

An RL agent adjusts an LLM's pretraining data mixture on the fly instead of using a fixed blend, hitting target perplexity with 44% fewer steps.

Agent

NatureBench: coding agents beat Nature-paper SOTA on just 17.8% of tasks — Discovery vs reproduction agent benchmarking — What does it mean?

NatureBench scores coding agents on beating published SOTA from 90 Nature-paper tasks, not reproducing it — the best agent wins on only 17.8%.

LLM

InfoKV: entropy-aware KV-cache compression keeps long-context recall — Forward Influence — What does it mean?

InfoKV compresses the KV cache according to each token's predictive uncertainty, not attention alone, keeping the uncertain tokens that steer distant context.
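A rough sketch of what uncertainty-aware cache selection could look like is below. It is not InfoKV's published scoring rule: the linear blend of attention and normalized entropy, the `alpha` weight, and the function names are assumptions made for illustration.

```python
import math
from typing import List, Sequence


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of a predictive distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def select_kv(
    attn_scores: Sequence[float],
    next_token_probs: Sequence[Sequence[float]],
    budget: int,
    alpha: float = 0.5,
) -> List[int]:
    """Return the positions whose KV entries to keep, in original order.

    Each position is scored by a blend of its attention mass and the
    normalized entropy of the model's prediction at that position.
    alpha=1.0 reduces to attention-only selection. Assumes every
    distribution has more than one outcome.
    """
    scored = []
    for pos, (attn, probs) in enumerate(zip(attn_scores, next_token_probs)):
        uncertainty = entropy(probs) / math.log(len(probs))  # scaled to [0, 1]
        scored.append((alpha * attn + (1 - alpha) * uncertainty, pos))
    top = sorted(scored, reverse=True)[:budget]
    return sorted(pos for _, pos in top)
```

With an attention-only rule, a low-attention position is evicted even if the model was very unsure there; blending in entropy keeps it in the cache.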

Agent

AOHP runs agents as OS actors on Android: +21% tasks, -52% tokens — Agents as first-class OS actors — What does it mean?

AOHP makes AI agents privileged OS-level actors on Android that act across apps through machine-friendly interfaces, lifting task completion by 21% and cutting token use by 52%.