AI Explained

Plain explanations of trending AI concepts, with live visualizations.

LLM

PEFT scaling paper — Persistent personal adapters at million-scale — What does it mean?

Reframes a LoRA adapter from a cost-cutting trick into persistent per-user state — a million personal adapters served over one frozen base.

LLM

LongTraceRL — Rubric reward (entity-level process supervision) — What does it mean?

LongTraceRL scores each reasoning hop, not just the final answer — dense process reward, gated to correct rollouts so it cannot be gamed.
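
To make the gating idea concrete, here is a minimal sketch. It assumes per-hop rubric scores in [0, 1] and a binary final-answer check. The function name, the `process_weight` parameter, and the averaging rule are illustrative assumptions, not LongTraceRL's actual formulation.

```python
def gated_rubric_reward(hop_scores, final_correct, process_weight=0.5):
    """Outcome reward plus a dense per-hop bonus, paid only on correct rollouts.

    hop_scores: hypothetical rubric scores in [0, 1], one per reasoning hop.
    final_correct: whether the rollout's final answer was judged correct.
    """
    outcome = 1.0 if final_correct else 0.0
    # Gate: a rollout with a wrong final answer earns no process bonus,
    # so well-formed hops alone cannot raise the reward.
    if not final_correct or not hop_scores:
        return outcome
    process = sum(hop_scores) / len(hop_scores)
    return outcome + process_weight * process


good = gated_rubric_reward([1, 1, 0, 1], final_correct=True)
gamed = gated_rubric_reward([1, 1, 1, 1], final_correct=False)
```

In this sketch a rollout with perfect-looking hops but a wrong answer scores no higher than any other wrong rollout, which is the sense in which the gate blocks reward gaming.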

Agent

Harness-1 — State-externalizing search harness — What does it mean?

Harness-1 is a 20B search agent that keeps working memory in an external harness, not a growing transcript — so context stays flat as the search deepens.

Agent

GrepSeek trains a search agent to use shell commands — GRPO-trained shell-command search — What does it mean?

GrepSeek trains an agent to search a raw corpus with shell commands, first through Tutor/Planner distillation and then GRPO, giving index-free agentic retrieval.

LLM

dMoE cuts diffusion-LLM MoE memory ~80% — block-level expert routing — What does it mean?

dMoE pools a diffusion block's per-token expert choices into one block-level decision — ~70→15 experts loaded, ~80% less memory.
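
If expert weights dominate the memory footprint (an assumption), loading about 15 experts instead of about 70 is a cut of 1 - 15/70 ≈ 79%, in line with the ~80% figure. The toy sketch below shows one way per-token choices could be pooled into a single block-level set. The vote-counting rule and the function names are hypothetical and are not taken from dMoE.

```python
from collections import Counter


def distinct_experts(per_token_choices):
    """Experts that must be loaded if every token keeps its own choice."""
    return len({e for token in per_token_choices for e in token})


def block_level_experts(per_token_choices, budget):
    """Pool per-token choices into one shared set of at most `budget` experts.

    Hypothetical rule: each token votes for its chosen experts and the
    block keeps the most-voted ones.
    """
    votes = Counter(e for token in per_token_choices for e in token)
    return sorted(e for e, _ in votes.most_common(budget))


# Four tokens in one block, each routed to two experts (toy data).
block = [[0, 1], [1, 2], [1, 3], [0, 1]]
per_token_load = distinct_experts(block)
shared = block_level_experts(block, budget=2)
```

Here the block touches four distinct experts under per-token routing but only two under the shared decision.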

Agent

COLLEAGUE.SKILL — Capability vs behavior skill tracks — What does it mean?

COLLEAGUE.SKILL turns one expert trace into a versioned skill package: a capability track (what to do) plus a behavior track (how to do it).

Agent

Agent-harness scaling law: feedback quality predicts success, not raw compute — Effective Feedback Compute (EFC) — What does it mean?

Effective Feedback Compute (EFC) predicts agent-harness success from feedback quality rather than raw compute, and it tracks outcomes far more tightly than spend does.

LLM

Parallax — Local-linear attention vs FlashAttention 2/3 — What does it mean?

Parallax upgrades softmax attention from a flat local average to a slope-aware fit — sharper, and compute-bound past FlashAttention 2/3.

Agent

Claude Opus 4.8 — Parallel-subagent dynamic workflows — What does it mean?

Opus 4.8 'dynamic workflows' let Claude Code run parallel subagents, so wall-clock is set by the slowest subtask, not the sum.
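
The wall-clock claim is a general property of concurrent execution: run sequentially, subtasks cost the sum of their durations; run in parallel, roughly the maximum. The sketch below illustrates this with Python's `asyncio`. The `subagent` coroutine and the task names are stand-ins and have nothing to do with Claude Code's actual interface.

```python
import asyncio
import time


async def subagent(name, seconds):
    """Stand-in for a subagent: sleeps to simulate work, then reports back."""
    await asyncio.sleep(seconds)
    return name


async def run_parallel(tasks):
    """Launch all subagents at once and time the whole batch."""
    start = time.perf_counter()
    results = await asyncio.gather(*(subagent(n, s) for n, s in tasks))
    return list(results), time.perf_counter() - start


# Hypothetical subtasks with simulated durations in seconds.
tasks = [("search", 0.05), ("edit", 0.10), ("test", 0.20)]
results, elapsed = asyncio.run(run_parallel(tasks))
sequential_cost = sum(seconds for _, seconds in tasks)
```

With these durations the batch finishes in roughly 0.2 s (the slowest subtask) rather than the 0.35 s a sequential run would need.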

LLM

Claude Opus 4.8 — Cache-preserving mid-task system messages — What does it mean?

Opus 4.8 can inject a system message mid-conversation without busting the prompt cache — so the cached prefix is reused, not recomputed.

Agent

OmniRetrieval — Source-native query dispatch — What does it mean?

OmniRetrieval routes each query to text, tables, or graphs natively — so JOINs and graph edges survive instead of collapsing into one flat vector index.

LLM

MarginGate — Margin-gated verification for batch-invariant decoding — What does it mean?

Why temperature-0 BF16 decoding can emit different tokens when requests are batched, and how MarginGate restores determinism by re-checking only the risky steps.