AI Explained
Plain explanations of trending AI concepts, with live visualizations.
PEFT scaling paper — Persistent personal adapters at million-scale — What does it mean?
Reframes a LoRA adapter from a cost-cutting trick into persistent per-user state — a million personal adapters served over one frozen base.
LongTraceRL — Rubric reward (entity-level process supervision) — What does it mean?
LongTraceRL scores each reasoning hop, not just the final answer — dense process reward, gated to correct rollouts so it cannot be gamed.
Harness-1 — State-externalizing search harness — What does it mean?
Harness-1 is a 20B search agent that keeps working memory in an external harness, not a growing transcript — so context stays flat as the search deepens.
GrepSeek trains a search agent to use shell commands — GRPO-trained shell-command search — What does it mean?
GrepSeek trains an agent to search a raw corpus with shell commands, using Tutor/Planner distillation followed by GRPO, for index-free agentic retrieval.
dMoE cuts diffusion-LLM MoE memory ~80% — Block-level expert routing — What does it mean?
dMoE pools a diffusion block's per-token expert choices into one block-level decision, cutting loaded experts from ~70 to ~15 and memory by ~80%.
COLLEAGUE.SKILL — Capability vs behavior skill tracks — What does it mean?
COLLEAGUE.SKILL turns one expert trace into a versioned skill package: a capability track (what to do) plus a behavior track (how to do it).
Agent-harness scaling law: feedback quality predicts success, not raw compute — Effective Feedback Compute (EFC) — What does it mean?
Effective Feedback Compute (EFC) predicts agent-harness success from feedback quality rather than raw compute, and far more tightly than spend does.
Parallax — Local-linear attention vs FlashAttention 2/3 — What does it mean?
Parallax upgrades softmax attention from a flat local average to a slope-aware fit — sharper, and compute-bound past FlashAttention 2/3.
Claude Opus 4.8 — Parallel-subagent dynamic workflows — What does it mean?
Opus 4.8 'dynamic workflows' let Claude Code run parallel subagents, so wall-clock time is set by the slowest subtask, not the sum of all subtasks.
Claude Opus 4.8 — Cache-preserving mid-task system messages — What does it mean?
Opus 4.8 can inject a system message mid-conversation without busting the prompt cache — so the cached prefix is reused, not recomputed.
OmniRetrieval — Source-native query dispatch — What does it mean?
OmniRetrieval routes each query to text, tables, or graphs natively — so JOINs and graph edges survive instead of collapsing into one flat vector index.
MarginGate — Margin-gated verification for batch-invariant decoding — What does it mean?
Why temperature-0 BF16 decoding can emit different tokens depending on how requests are batched, and how MarginGate restores determinism by re-checking only the risky steps.
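As a rough illustration of the margin-gating idea in the MarginGate entry, here is a minimal sketch. It is not MarginGate's actual implementation: the function names, the `threshold` parameter, and the `recompute_exact` callback (standing in for some slower, batch-invariant logit computation) are all assumptions made for this example. The sketch only shows the general pattern of trusting the fast result when the top-1/top-2 logit gap is large and re-checking when it is small.

```python
def top2_margin(logits):
    """Gap between the largest and second-largest logit."""
    first, second = sorted(logits, reverse=True)[:2]
    return first - second


def argmax(values):
    """Index of the largest value (first one wins on ties)."""
    return max(range(len(values)), key=values.__getitem__)


def margin_gated_argmax(fast_logits, recompute_exact, threshold):
    """Greedy token choice that re-checks only low-margin ("risky") steps.

    fast_logits:     logits from the fast, possibly batch-dependent path.
    recompute_exact: hypothetical callback returning logits from a slower,
                     batch-invariant path; called only when needed.
    threshold:       assumed margin below which a step counts as risky.
    Returns (token_index, was_rechecked).
    """
    if top2_margin(fast_logits) >= threshold:
        # Large gap: small numeric noise cannot flip the argmax.
        return argmax(fast_logits), False
    # Small gap: defer to the batch-invariant recomputation.
    return argmax(recompute_exact()), True


# A confident step is accepted as-is; a near-tie triggers the re-check.
confident = margin_gated_argmax([1.0, 5.0, 2.0], lambda: [0.0, 0.0, 0.0], 0.5)
risky = margin_gated_argmax([2.00, 2.01, 0.0], lambda: [2.01, 2.00, 0.0], 0.5)
```

In this toy run, the confident step keeps the fast argmax without calling the callback, while the near-tie step takes its token from the recomputed logits. How the threshold is chosen and what the exact path looks like are left open here.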