AI Explained
Plain explanations of trending AI concepts, with live visualizations.
LLM · 2026-05-06
DeepSeek V4-Pro and V4-Flash — long-context cost cut to a fraction — What does it mean?
V4-Pro and V4-Flash cut both per-token FLOPs and KV cache to roughly 7–27% of V3.2's at 1M context, so the same cluster serves ~10–14× more concurrent users.
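A back-of-envelope check of where the concurrency multiplier comes from (illustrative numbers, not DeepSeek's own accounting): if long-context serving is limited by KV-cache memory, shrinking per-user KV cache to a fraction f of V3.2's lets the same HBM budget hold about 1/f sessions.

```python
# If per-user KV cache drops to fraction f of the old size, the same
# memory budget holds ~1/f concurrent 1M-context sessions.
# The low end of the 7-27% range matches the ~10-14x claim.
for f in (0.07, 0.10):
    print(f"KV cache at {f:.0%} of V3.2 -> ~{1/f:.0f}x more users")
# KV cache at 7% of V3.2 -> ~14x more users
# KV cache at 10% of V3.2 -> ~10x more users
```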
LLM · 2026-05-06
CoPD paper — Reinforcement Learning with Verifiable Rewards (RLVR) — What does it mean?
RLVR is a post-training method where a simple programmatic verifier (unit tests, equality checks, a proof assistant) replaces the learned reward model. The reward is binary: 1 if the answer checks out, 0 otherwise.
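A toy sketch of such a verifiable reward in Python, using unit tests as the verifier. The entry-point name `solve` and the test format are illustrative choices, not from the CoPD paper:

```python
# RLVR-style binary reward: a programmatic verifier (here, unit tests)
# replaces a learned reward model. Reward is 1 only if every test passes.

def verify(candidate_code: str, tests: list[tuple[tuple, object]]) -> int:
    """Run unit tests against a model-generated function; return 0 or 1."""
    namespace = {}
    try:
        exec(candidate_code, namespace)  # define the candidate function
        fn = namespace["solve"]          # assumed entry-point name
        for args, expected in tests:
            if fn(*args) != expected:
                return 0
        return 1
    except Exception:
        return 0                         # any crash = reward 0

good = "def solve(a, b):\n    return a + b\n"
bad  = "def solve(a, b):\n    return a - b\n"
tests = [((2, 3), 5), ((0, 0), 0)]
print(verify(good, tests), verify(bad, tests))  # -> 1 0
```

Because the reward is computed, not learned, it cannot be gamed the way a reward model can, which is the main appeal of RLVR.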
LLM · 2026-05-06
CoPD paper — Co-evolving Policy Distillation between parallel experts — What does it mean?
CoPD trains N specialist LLMs in parallel as mutual teachers: at every step, each model distills on-policy from its current peers, outperforming both RLVR and frozen-expert on-policy distillation (OPD).
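A toy numerical sketch of the co-evolving idea, with each "expert" reduced to a single next-token distribution. At every step, each expert takes a gradient step toward the ensemble of its *current* peers (rather than a frozen teacher), and the population converges toward a consensus. The update rule and all names here are illustrative, and the toy distills full distributions instead of on-policy samples:

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def copd_step(logits, lr=0.5):
    """One mutual-distillation step over N experts' logits (N x V array).
    Each expert descends the cross-entropy gradient toward the softmax of
    its current peers' mean logits: grad = student - teacher."""
    new = logits.copy()
    for i in range(len(logits)):
        peers = np.delete(logits, i, axis=0)
        teacher = softmax(peers.mean(axis=0))  # current-peer ensemble
        student = softmax(logits[i])
        new[i] = logits[i] - lr * (student - teacher)
    return new

rng = np.random.default_rng(0)
L = rng.normal(size=(3, 5))  # 3 experts, vocabulary of 5 tokens
start_spread = np.array([softmax(l) for l in L]).std(axis=0).max()
for _ in range(200):
    L = copd_step(L)
end_spread = np.array([softmax(l) for l in L]).std(axis=0).max()
print(f"spread across experts: {start_spread:.3f} -> {end_spread:.3f}")
```

The spread across experts shrinks as they co-evolve; in real CoPD, each expert keeps its own specialist data, so they share skills without collapsing into identical models.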