AI Explained
Plain explanations of trending AI concepts, with live visualizations.
LLM · 2026-05-06
DeepSeek V4-Pro and V4-Flash — long-context cost cut to a fraction — What does it mean?
V4-Pro and V4-Flash cut both per-token FLOPs and KV cache to roughly 7–27% of V3.2's at 1M context, so the same cluster serves ~10–14× more concurrent users.
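A back-of-envelope check of where the concurrency multiplier comes from (illustrative numbers, not DeepSeek's own accounting): if long-context serving is limited by KV-cache memory, shrinking per-user KV cache to a fraction f of V3.2's lets the same HBM budget hold about 1/f sessions.

```python
# If per-user KV cache drops to fraction f of the old size, the same
# memory budget holds ~1/f concurrent 1M-context sessions.
# The low end of the 7-27% range matches the ~10-14x claim.
for f in (0.07, 0.10):
    print(f"KV cache at {f:.0%} of V3.2 -> ~{1/f:.0f}x more users")
# KV cache at 7% of V3.2 -> ~14x more users
# KV cache at 10% of V3.2 -> ~10x more users
```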
LLM · 2026-05-06
CoPD paper — Reinforcement Learning with Verifiable Rewards (RLVR) — What does it mean?
RLVR is a post-training method where a simple programmatic verifier (unit tests, equality checks, a proof assistant) replaces the learned reward model. The reward is binary: 1 if the answer checks out, 0 otherwise.
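A toy sketch of such a verifiable reward in Python, using unit tests as the verifier. The entry-point name `solve` and the test format are illustrative choices, not from the CoPD paper:

```python
# RLVR-style binary reward: a programmatic verifier (here, unit tests)
# replaces a learned reward model. Reward is 1 only if every test passes.

def verify(candidate_code: str, tests: list[tuple[tuple, object]]) -> int:
    """Run unit tests against a model-generated function; return 0 or 1."""
    namespace = {}
    try:
        exec(candidate_code, namespace)  # define the candidate function
        fn = namespace["solve"]          # assumed entry-point name
        for args, expected in tests:
            if fn(*args) != expected:
                return 0
        return 1
    except Exception:
        return 0                         # any crash = reward 0

good = "def solve(a, b):\n    return a + b\n"
bad  = "def solve(a, b):\n    return a - b\n"
tests = [((2, 3), 5), ((0, 0), 0)]
print(verify(good, tests), verify(bad, tests))  # -> 1 0
```

Because the reward is computed, not learned, it cannot be gamed the way a reward model can, which is the main appeal of RLVR.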
LLM · 2026-05-06
CoPD paper — Co-evolving Policy Distillation between parallel experts — What does it mean?
CoPD trains N specialist LLMs in parallel as mutual teachers: at every step, each model distills on-policy from its current peers, outperforming both RLVR and frozen-expert on-policy distillation (OPD).
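A toy numerical sketch of the co-evolving idea, with each "expert" reduced to a single next-token distribution. At every step, each expert takes a gradient step toward the ensemble of its *current* peers (rather than a frozen teacher), and the population converges toward a consensus. The update rule and all names here are illustrative, and the toy distills full distributions instead of on-policy samples:

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def copd_step(logits, lr=0.5):
    """One mutual-distillation step over N experts' logits (N x V array).
    Each expert descends the cross-entropy gradient toward the softmax of
    its current peers' mean logits: grad = student - teacher."""
    new = logits.copy()
    for i in range(len(logits)):
        peers = np.delete(logits, i, axis=0)
        teacher = softmax(peers.mean(axis=0))  # current-peer ensemble
        student = softmax(logits[i])
        new[i] = logits[i] - lr * (student - teacher)
    return new

rng = np.random.default_rng(0)
L = rng.normal(size=(3, 5))  # 3 experts, vocabulary of 5 tokens
start_spread = np.array([softmax(l) for l in L]).std(axis=0).max()
for _ in range(200):
    L = copd_step(L)
end_spread = np.array([softmax(l) for l in L]).std(axis=0).max()
print(f"spread across experts: {start_spread:.3f} -> {end_spread:.3f}")
```

The spread across experts shrinks as they co-evolve; in real CoPD, each expert keeps its own specialist data, so they share skills without collapsing into identical models.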