AI Explained

Plain explanations of trending AI concepts, with live visualizations.

LLM

Attention Amnesia: CoT fine-tuning wrecks long-range recall in hybrid LLMs — Training-free QK-Restore — What does it mean?

Fine-tuning a hybrid LLM on chain-of-thought reasoning silently breaks its long-range recall; QK-Restore rolls back just the Q/K projections to recover it, with no retraining.

LLM

Latent Context LMs compress prompts 16x — Encoder-decoder prompt compression — What does it mean?

Latent Context LMs use a small encoder to squeeze a long prompt into a 16x shorter sequence of latent embeddings a decoder reads as tokens.

LLM

Google releases Gemma 4 12B — Encoder-free multimodal projection — What does it mean?

Gemma 4 12B drops the separate vision and audio encoders — image patches and audio go straight into the token stream, no ViT.

LLM

FlashMemory cuts DeepSeek-V4's KV cache to 13.5% — Lookahead Sparse Attention — What does it mean?

FlashMemory's Lookahead Sparse Attention trains a small indexer to keep only the KV-cache chunks a token will use — shrinking the physical cache to 13.5%.

LLM

Chiaroscuro Attention cuts attention FLOPs 62% — Spectral-entropy token routing — What does it mean?

Chiaroscuro scores each token's spectral entropy and sends most tokens through a cheap frequency-domain mixer, paying for full attention only on the few that remain.

LLM

SigmaScale learns its SVD scaling matrices — Learned scaling for truncated-SVD compression — What does it mean?

SigmaScale learns two scaling vectors under an activation-aware loss so truncated-SVD throws away less — shrinking LLM weights by rank, not bits.

LLM

MiniMax M3 ships open-weight 1M context — MiniMax Sparse Attention (MSA) — What does it mean?

MiniMax M3 (open-weight, 1M context) runs on MiniMax Sparse Attention, a block-sparse KV gather that cuts per-token compute ~20× at one million tokens.

LLM

EmbedFilter — Unembedding matrix as a feature lens — What does it mean?

LLMs make blurry text embeddings because a subspace in their unembedding matrix injects frequent words — EmbedFilter projects it out in one linear transform.

Agent

Self-evolving agents collapse over iterations — Continual experience internalization — What does it mean?

Self-evolving agents can degrade as they learn from their own runs. Three design choices decide whether they keep improving or collapse.

Agent

MLEvolve: self-evolving agents beat AlphaEvolve — Progressive Monte Carlo Graph Search — What does it mean?

Progressive Monte Carlo Graph Search lets MLEvolve share discoveries across branches — SOTA on MLE-Bench in half the usual budget.

LLM

Code2LoRA gives code models per-repo knowledge — Hypernetwork-generated LoRA adapters — What does it mean?

Code2LoRA's hypernetwork stamps out a repo-specific LoRA adapter in one pass — repo knowledge with zero prompt tokens at inference.

Agent

AdaPlanBench tests agent planning under incremental constraints — Adaptive replanning under hidden constraints — What does it mean?

AdaPlanBench hides each task's rules until a plan violates one — the best of 10 LLMs clears just 67.75%, and user constraints are harder than world constraints.