AI Explained

Plain explanations of trending AI concepts, with live visualizations.

Tangram sizes each attention head's KV cache to what it actually keeps, not one uniform budget — lifting multi-turn serving up to 2.6×.

Gemma 4 ships QAT 4-bit checkpoints: training on the low-bit grid dodges the accuracy cliff that naive post-training rounding hits.
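To make "training on the low-bit grid" concrete, here is a minimal sketch of fake quantization with a straight-through estimator, the usual textbook way to do quantization-aware training. It is an illustration under assumptions, not Gemma 4's actual recipe: the symmetric per-tensor int4 grid, the toy one-layer regression, and the names `fake_quant_int4` and `qat_step` are all hypothetical.

```python
def fake_quant_int4(weights):
    """Snap weights to a symmetric 4-bit grid; return grid floats, integer codes, scale."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 7 if max_abs > 0 else 1.0  # assumed per-tensor scale; int4 spans [-8, 7]
    codes = [max(-8, min(7, round(w / scale))) for w in weights]
    return [c * scale for c in codes], codes, scale


def qat_step(weights, x, target, lr=0.01):
    """One SGD step on squared error, with quantized weights used in the forward pass."""
    q_weights, _, _ = fake_quant_int4(weights)
    pred = sum(q * xi for q, xi in zip(q_weights, x))
    err = pred - target
    # Straight-through estimator: treat d(q_w)/d(w) as 1, so the gradient
    # measured at the quantized weights updates the full-precision copy.
    new_weights = [w - lr * 2 * err * xi for w, xi in zip(weights, x)]
    return new_weights, err * err


# Toy data (made up for illustration).
weights = [0.9, -0.33, 0.12, 0.05]
x, target = [1.0, 2.0, -1.0, 0.5], 0.4

losses = []
for _ in range(50):
    weights, loss = qat_step(weights, x, target)
    losses.append(loss)

q_weights, codes, scale = fake_quant_int4(weights)
```

The point of the design is that the loss is always measured with the rounded weights, so the optimizer sees the rounding error during training instead of meeting it for the first time after training.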

Gated DeltaNet's chunk solve hides a sequential matrix inverse; a truncated Neumann series turns it into parallel matmuls, ~5× faster.
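The Neumann-series idea can be sketched in a few lines. This assumes the matrix being inverted has the form I + A with A strictly lower triangular, which is an assumption about the chunk solve rather than something stated above. For such an n×n matrix, A^n = 0, so the series (I + A)^-1 = I - A + A^2 - ... is exact after n terms and approximate when truncated earlier. The helper names are hypothetical, and plain Python lists stand in for GPU tensors.

```python
def matmul(a, b):
    """Dense matrix product of two list-of-lists matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def neumann_inverse(a, terms):
    """Approximate (I + a)^-1 by the first `terms` terms of sum_k (-a)^k."""
    n = len(a)
    neg_a = [[-v for v in row] for row in a]
    total = identity(n)
    power = identity(n)
    for _ in range(terms - 1):
        power = matmul(power, neg_a)  # next power of -a
        total = [[t + p for t, p in zip(tr, pr)] for tr, pr in zip(total, power)]
    return total


def max_residual(a, inv):
    """Largest entry of |(I + a) @ inv - I|; zero for an exact inverse."""
    n = len(a)
    i_plus_a = [[a[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    prod = matmul(i_plus_a, inv)
    eye = identity(n)
    return max(abs(prod[i][j] - eye[i][j]) for i in range(n) for j in range(n))


# Strictly lower-triangular toy matrix (values made up for illustration).
A = [[0.0, 0.0, 0.0, 0.0],
     [0.5, 0.0, 0.0, 0.0],
     [0.2, -0.3, 0.0, 0.0],
     [0.1, 0.4, 0.25, 0.0]]

full_error = max_residual(A, neumann_inverse(A, terms=4))       # all n terms
truncated_error = max_residual(A, neumann_inverse(A, terms=2))  # I - A only
```

Every term is a dense matrix product, the operation accelerators handle best, in place of the row-by-row dependency chain of forward substitution. How many terms to keep, and how the reported speedup arises, depend on details this sketch does not model.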

AutoLab grades agents on the propose → run → measure → refine loop; across 17 models, sustained iteration — not the first answer — predicted success.

Token Budgets models an agent's token cap as a use-at-most-once resource — so a budget overrun fails to compile instead of overspending at runtime.

TELBench + DRIFT pinpoint which step of a deep-research agent's trajectory made the answer unreliable — span-level error localization, up to +30 pp.

StreamMA streams each reasoning step from one agent to the next, so downstream agent B starts on upstream agent A's reliable early steps, not the risky last one: +7.3 pp average accuracy.

RTX Spark wires a Grace CPU and a Blackwell GPU to one 128GB pool over NVLink-C2C, so the GPU skips the PCIe host–device copy.

MAI-Code-1-Flash scales how many reasoning tokens it spends to each task's difficulty — short chains for easy tasks, long only for hard ones.

KVarN rotates outliers out of the KV cache so 2-bit quantization fits every channel — calibration-free, no error compounding across decode.

Crafter's directive critic emits per-dimension fixes + typed edits, not a scalar score — a harness that lifts figures 33.73 → 50.34.

Averaging the output distributions of 3–5 independent LLMs cancels each one's text watermark — detection z-scores fall from 5–300 to below 2.
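A toy calculation shows why averaging weakens a watermark. It assumes a green-list style scheme in which each model boosts the logits of its own keyed half of the vocabulary; the scheme, the 32-token vocabulary, the uniform base distribution, the context-independent green lists, and every name below are illustrative assumptions, not details from the work summarized above. In this toy, averaging k models divides the expected z-score by k, so it illustrates the dilution without reproducing the reported numbers.

```python
import math

VOCAB = range(32)  # toy vocabulary of 32 token ids
GAMMA = 0.5        # fraction of the vocabulary on each green list
DELTA = 2.0        # logit boost applied to green tokens


def is_green(token, model):
    """Hypothetical key: model i's green list is the tokens whose bit i is set."""
    return (token >> model) & 1 == 1


def watermarked_dist(model):
    """Uniform base distribution with green-token logits raised by DELTA."""
    weights = [math.exp(DELTA) if is_green(t, model) else 1.0 for t in VOCAB]
    norm = sum(weights)
    return [w / norm for w in weights]


def average_dists(dists):
    """Token-wise mean of several next-token distributions."""
    return [sum(col) / len(dists) for col in zip(*dists)]


def expected_z(dist, model, num_tokens=200):
    """Expected z-score of `model`'s detector on num_tokens tokens drawn from dist."""
    green_mass = sum(p for t, p in zip(VOCAB, dist) if is_green(t, model))
    return (green_mass - GAMMA) * math.sqrt(num_tokens) / math.sqrt(GAMMA * (1 - GAMMA))


solo = expected_z(watermarked_dist(0), model=0)
mixed = {
    k: expected_z(average_dists([watermarked_dist(m) for m in range(k)]), model=0)
    for k in (3, 5)
}
```

Because the other models' green lists overlap model 0's list on exactly half their tokens, they contribute no bias toward it, and only a 1/k share of the mixture still carries model 0's signal.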