AI Explained
Plain explanations of trending AI concepts, with live visualizations.
Google's Gemini Omni — Modality unification in a shared token space — What does it mean?
Gemini Omni turns text, image, audio, and video into tokens in one shared space, so a single model can read — and generate — any modality.
Parametric Memory Law links LoRA capacity to verbatim recall — The p > 0.5 recall threshold — What does it mean?
A fine-tune memorizes a token verbatim once the model's probability for it crosses 0.5, at which point greedy decoding is guaranteed to emit it, and a power law says how much LoRA capacity that takes.
NVIDIA AI Factories — Tokens-per-megawatt as a serving metric — What does it mean?
NVIDIA's 'AI Factories' framing reorganizes datacenter economics around tokens per megawatt — bundling compute, memory, interconnect, and orchestration into one billable knob and claiming ~50× tokens/MW for Blackwell Ultra GB300 NVL72 vs Hopper.
MobileMoE — DRAM-aware MoE scaling for sub-3GB devices — What does it mean?
MobileMoE introduces a joint memory and compute scaling law for on-device MoE LMs; its S/M/L variants fit under 3 GB at INT4 and reportedly deliver 1.8–3.8× faster prefill and 2.2–3.4× faster decode than dense baselines on Galaxy S25 and iPhone 16 Pro.
Gemini 3.5 Flash — Agent-first model design — What does it mean?
Some LLMs are trained from day one to live inside an agent loop — calling tools, recovering from errors — instead of being chat models with tool-calling bolted on later.
AgentDoG 1.5 — Small inline guard models for agent actions — What does it mean?
A small guard model (0.8–8B parameters) that screens every agent action inline, reportedly matching a closed safety model's catch rate at ~100× lower deployment overhead.
NVIDIA Jetson Thor — Edge Blackwell vs datacenter Blackwell — What does it mean?
Jetson Thor is NVIDIA's Blackwell-architecture edge AI module, rated at 2,070 FP4 TFLOPS in a 40–130 W envelope; it reportedly offers 7.5× the compute and 3.5× the performance per watt of Jetson Orin (Computex 2026).
Anthropic's Project Glasswing — Detection-saturated vulnerability pipeline — What does it mean?
Glasswing partners found 10,000+ high- or critical-severity vulnerabilities in one month. The detection-saturated dynamic is sharpest where the patcher is a separate organization: of the 530 vulnerabilities Anthropic reported to open-source projects, only 75 were patched after a month.
PromptArmor × Copilot Cowork — Image-URL exfiltration in agent UIs — What does it mean?
PromptArmor showed Microsoft Copilot Cowork posting a Teams message whose hidden <img src> leaks a pre-authenticated OneDrive token the moment the user opens the inbox — zero clicks on a malicious link.
ThriftAttention paper — Importance-aware FP16/FP4 mixed-precision attention — What does it mean?
ThriftAttention runs the ~5% of QK attention blocks that rank highest on a cheap importance score in FP16 and the rest in FP4, recovering 89.1% of the FP4→FP16 long-context quality gap.
PushBench paper — Quantitative Goal Persistence (QGP) — What does it mean?
PushBench measures Quantitative Goal Persistence — frontier models hold up at 50 artifacts but drop to 3/9 successes at 100; a state-tracking controller restores 69–78% by rejecting duplicate submissions.
Distillation in LLM pre-training — Non-monotonic teacher strength — What does it mean?
Sweeping teacher and student sizes in pre-training distillation shows that a teacher helps most for small, undertrained models; stronger teachers can hurt, and the gains land out-of-domain, not in-domain.
