AI Explained

Plain explanations of trending AI concepts, with live visualizations.

LLM

Google's Gemini Omni — Modality unification in a shared token space — What does it mean?

Gemini Omni turns text, image, audio, and video into tokens in one shared space, so a single model can read — and generate — any modality.

LLM

Parametric Memory Law links LoRA capacity to verbatim recall — The p > 0.5 recall threshold — What does it mean?

A fine-tuned model memorizes a token verbatim once its greedy probability crosses 0.5, and a power law says how much LoRA capacity that takes.
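Why 0.5 is a natural threshold: probabilities over the vocabulary sum to 1, so any token above 0.5 is necessarily the argmax, and greedy decoding will emit it. The sketch below illustrates only that fact. The logits, the toy 4-token vocabulary, and the function names are hypothetical and are not taken from the paper.

```python
import math

def softmax(logits):
    """Convert raw logits to probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def greedy_token(probs):
    """Index of the most probable token, i.e. what greedy decoding picks."""
    return max(range(len(probs)), key=probs.__getitem__)

def crosses_recall_threshold(logits, target, threshold=0.5):
    """True when the target token's probability exceeds the threshold."""
    return softmax(logits)[target] > threshold

# Hypothetical logits over a toy 4-token vocabulary.
logits = [3.0, 1.0, 0.5, 0.2]
probs = softmax(logits)
target = 0

if crosses_recall_threshold(logits, target):
    # Above 0.5, no other token can have a higher probability.
    print("greedy decoding emits the target:", greedy_token(probs) == target)
```

Below 0.5 the target may still be the argmax, so the threshold is a sufficient condition for greedy recall, not a necessary one.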

GPU

NVIDIA AI Factories — Tokens-per-megawatt as a serving metric — What does it mean?

NVIDIA's 'AI Factories' framing reorganizes datacenter economics around tokens per megawatt — bundling compute, memory, interconnect, and orchestration into one billable knob and claiming ~50× tokens/MW for Blackwell Ultra GB300 NVL72 vs Hopper.
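As a rough illustration of the metric, tokens per megawatt can be read as serving throughput divided by power draw. The sketch below assumes throughput in tokens per second and power in watts. The function name and the example figures are hypothetical placeholders, not NVIDIA's numbers.

```python
def tokens_per_megawatt(tokens_per_second, power_watts):
    """Throughput normalized by power draw, in tokens per second per MW."""
    if power_watts <= 0:
        raise ValueError("power_watts must be positive")
    return tokens_per_second * 1_000_000 / power_watts

# Hypothetical systems, chosen only to show how a ratio is computed.
baseline = tokens_per_megawatt(tokens_per_second=10_000, power_watts=100_000)
candidate = tokens_per_megawatt(tokens_per_second=60_000, power_watts=150_000)

# Ratio of the two systems under this metric.
improvement = candidate / baseline
print(f"baseline:  {baseline:,.0f} tokens/s per MW")
print(f"candidate: {candidate:,.0f} tokens/s per MW")
print(f"ratio:     {improvement:.1f}x")
```

A headline ratio such as the claimed ~50× depends on what counts as power (chips only, or the whole facility) and on the workload used to measure throughput.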

LLM

MobileMoE — DRAM-aware MoE scaling for sub-3GB devices — What does it mean?

MobileMoE introduces a joint memory + compute scaling law for on-device MoE LMs; S/M/L fit under 3 GB at INT4 and reportedly run 1.8–3.8× faster prefill / 2.2–3.4× faster decode than dense baselines on Galaxy S25 and iPhone 16 Pro.

LLM

Gemini 3.5 Flash — Agent-first model design — What does it mean?

Some LLMs are trained from day one to live inside an agent loop — calling tools, recovering from errors — instead of being chat models with tool-calling bolted on later.

Agent

AgentDoG 1.5 — Small inline guard models for agent actions — What does it mean?

A small 0.8–8B-parameter guard model screens every agent action inline, reportedly matching a closed safety model's catch rate at ~100× lower deployment overhead.

GPU

NVIDIA Jetson Thor — Edge Blackwell vs datacenter Blackwell — What does it mean?

Jetson Thor is NVIDIA's Blackwell-architecture edge AI module: 2,070 FP4 TFLOPS in a 40–130 W envelope, reportedly 7.5× the compute and 3.5× the performance per watt of Jetson Orin (Computex 2026).

Agent

Anthropic's Project Glasswing — Detection-saturated vulnerability pipeline — What does it mean?

Glasswing partners found 10,000+ high- or critical-severity vulnerabilities in one month. The detection-saturated dynamic is sharpest where the patcher is a separate org — Anthropic's OSS disclosures: 530 reported, only 75 patched after a month.

Agent

PromptArmor × Copilot Cowork — Image-URL exfiltration in agent UIs — What does it mean?

PromptArmor showed Microsoft Copilot Cowork posting a Teams message whose hidden <img src> leaks a pre-authenticated OneDrive token the moment the user opens the inbox — zero clicks on a malicious link.

LLM

ThriftAttention paper — Importance-aware FP16/FP4 mixed-precision attention — What does it mean?

ThriftAttention runs ~5% of QK attention blocks (the top by a cheap importance score) in FP16 and the rest in FP4, recovering 89.1% of the FP4→FP16 long-context quality gap.
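The selection step can be sketched as a top-k over per-block importance scores: the highest-scoring ~5% of blocks are marked for FP16 and the rest for FP4. This is a minimal illustration, not the paper's implementation. The scoring values, the block indexing, and the function names are assumptions, and no quantized arithmetic is performed here.

```python
def select_high_precision_blocks(scores, fraction=0.05):
    """Return the block keys with the highest importance scores.

    scores: dict mapping (query_block, key_block) -> importance score.
    fraction: share of blocks to keep in high precision (at least one).
    """
    k = max(1, round(fraction * len(scores)))
    ranked = sorted(scores, key=scores.get, reverse=True)
    return set(ranked[:k])

def precision_plan(scores, fraction=0.05):
    """Map every block to the precision it would run in."""
    high = select_high_precision_blocks(scores, fraction)
    return {block: ("fp16" if block in high else "fp4") for block in scores}

# Hypothetical importance scores for a 4 x 10 grid of attention blocks.
# The formula only produces distinct placeholder values.
scores = {(i, j): ((i * 10 + j) * 37) % 41 for i in range(4) for j in range(10)}

plan = precision_plan(scores)
fp16_blocks = [block for block, p in plan.items() if p == "fp16"]
print(len(fp16_blocks), "of", len(plan), "blocks run in FP16")
```

The importance score has to be much cheaper than the attention it gates, otherwise the selection cost eats the savings from running most blocks in FP4.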

Agent

PushBench paper — Quantitative Goal Persistence (QGP) — What does it mean?

PushBench measures Quantitative Goal Persistence — frontier models hold up at 50 artifacts but drop to 3/9 successes at 100; a state-tracking controller restores 69–78% by rejecting duplicate submissions.
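One way to picture a state-tracking controller that rejects duplicates is a small wrapper that remembers which artifacts were already submitted and reports how many are still outstanding. The class below is a hypothetical sketch of that idea. Its name, interface, and the artifact identifiers are assumptions and do not come from the PushBench paper.

```python
class SubmissionTracker:
    """Tracks submitted artifacts and rejects repeats."""

    def __init__(self, target_count):
        self.target_count = target_count
        self._seen = set()

    def submit(self, artifact_id):
        """Accept a new artifact; return False if it was already submitted."""
        if artifact_id in self._seen:
            return False
        self._seen.add(artifact_id)
        return True

    @property
    def remaining(self):
        """Number of distinct artifacts still needed to reach the goal."""
        return max(0, self.target_count - len(self._seen))

    @property
    def done(self):
        return len(self._seen) >= self.target_count

# Hypothetical run: the agent resubmits "a" before finishing.
tracker = SubmissionTracker(target_count=3)
results = [tracker.submit(x) for x in ["a", "b", "a", "c"]]
print(results, "remaining:", tracker.remaining, "done:", tracker.done)
```

Keeping the count outside the model means progress toward the quantitative goal no longer depends on the model recalling its own earlier submissions.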

LLM

Distillation in LLM pre-training — Non-monotonic teacher strength — What does it mean?

Sweeping teacher and student sizes in pre-training distillation shows the helpfulness of a teacher peaks at small undertrained models — stronger teachers can hurt, and the gains land out-of-domain, not in-domain.