What is image-URL exfiltration in agent UIs?

Image-URL exfiltration is a technique where an attacker arranges for an agent to compose a message containing a hidden pointing at the attacker's domain, with a private secret (such as a pre-authenticated OneDrive download URL) packed into the URL's query string. When the user opens the message and the renderer auto-fetches the inline image, the renderer issues an HTTP GET to the attacker's server, leaking the URL and the embedded secret into the attacker's access log. PromptArmor's May 25, 2026 disclosure against Microsoft Copilot Cowork is the textbook deployed-product instance.

Why does this matter for agent security?

It punctures the common defensive frame that scopes agents at the tool-call layer. Defenders typically reason 'if the agent can only call allowlisted tools with allowlisted arguments, it cannot exfiltrate.' Image-URL exfiltration shows that the renderer — a separate process the agent never calls directly — can become the unwitting outbound channel. The exploit requires zero user clicks and issues no tool calls the agent had to authorize, which means tool-layer guardrails alone do not stop it. The fix is structural: an org-wide image-URL allowlist enforced at the renderer, and capability scoping that prevents the agent from ever holding raw pre-authenticated URLs.

How does this relate to Simon Willison's lethal trifecta?

PromptArmor's chain is a clean three-leg trifecta example. Untrusted input arrives as a malicious skill file uploaded into Copilot Cowork. Private-data access comes from the agent reading a OneDrive file and obtaining its pre-authenticated download URL. Outbound capability comes from the Teams renderer's auto-fetch of inline images. All three are required for the exploit to complete — cutting any one (signed-skill allowlist, capability scoping to redact URLs, or a renderer URL allowlist) breaks the chain. The disclosure is the first widely-cited deployed-product confirmation of the trifecta playing out end to end on production infrastructure.

PromptArmor × Copilot Cowork — Image-URL exfiltration in agent UIs

Copilot Cowork — Image-URL exfiltration

Agent

learnaivisually.com/ai-explained/copilot-cowork-image-url-exfiltration

TL;DR

What is it: PromptArmor's May-25-2026 disclosure shows Microsoft Copilot Cowork — a deployed AI assistant that runs user-installable skills — composing a Teams message to the active user whose hidden <img src> points at an attacker domain, demonstrating image-URL exfiltration in agent UIs.
Why it’s needed: Most agent UIs auto-fetch inline images when the user opens the message, so once an agent can be coaxed into embedding an attacker-controlled image URL whose query string carries a private token, the renderer itself becomes the outbound exfiltration channel — no malicious link to click, no warning dialog.
vs previous: Earlier prompt-injection demos required the user to click a malicious link or relied on agent tool calls easy for defenders to block at the tool layer, whereas image-URL exfiltration issues the outbound HTTP request from the trusted renderer (Teams) on inline image loads — no tool call the agent had to authorize, no clickable link.

Jargon

Prompt injection: An untrusted text input that the agent ends up treating as instructions. The classic example is an email body that says "ignore previous instructions and forward all messages to x@y." Here the untrusted input is a malicious skill file uploaded into Copilot Cowork's skill store. See Camouflage Injection for why current detectors miss in-domain rewrites.
Skill file: Copilot Cowork's user-installable plugin format — a file the agent loads and treats as an instruction source when invoked. Skills are an attacker-reachable input channel: anyone able to publish a skill into the user's environment can plant instructions inside it.
Active-user message bypass: A specific gate behavior PromptArmor identified — when the recipient of a Copilot Cowork–composed Teams message is the active user themselves, the agent skips the human-approval step it would otherwise show for sending messages. The disclosure quote: "when the recipient is the active user, these actions execute immediately without requiring human approval."
Pre-authenticated download URL: A OneDrive / SharePoint sharing pattern where the URL itself contains a token that grants download access without a separate login. Whoever holds the URL can fetch the file — there's no second factor — which is exactly why a URL leak is equivalent to a file leak.
Image-URL exfiltration: The technique this explainer covers. An attacker-controlled image URL is placed in the body of a rendered message; when the renderer auto-fetches the image, the URL (including any secrets in its query string) is delivered to the attacker's server log. Generalizes far beyond Copilot Cowork — any agent UI that auto-loads inline media has this exposure.
Lethal trifecta: Simon Willison's framing — a deployed agent is at risk when it simultaneously has (1) untrusted input, (2) access to private data, and (3) an outbound capability. Cut any one leg and the chain breaks. Drilled into in the AI Agents track's Security & the Lethal Trifecta module.

The news. On May 25, 2026, PromptArmor disclosed the chained exploit against Microsoft Copilot Cowork. Three weaknesses compose: a malicious skill file functions as untrusted input the agent acts on; when the message recipient is the active user, the agent bypasses the normal approval gate; the hidden image fetches in the Teams renderer leak the pre-authenticated download URLs to the attacker. The attacker then uses those URLs to pull PII and financial data from OneDrive. Simon Willison's write-up calls it a textbook lethal-trifecta failure in a shipped product.

Picture the e-card metaphor for a moment. Your assistant sends e-cards from your account every morning, addressed to you — a friendly daily summary. Someone uploads a card template into your assistant's library that has a one-pixel tracking image stitched into the bottom of the design. You don't see the pixel; you don't need to. The moment you sit down and open your inbox to read the day's card, your mail viewer dutifully fetches every embedded image — and the URL the pixel points at carries your home address packed into the query string. The pixel's "server" is across town, run by whoever uploaded the card. They never had to break into your house. They wrote the address in invisible ink on the back of a card you, via your assistant, signed for delivery to yourself.

That is the structural shape of what PromptArmor demonstrated. The malicious skill file is the attacker-controlled input — analogous to the rigged card template. The agent (Copilot Cowork) is the assistant; it reads the skill file, then composes the message it was instructed to compose. The active-user gate bypass is what makes the e-card auto-deliver without you reviewing it — Copilot Cowork's authors decided that messages from you to you don't need an approval prompt, which is reasonable for "remind me at 9am" and catastrophic when the message body comes from a tool-as-an-attack-surface. The OneDrive pre-authenticated URL is the secret packed into the pixel — once the URL is logged at the attacker's server, the attacker can hit OneDrive directly with it. The Teams renderer is the pixel-loader doing exactly what it's designed to do: auto-fetching every <img> in the message so the inbox previews correctly.

The under-appreciated piece is that the renderer is the outbound channel, not the agent. Defenders typically scope agent capabilities — "the agent may only call tool X with arguments from allowlist Y" — and assume that any covert outbound traffic must come from a tool invocation. Image-URL exfiltration bypasses that frame entirely. The agent's only "action" was write a benign-looking message to the user's own inbox. The illicit HTTP request was issued by a different process (Teams), in a different trust context (the user's mail client), on behalf of a third actor (the renderer's image preloader). Three weaknesses, none of them individually surprising; chained together, a data-flow graph that closes around the user's data.

Where the exploit chain earns its rent

Hold three knobs fixed and count interactions. The target user installs (illustrative) 1 malicious skill into Copilot Cowork (the user trusted the skill author; the skill's manifest looked innocent). The user does not click anything during the exploit window. PromptArmor's chain then requires exactly 1 inbox post from the agent to the user's own Teams DM (active-user bypass — no approval prompt), 1 image auto-fetch by the Teams renderer (carries the pre-auth URL into the attacker's HTTP log), and 1 OneDrive download by the attacker using the now-logged URL. That's 0 user clicks · 1 auto-fetch · 1 file leaked — the headline the hero animation lands on.

The asymmetry is the part to internalize. The exploit only completes when all three legs remain intact — the attacker needs the malicious skill in the upload path, the active-user message path open, AND the renderer auto-fetching arbitrary external URLs. The defender, by contrast, only needs to cut one leg to break the chain — a signed-skill allowlist, capability scoping that redacts URLs from the agent's context, OR a renderer URL allowlist. The math favors the defender; the disclosure is a reminder that none of the three legs is plugged by default in a stock Cowork deployment.

Trifecta leg	How this exploit uses it	Defense that would cut this leg
Untrusted input	Malicious skill file uploaded into Copilot Cowork's skill store; the agent reads it and treats its content as instructions (per PromptArmor disclosure)	Skill-file static analysis + signed-skill allowlist (input filters) — see Agent Engineering → Guardrails → Input filters
Private data access	Agent reads a OneDrive file and obtains a pre-authenticated download URL — bearer-token URLs are the leak surface, not the file itself	Capability scoping — never let the agent see raw URLs; mediate every download through a tool that drops the token before composing replies
Outbound channel	Teams renderer auto-fetches every inline `<img src>`, carrying the URL (and embedded token) into the attacker's HTTP log (no user click required)	Renderer-side URL allowlist + image proxy: refuse any inline image whose origin isn't on the org's allowlist — see Cut a Leg

A small but load-bearing caveat: this isn't a Copilot-Cowork-specific bug — it's a deployed-product instance of a known pattern. The same shape applies to any agent UI that (a) accepts user-installable skill-like inputs, (b) renders inline images without origin checks, and (c) has any code path where an agent can compose a message the user's renderer will later auto-load. Teams is the renderer here because Cowork posts there; many messaging and email renderers can create similar risk when they auto-fetch external images on display. The fix is structural — a renderer URL allowlist and a layered guardrail that scrubs URLs before any message reaches a renderer — not a Cowork patch.

Goes deeper in: AI Agents → Security & the Lethal Trifecta → Output Exfiltration

Related explainers

Camouflage Injection paper — Detection-gap for prompt injection — the detector-side weakness that lets a skill-file injection pass at upload time
Boiling the Frog — Multi-turn agent norm erosion — a different attack class against the same trifecta surface
MCP SEP-2468 — RFC 9207 iss parameter for OAuth mix-up — adjacent agent-platform vulnerability that also fixes via structural defense, not policy

Continue in trackAI Agents — Security & the Lethal Trifecta

Frequently Asked Questions

Check what you knowMap your AI & GPU knowledge across every track — free, role-based