The news. On July 1, 2026, a paper titled "LOCOS: Locating Non-Literal Retrieval Heads by Logit Contribution" (arXiv 2607.01002) argued that the standard way of finding a model's retrieval heads has a blind spot. Existing detectors reward a head when the token it attends to matches the token the model generates: a literal-copy test. LOCOS instead scores each head by how much its output pushes the answer token's logit up, catching heads that retrieve by meaning without copying. Ablating LOCOS's top-ranked heads on Qwen3-8B drops long-context recall to near zero, evidence they are the ones doing the work.

Picture a vast library and a question whose answer sits on exactly one shelf, buried among thousands. Send in a crowd of clerks — the model's attention heads — and later ask who actually found it. The old rule credits whichever clerk is still standing at the exact answer shelf: easy to check, because you just match their location against the shelf that held the fact. This is the literal-copy test, and for a head that copies the needle token verbatim it works fine.

But the clerk who read the shelf, walked off, and wrote the answer in their own words gets no credit — they aren't standing where the fact lives anymore. That is a non-literal retrieval head: it answers from the meaning of a passage, so its attention no longer points at the token you would check against. As the paper puts it, the old detectors "reward heads whose attended token matches the generated token, a literal-copy criterion that captures where a head reads but not what it writes." A whole class of retrieval heads is invisible to a location check.
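To make the literal-copy criterion concrete, here is a minimal sketch of an attention-overlap score for a single head. The function name `copy_retrieval_score`, the toy tokens, and the attention weights are all hypothetical; the exact scoring rule used by existing detectors may differ in detail. The sketch only encodes the idea quoted above: a head gets credit when its most-attended prompt token lies in the needle and equals the generated token.

```python
from typing import Sequence, Set


def copy_retrieval_score(
    attn_rows: Sequence[Sequence[float]],  # one attention row per generated token
    context_tokens: Sequence[str],         # prompt tokens the head can attend to
    generated_tokens: Sequence[str],       # tokens the model actually produced
    needle_positions: Set[int],            # prompt indices that hold the needle
) -> float:
    """Fraction of decoding steps where the head's top-attended prompt token
    is inside the needle AND is identical to the generated token."""
    hits = 0
    for row, out_tok in zip(attn_rows, generated_tokens):
        top = max(range(len(row)), key=lambda i: row[i])  # argmax of attention
        if top in needle_positions and context_tokens[top] == out_tok:
            hits += 1
    return hits / len(generated_tokens)


# Toy prompt: the needle is the single token "Paris" at index 4.
context = ["Marie", "was", "born", "in", "Paris", "."]
needle = {4}

# In both cases the head looks straight at "Paris" for the one decoding step.
attn = [[0.02, 0.02, 0.03, 0.03, 0.88, 0.02]]

# Case 1: the model copies the needle token verbatim.
copy_score = copy_retrieval_score(attn, context, ["Paris"], needle)

# Case 2: the model answers from the meaning of the passage instead.
paraphrase_score = copy_retrieval_score(attn, context, ["France"], needle)

print(copy_score, paraphrase_score)  # 1.0 0.0
```

The head reads the same shelf in both cases, but it only earns credit when the output happens to be a verbatim copy. That is the blind spot in miniature.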

LOCOS changes what gets measured: not where a clerk stands, but how much their note moves the answer slip. Concretely, it takes each head's OV-circuit output (what the head actually writes into the residual stream) and projects it onto the unembedding direction of the answer token, the same direction the LM head uses to turn a hidden state into that token's logit. A head that shoves the answer token's score upward gets a high logit contribution, whether or not it ever pointed at the needle. LOCOS contrasts the needle position against off-needle positions in a single forward pass, then ranks every head by this write-aware contribution.
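The projection-and-contrast idea can be sketched in a few lines. This is an illustration under stated assumptions, not the paper's formula: the helper `logit_contribution_score`, the head names, and the two-dimensional toy vectors are hypothetical, each head's write is assumed to be already split by source position, and the contrast is taken as the mean needle contribution minus the mean off-needle contribution.

```python
from typing import Dict, List, Sequence, Set


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def logit_contribution_score(
    per_position_writes: Sequence[Sequence[float]],  # head's OV write, split by source position
    answer_unembed: Sequence[float],                 # unembedding direction of the answer token
    needle_positions: Set[int],
) -> float:
    """Project each per-position write onto the answer direction, then
    contrast needle positions against off-needle positions.
    (Mean-minus-mean is an assumption made for this sketch.)"""
    contribs = [dot(w, answer_unembed) for w in per_position_writes]
    needle = [c for i, c in enumerate(contribs) if i in needle_positions]
    off = [c for i, c in enumerate(contribs) if i not in needle_positions]
    needle_mean = sum(needle) / len(needle)
    off_mean = sum(off) / len(off) if off else 0.0
    return needle_mean - off_mean


# Toy setup: 2-dimensional residual stream, 4 prompt positions, needle at index 2.
answer_unembed = [1.0, 0.0]
needle_positions = {2}

head_writes: Dict[str, List[List[float]]] = {
    # Writes strongly along the answer direction, and only from the needle.
    "L1H2": [[0.0, 0.0], [0.0, 0.0], [2.0, 0.5], [0.0, 0.0]],
    # Writes the same thing from every position: no needle-specific signal.
    "L3H1": [[0.5, 1.0], [0.5, 1.0], [0.5, 1.0], [0.5, 1.0]],
}

scores = {
    name: logit_contribution_score(writes, answer_unembed, needle_positions)
    for name, writes in head_writes.items()
}
ranking = sorted(scores, key=scores.get, reverse=True)

print(scores)   # {'L1H2': 2.0, 'L3H1': 0.0}
print(ranking)  # ['L1H2', 'L3H1']
```

Nothing in the score asks where the head's attention peaks or whether the output token matches a prompt token. It only asks how far the head's write moves the answer logit, and whether that push comes from the needle.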

[Figure: attention heads H1 to H4 shown side by side within each of Layers 1 to 3. Heads run in parallel (width); layers run sequentially (depth).]

Now put numbers on it. Take Qwen3-8B, score all its heads by logit contribution in one forward pass, and ablate the top 50. Long-context answering doesn't just dip, it flatlines: ROUGE-L falls from 0.401 to 0.000. The collapse holds across task types on the same model: multi-hop MuSiQue drops from 0.55 to 0.08, and BABILong from 0.62 to 0.20. Knock out 50 write-aware heads and a model that could read a novel-length prompt suddenly can't recall the one line that mattered, which is causal evidence that those heads, not the ones a location check would have flagged, were doing the retrieval. The paper reports the same pattern across Qwen3, Gemma-3, and OLMo-3.1.
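The logic of the ablation test can be shown on a toy model. The sketch below assumes zero-ablation (an ablated head writes nothing) and treats the answer logit as a plain sum of per-head contributions, which ignores the final normalization and any downstream interactions in a real transformer. The helper names and the contribution values are hypothetical and are not taken from the paper.

```python
from typing import Dict, FrozenSet, List, Tuple

Head = Tuple[int, int]  # (layer index, head index)


def top_k_heads(scores: Dict[Head, float], k: int) -> List[Head]:
    """Return the k heads with the highest score, best first."""
    return sorted(scores, key=scores.get, reverse=True)[:k]


def answer_logit(contribs: Dict[Head, float], ablated: FrozenSet[Head] = frozenset()) -> float:
    """Toy read-out: the answer logit is the sum of head contributions,
    and an ablated head contributes zero."""
    return sum(c for head, c in contribs.items() if head not in ablated)


# Hypothetical per-head contributions to the answer token's logit.
contribs: Dict[Head, float] = {
    (0, 0): 0.25, (0, 1): 3.0,
    (1, 0): 0.25, (1, 1): 2.5,
    (2, 0): -0.5, (2, 1): 0.5,
}

baseline = answer_logit(contribs)

# Ablate the two top-ranked heads (in this toy, score == contribution).
top = top_k_heads(contribs, k=2)
after_top = answer_logit(contribs, frozenset(top))

# Control: ablate two other heads and see how little changes.
after_control = answer_logit(contribs, frozenset({(0, 0), (1, 0)}))

print(top)            # [(0, 1), (1, 1)]
print(baseline)       # 6.0
print(after_top)      # 0.5
print(after_control)  # 5.5
```

The comparison that matters is between the last two numbers. Removing the same count of heads hurts far more when the heads are chosen by what they write toward the answer.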

| Way to find retrieval heads | What it measures | Catches literal-copy heads? | Catches meaning-based heads? |
| --- | --- | --- | --- |
| Attention-overlap (retrieval score) | Does the head's attended token equal the generated token? | yes | no (misses them) |
| LOCOS (arXiv 2607.01002) | How much the head's OV output moves the answer token's logit | yes | yes |

What makes this clean is that the measuring stick was already inside the model. You do not train anything extra: the answer token's own unembedding direction tells you which write to look for, and each head's OV output tells you who is doing that writing. LOCOS just takes the dot product and ranks. The lesson generalizes past this one paper — when you want to know which part of a network is responsible for an output, follow what it writes toward that output, not where it happens to be looking.
