What is a quantization-conditioned attack?

A quantization-conditioned attack is a backdoor whose malicious behavior is dormant at full precision (FP16) but triggers once the model is quantized for deployment. The 2026 paper at arXiv:2605.15152 reports the first such attack that lands consistently across AWQ, GPTQ, and GGUF I-quants — the three dominant per-block-scaled post-training-quantization recipes used in production. The trick is to inject one outlier value into a weight block; the PTQ algorithm's per-block scale then stretches to fit the outlier and rounds most other weights in that block toward zero, so the attacker's payload dominates the quantized layer.

Why does outlier injection collapse other weights toward zero?

Modern PTQ methods compute a separate scale factor per group of weights (commonly 32 or 128 weights per block). The scale is chosen so the bins span the block's actual range — typically scale = max(|w|) / max_bin. A giant outlier raises max(|w|) by an order of magnitude or more, so the scale grows proportionally. Most natural weights, whose magnitudes were matched to the original scale, now land in the lowest bin — often exactly zero. The outlier sits alone in the top bin, and it dominates the layer's effective signal.
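The scale rule described above can be written out as a short derivation. This is a sketch that assumes plain symmetric absmax quantization; the symbols $s$, $q_{\max}$ (the text's max_bin), $q_i$ and $\hat{w}_i$ are notation introduced here, and real AWQ, GPTQ, and GGUF I-quant kernels add their own refinements on top of it.

```latex
\[
s = \frac{\max_i |w_i|}{q_{\max}}, \qquad
q_i = \operatorname{round}\!\left(\frac{w_i}{s}\right), \qquad
\hat{w}_i = s \, q_i
\]
\[
q_i = 0 \iff |w_i| < \frac{s}{2}
\]
```

Because $s$ grows linearly with the largest magnitude in the block, an outlier that is $k$ times larger than the natural maximum also raises the zero threshold $s/2$ by a factor of $k$.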

How does this change the release process for quantized models?

The practical implication is that auditing only the FP16 checkpoint no longer establishes the safety of the deployed model. Red-team prompts, behavioral evals, and guardrail-classifier checks all need to run against the exact INT4 / INT8 binary that ships, not against the full-precision artifact it was derived from. The paper does not invent a new mitigation; it argues that the defense-in-depth practice the agent-engineering literature already recommends (audit the artifact you ship, not the artifact you trained) now extends to post-training quantization as a release-gating step.

Quantization-conditioned attack paper — Outlier injection across AWQ/GPTQ/GGUF

Jargon

PTQ: Post-training quantization. Take a fully-trained model and shrink each weight to 4 or 8 bits after training, without retraining the base model. The fastest path from a large checkpoint to a small one — see LLM Internals → Quantization → The Quantization Process.
AWQ: Activation-aware Weight Quantization. Picks a small set of weights to protect based on how the activations actually use them, then quantizes the rest more aggressively. Per-channel or per-group scales handle outliers (more in Modern Methods).
GPTQ: A second-order PTQ method that quantizes weights one column at a time and compensates by re-balancing surrounding weights in the same block. The compensation logic is what makes GPTQ outlier-aware.
GGUF I-quants: The newer GGUF importance-aware quant family (e.g. IQ3_M, IQ4_NL) used by llama.cpp. Each block stores a per-block scale tied to its activation-importance signal — the recipe family the paper specifically reports the attack landing on, distinct from the older K-quant family (Q4_K_M, Q5_K).
Per-block scale: Modern PTQ groups weights into small blocks (commonly 32 or 128 weights) and stores a separate scale factor per block. A giant outlier only ruins its own block's scale — not the whole layer. QCA exploits this by planting one outlier per attack target.
Outlier injection: Deliberately writing one weight whose magnitude is far larger than its block-mates'. After quantization, the block's per-block scale stretches to fit the outlier — most other weights in the block land in the lowest bin (often zero). See Quantization → The Outlier Problem.
Quantization-conditioned trigger: A backdoor whose malicious behavior is conditional on a specific weight encoding being present. The trigger fires only when the model is loaded in the targeted quantized format, not at full precision.
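The "per-block scale" entry says an outlier only ruins its own block's scale. A minimal sketch of that locality, assuming symmetric absmax scaling with the text's scale = max(|w|) / max_bin rule; the helper name block_scales, the block size of 32, and the weight values are all hypothetical choices made for illustration.

```python
def block_scales(weights, block_size=32, max_bin=8):
    """Return one absmax scale per contiguous block of `block_size` weights."""
    scales = []
    for start in range(0, len(weights), block_size):
        block = weights[start:start + block_size]
        scales.append(max(abs(w) for w in block) / max_bin)
    return scales

# Illustrative layer slice: 4 blocks of 32 weights, all within |w| <= 0.04.
clean = [0.04 * ((i % 9) - 4) / 4 for i in range(128)]

# Plant a single outlier in the second block (index 40 lies in block 1).
poisoned = list(clean)
poisoned[40] = 0.50

clean_scales = block_scales(clean)
poisoned_scales = block_scales(poisoned)

# Collect the indices of blocks whose scale differs after poisoning.
changed = [
    i for i, (a, b) in enumerate(zip(clean_scales, poisoned_scales)) if a != b
]
print("blocks with a changed scale:", changed)
print("scales after poisoning:", poisoned_scales)
```

Only the block that holds the outlier gets a stretched scale; the other three keep the scale they had before. That is the property that lets an attacker target individual blocks.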

The news. On May 14, 2026, researchers posted a paper reporting what they call the first quantization-conditioned attack to consistently induce malicious behavior across modern PTQ methods — explicitly AWQ, GPTQ, and GGUF I-quants. The strategy: inject outlier values into weight blocks so that surrounding weights collapse toward zero during quantization. The resulting model appears benign at FP16 but exhibits the attacker's chosen behavior after quantization. The result extends a security risk previously thought limited to simpler round-to-nearest schemes into the family of per-block-scaled recipes most production stacks actually ship.

Picture the class photo. A row of students lines up; a teacher with a four-rung ruler — short, below average, above average, tall — grades each kid into a bucket. In a normal photo everyone fits the ruler cleanly. Now imagine the attacker drops a 7-foot guest into the row. The teacher anchors the ruler's tall rung at the guest's head and re-zeros the rest of the ladder to match. The original kids — who were all roughly the same height — now mostly round to short. The teacher's report says “one tall, mostly short” even though the kids were perfectly distinct an instant ago.

The quantization-conditioned attack does the same trick to a 128-weight block. The model's natural weights in any one block typically live in a tight range; call it |w| ≤ 0.08. AWQ, GPTQ, and GGUF I-quants all compute a per-block scale before quantizing, and that scale is anchored to the block's largest magnitude so the bins span the actual data. The attacker plants one outlier weight at, say, +0.50, roughly six times the natural range (0.50 / 0.08 ≈ 6.25). The PTQ algorithm dutifully widens its scale to fit that outlier, and most legitimate weights in the block now round toward bin zero. After quantization, the layer's effective forward pass is dominated by the attacker's single outlier, while the rest of the block contributes little to nothing.

Crucially, none of this happens at FP16. At full precision the outlier is one weight among 128; its 0.50 contribution is overwhelmed by the surrounding ~127 small weights that do still contribute. The model behaves benignly. A red team running prompts against the FP16 checkpoint sees clean outputs and clears it for release. The malicious behavior only emerges when the same checkpoint is quantized for deployment — exactly the binary that ships to users.

A worked example with named numbers makes the mechanism concrete (illustrative: real attacks use carefully optimized outlier placements, but the arithmetic of per-block scaling is exact). Pick a single 128-weight block from a feed-forward layer. Say the natural weights lie in |w| ≤ 0.04, so AWQ's INT4 path picks a scale of about 0.04 / 8 = 0.005 and the real weights spread across the 16 signed bins. Now the attacker injects one outlier at 0.50. The new scale becomes 0.50 / 8 = 0.0625, which is 12.5 times larger. Re-quantize: even the largest original weight (|w| = 0.04) now rounds to only round(0.04 / 0.0625) = round(0.64) = 1, and every weight with |w| < 0.03125 (half the new scale) rounds all the way to 0. The attacker tunes the natural-weight distribution so that most of the 127 non-outlier weights land in bin 0, leaving the outlier alone to carry the layer's signal.
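The arithmetic of the worked example can be checked in a few lines. This is a minimal sketch, assuming symmetric absmax quantization with the text's divide-by-8 convention; the helper quantize_block and the evenly spaced "natural" weights are hypothetical stand-ins, not the paper's setup or any library's kernel.

```python
def quantize_block(block, max_bin=8):
    """Symmetric absmax quantization of one block: returns (scale, integer codes).

    Assumption: scale = max(|w|) / max_bin, as in the text. A real signed
    4-bit grid spans -8..7, so an actual kernel may divide by 7 or clip.
    """
    scale = max(abs(w) for w in block) / max_bin
    codes = [round(w / scale) for w in block]
    return scale, codes

# 127 illustrative natural weights, evenly spaced over [-0.04, 0.04].
natural = [0.04 * k / 63 for k in range(-63, 64)]

# Quantize the clean block, then the same block with one outlier appended.
clean_scale, clean_codes = quantize_block(natural)
poisoned_scale, poisoned_codes = quantize_block(natural + [0.50])

zeros_before = clean_codes.count(0)
zeros_after = poisoned_codes[:-1].count(0)  # exclude the outlier itself

print(f"clean scale    = {clean_scale:.4f}, zero codes = {zeros_before}/127")
print(f"poisoned scale = {poisoned_scale:.4f}, zero codes = {zeros_after}/127")
print("largest surviving natural code:", max(abs(c) for c in poisoned_codes[:-1]))
print("outlier code:", poisoned_codes[-1])
```

With these evenly spaced weights, 7 of the 127 natural weights quantize to zero before the outlier is added and 99 of them do afterwards; the rest survive only as ±1. A real attack would shape the distribution so that even fewer weights survive.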

Where the QCA paper sits next to existing PTQ work

System	Per-block scales	Outlier handling	Defended against QCA?
Naïve round-to-nearest (older PTQ)	No (single scale per layer)	None	Trivially breaks under outlier-heavy weights — well-known prior risk
AWQ	Per-channel / per-group	Activation-aware salience picks “protect” channels	Reported to still land — the salience signal is computed from clean data, the attack hides in unprotected channels
GPTQ	Per-group	Second-order re-balancing of remaining weights	Reported to still land — re-balancing helps with accuracy loss, not with adversarial outliers placed inside one block
GGUF I-quants	Per-block scale + importance signal	Importance signal anchors which weights to protect	Reported to still land — the family the paper explicitly targets; block-local scale stretching is the mechanism
TurboQuant 2-bit KV (explainer)	Per-block scale on the KV cache	Block-size 8/16 limits damage radius	Different target (KV cache, not weights) — out-of-scope of this paper's attack, but the same scale-stretching geometry is what QCA exploits in weights

The defense surface for QCA does not collapse into a single fix. Smaller block sizes (e.g. 32 instead of 128 weights per block) shrink the “blast radius”, but they also raise metadata overhead, because more scales must be stored, and narrow the range of cases where AWQ's salience trick helps, so adopting them blindly can cost accuracy on benign models. Outlier-detection scans on weights before quantization help, but the paper's attackers explicitly hide outliers inside otherwise-natural blocks. The cleanest mitigation is a layer the broader agent stack already needs: red-team and audit the exact binary you ship, not the FP16 checkpoint you trained.
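As an illustration of what a pre-quantization outlier scan might look like, here is a minimal sketch. Everything in it is an assumption: the function name flag_outlier_blocks, the max-to-median ratio heuristic, and the threshold of 8.0 are hypothetical and are not taken from the paper. As noted above, an attacker who hides outliers inside otherwise natural-looking blocks may evade a check this simple.

```python
from statistics import median

def flag_outlier_blocks(weights, block_size=128, ratio_threshold=8.0):
    """Flag blocks whose largest |w| dwarfs the block's median |w|.

    Hypothetical heuristic: returns (block_index, ratio) pairs for blocks
    where max(|w|) / median(|w|) exceeds `ratio_threshold`.
    """
    flagged = []
    for start in range(0, len(weights), block_size):
        magnitudes = [abs(w) for w in weights[start:start + block_size]]
        peak = max(magnitudes)
        if peak == 0:
            continue  # all-zero block: nothing to stretch the scale
        typical = median(magnitudes)
        ratio = peak / typical if typical > 0 else float("inf")
        if ratio > ratio_threshold:
            flagged.append((start // block_size, ratio))
    return flagged

# Two illustrative 128-weight blocks within |w| <= 0.04, one planted outlier.
weights = [0.04 * ((i % 9) - 4) / 4 for i in range(256)]
weights[200] = 0.50  # index 200 lies in block 1

flagged = flag_outlier_blocks(weights)
print(flagged)
```

A scan like this is cheap enough to run on every checkpoint, but it is best treated as a first-pass filter ahead of the behavioral audit of the quantized binary, not as a replacement for it.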

Two takeaways live alongside the attack. First, PTQ is part of the threat model — “the model was clean before quantization” is no longer a sufficient release statement. Second, the failure mode is a clean illustration of the outlier problem from the LLM Internals track: anything that lets a single weight stretch a per-block scale is a leverage point, for performance optimizers and adversaries alike. The same geometry that makes AWQ / GPTQ / GGUF I-quants good at handling natural outliers is what QCA turns against them.

Goes deeper in: LLM Internals → Quantization → The Outlier Problem

Related explainers

vLLM v0.20 — TurboQuant 2-bit KV cache — per-block scales applied to the KV cache: same scale-stretching geometry, different target
SOP paper — Hardware-aware per-layer PTQ at FP6 — the legitimate use of per-block reasoning: pick a codebook per layer to lower reconstruction error
GLM-5V — native multimodal vs vision-bolted designs — a different “what you train at vs. what you ship” gap, this time across modalities instead of precisions
