LLM privacy isn't one risk: what an ablation study tells you to fix first
A May 2026 study measures membership inference, attribute inference, data extraction and backdoors under one threat model. The finding: leakage is driven by your design choices — scale, data duplication, RAG config — not by the attack alone.
In brief “LLM privacy” is usually discussed as a single worry — the model memorised something. A new study, Makhlouf, On the Privacy of LLMs: An Ablation Study (arXiv 2605.02255, 4 May 2026), puts four distinct privacy attacks under one threat model and measures how each responds to the same system factors: model architecture, scale, training-data properties, and retrieval (RAG) configuration. The takeaway for builders is architectural: the size of your privacy problem is set largely by deployment choices you control, and the four attack families do not behave the same way — so a single mitigation is not enough.
What is this?
Privacy attacks on language models are normally studied one at a time, each with its own threat model and metrics. That fragmentation makes it hard to reason about a real deployment, where the same model faces all of them at once. The May 2026 paper reproduces a representative set of four attacks under a unified notation and access model, then runs a structured ablation to see which deployment factors actually move the needle. The four families it covers map directly onto OWASP’s LLM02: Sensitive Information Disclosure:
- Membership Inference (MIA) — was this exact record in the training set?
- Attribute Inference (AIA) — infer a sensitive attribute about a person from the model.
- Data Extraction (DEA) — make the model regurgitate verbatim training text.
- Backdoor Attacks (BA) — a trigger planted during fine-tuning forces attacker-chosen behaviour.
How it works
The study does not publish new attack payloads; it measures known ones under controlled conditions. The reported pattern is what matters:
Attack Signal strength Driven hardest by
----------- -------------------- -------------------------------
MIA strong, reliable (mask-based variants especially)
Backdoor consistently high trigger presence (by design)
AIA weaker / lower acc. but targets sensitive PII
DEA weaker / lower acc. model scale, data duplication
Two cross-cutting drivers recur. Memorisation scales with capacity, training duration and data duplication — bigger models trained longer on duplicated data leak more, a result the paper anchors in prior work on deduplication. And inference-time configuration matters: how a RAG system is set up changes the exposed surface, because whatever the retriever pulls in, the model can surface. The headline conclusion is that privacy risk is context-dependent and driven by design choices, not an intrinsic constant of “the model.”
Why it matters
If you treat privacy as a single checkbox, you will defend the wrong thing. Membership inference and backdoors produce strong, dependable signals for an attacker, while attribute inference and verbatim extraction are noisier — yet AIA and DEA are precisely the ones that expose real personal data when they land. The corollary is that a clean result on one attack tells you nothing about the others. It also reframes model selection as a privacy decision: choosing a larger model, training on duplicated corpora, or wiring an under-scoped retrieval index are each privacy-relevant choices, not just quality or latency trade-offs. This is the privacy analogue of a lesson the field keeps relearning about detection — measure the whole surface, because adversaries pick whichever attack your design left cheapest.
Defenses
Treat leakage as a function of design, and harden the design.
- Deduplicate training and fine-tuning data. Duplication is one of the clearest amplifiers of memorisation; deduplication is one of the few mitigations with consistent empirical support.
- Apply differential privacy where the data is sensitive. DP fine-tuning (DP-SGD) and DP auditing bound and measure what a model can memorise; canary-based auditing (see arXiv 2512.13352 on membership inference for targeted extraction) lets you quantify risk before release.
- Pick the smallest model that does the job. Scale buys capability and memorisation together; an oversized model is a larger privacy liability.
- Govern the RAG index like a database. Keep raw PII out of the retrieval corpus, enforce per-user access control on retrieval, and remember the model will surface whatever it is allowed to fetch.
- Defend the supply chain against backdoors. Backdoor success is high because triggers are reliable; vet fine-tuning datasets and third-party checkpoints, and test for trigger-conditioned behaviour.
- Evaluate holistically. Run MIA, AIA, DEA and BA probes together at a fixed setup, not in isolation — the paper’s central methodological point.
Status
| Item | Reference | Date | Note |
|---|---|---|---|
| Unified ablation of MIA/AIA/DEA/BA | arXiv 2605.02255 | 4 May 2026 | MIA & backdoors strong; AIA/DEA weaker but target PII |
| MIA in targeted data extraction | arXiv 2512.13352 | Dec 2025 | Membership signals used to drive extraction |
| Sensitive Information Disclosure = LLM02 | OWASP LLM Top 10 | 2025–2026 | Maps these attacks to the application risk list |
The framing to keep: there is no single “privacy setting” for an LLM. The numbers move with architecture, scale, data hygiene and retrieval design — so privacy is something you engineer across the lifecycle, and verify with the whole family of attacks rather than one of them.