Securing RAG: four attack surfaces along the knowledge-access pipeline
A June 2026 survey reframes RAG security around external knowledge access, separating inherent LLM flaws from RAG-introduced risk across four surfaces and three trust boundaries.
What is this?
Retrieval-augmented generation (RAG) is now the default way to give an LLM access to private documents, databases, and up-to-date knowledge. It is also a security surface that most threat models treat poorly, because they fold RAG-specific risks into generic “LLM safety.” A survey published on arXiv on April 9, 2026 and revised on June 8, 2026 — Securing Retrieval-Augmented Generation: A Taxonomy of Attacks, Defenses, and Future Directions, by Yuming Xu and colleagues at the Hong Kong Polytechnic University and HKUST (Guangzhou) — proposes a sharper frame: secure RAG is fundamentally about the security of the external knowledge-access pipeline, not about the model’s parameters or the user prompt. That single reframing changes where you look for attacks and where you place controls.
How it works
The survey abstracts any RAG system into a six-stage workflow: external sources provide raw content; an ingestion pipeline parses and indexes it into a searchable knowledge substrate; retrieval and reranking select candidate evidence for a query; context assembly builds the model-visible prompt; the generator answers; and the system delivers the response with logging and remediation. Along that path it identifies three trust boundaries and four attack surfaces.
The first surface is pre-retrieval knowledge-substrate corruption — poisoning the corpus before any query runs. Because the planted content is later surfaced as legitimate evidence, it persists across queries, users, and sessions. The survey catalogs corpus and document poisoning, attacks on the ingestion toolchain (malicious content hidden inside common document formats), poisoning of graph-based and multimodal stores, and code-oriented poisoning that pushes attacker-controlled dependencies into generated code. We covered concrete instances in corpus poisoning that survives reranking and silent retrieval corpus poisoning.
The second surface is retrieval-time access manipulation: distorting, redirecting, or suppressing which documents get selected, often query-by-query and even in black-box settings where the attacker can only probe the retrieval interface. The third, and the survey’s “most important” boundary, is downstream retrieved-context exploitation — once retrieved evidence becomes model-visible context, untrusted external data can steer generation directly, the mechanism behind indirect prompt injection. The fourth is knowledge exfiltration and privacy attacks, where adversaries run the interface in reverse to infer or extract sensitive records from the substrate; see RAG membership inference.
Crucially, the authors define an operational boundary to keep the scope honest: a risk counts as RAG-introduced only when external knowledge is the main carrier of the threat, when knowledge access creates an entry point that prompt-only use does not, or when retrieval materially increases the threat’s persistence, transferability, or blast radius. Prompt-only jailbreaks and pure parametric memorization are explicitly out of scope.
Why it matters
The reframing matters because it explains why RAG failures are worse than transient prompt failures. A poisoned substrate turns a one-off, query-local event into a persistent compromise of shared state — reusable across queries, transferable across users, and harder to detect, attribute, and remove. The survey’s blunt conclusion is that current defenses “remain largely reactive and fragmented.” A parallel March 2026 review, Towards Secure RAG, reaches a similar verdict on threats, defenses, and benchmarks, and indirect prompt injection in the wild shows the downstream surface is exploited in real systems, not just labs. For teams shipping RAG assistants, the practical implication is that input filtering at the prompt is the wrong and last place to defend.
Defenses
The survey organizes remediation as controls distributed along the same pipeline, one layer per surface. Map your defenses to the boundary they actually protect:
- Knowledge-base integrity and provenance (pre-retrieval). Treat ingestion as a trust boundary. Validate and sanitize documents at parse time, track provenance per chunk so you can attribute and revoke poisoned content, and gate write access to the corpus. Persistence is the attacker’s advantage here, so retain the ability to remediate — re-index and purge — not just detect.
- Retrieval-time access hardening. Harden retrievers and rankers against relevance manipulation: monitor for anomalous ranking shifts, diversify or ensemble retrieval, and avoid trusting a single dense retriever that can be backdoored. A hybrid-retrieval defense raises the cost of single-payload poisoning.
- Post-retrieval context isolation (downstream). Assume retrieved text may contain instructions. Isolate evidence from commands, mark source authority by channel rather than by anything written inside the source — the control-signal impersonation point — and constrain what the generator may act on.
- Access control, privacy, and confidentiality (exfiltration). Apply per-document authorization so retrieval cannot return records the user may not see, and rate-limit or audit response patterns that probe the substrate for extraction.
The survey’s forward-looking recommendation is layered, boundary-aware protection across the whole knowledge-access lifecycle rather than a single guardrail. No individual control closes the surface; the point of the taxonomy is to make sure none of the four is left undefended.
Status
| Item | Detail |
|---|---|
| Source | Securing RAG: A Taxonomy of Attacks, Defenses, and Future Directions (arXiv:2604.08304) |
| Published | v1 April 9, 2026; revised June 8, 2026 |
| Affiliation | Hong Kong Polytechnic University; HKUST (Guangzhou) |
| Frame | Six-stage pipeline, three trust boundaries, four attack surfaces |
| Key claim | Secure RAG = security of the external knowledge-access pipeline |
| State of defenses | ”Largely reactive and fragmented”; layered boundary-aware defense recommended |
The durable takeaway: stop asking whether your LLM is “safe” and start asking which boundary of your knowledge-access pipeline an attacker would cross — because in RAG, the corpus is shared state, and shared state stays compromised until you remediate it.