Where agent attacks actually enter: a 247-paper threat-surface map
A June 2026 survey of 247 papers measures where LLM-agent attacks land. User prompts are only one surface among several — mediated channels like web content and tool outputs dominate.
What is this?
On June 2026, a systematization-of-knowledge paper titled Toward Secure LLM Agents: Threat Surfaces, Attacks, Defenses, and Evaluation (arXiv:2606.10749) was posted. It synthesizes 247 papers through a lifecycle-based, systems-oriented framework, and it does something most threat lists do not: it counts where the attacks actually enter the agent loop. The result is a measured map of the agent threat surface as of mid-2026, not another taxonomy of attack names.
The headline for defenders is a correction of intuition. The instinct is to treat the user prompt as the dangerous input. The corpus says the user prompt is one surface among several, and the more characteristic risk arrives through mediated channels — web pages the agent browses, outputs returned by the tools it calls, and documents pulled in by retrieval.
This is a snapshot of research concentration, not a definitive ranking of all real-world risk. But research concentration is itself useful: it tells you where the field has decided the structural weaknesses are.
How it works
The survey organizes agent security around three interacting properties: information flow, delegated authority, and persistent state. Rather than asking only what input did the agent see, it asks what is the agent now allowed to do because it saw it. Attacks are then located by where they enter the loop and which transition they exploit.
When the corpus is coded by threat surface, the distribution is concrete (papers can touch more than one surface):
Threat surface Papers What it means
────────────────── ────── ─────────────────────────────────
User Prompts 82 Direct instructions from the user
Web Content 55 Pages fetched while browsing
Tool Outputs 54 Results returned by an invoked tool/API
Retrieved Content 37 Evidence from RAG / search indices
Files / Code >=25 Local artifacts read, run, or modified
Planning Loop >=25 Intermediate reasoning / trajectory
Memory / Scratchpads >=25 State the agent retains for later
Inter-agent Channels >=25 Messages passed between agents
User Prompts is the single most frequent surface at 82 papers — but Web Content (55), Tool Outputs (54), and Retrieved Content (37) together describe a much larger, mediated attack surface. These channels carry content that is task-relevant but not authority-bearing: the agent ingests it as evidence, then treats embedded instructions as actionable. That is the core failure — the loss of separation between data and control, and between a low-authority observation and a high-authority instruction.
The attack-family counts confirm the same shape. In the threat-model coding, Prompt Injection appears in 142 papers and Indirect Prompt Injection in 86. Broken down by deployment scenario, web browsing shows Prompt Injection 71 times and Indirect Prompt Injection 44 times; software-engineering agents show 32 and 16. Prompt injection is not one attack among many in this literature — it is the dominant mechanism by which untrusted content becomes unsafe control. This matches what practitioners reported independently in June 2026: prompt injection still drives most agentic failures seen in production.
The survey’s second move is to frame the most dangerous events as transitions, not components. Harm tends to occur when untrusted content is reinterpreted as a planning constraint, when a tentative plan becomes a committed action, or when a stored trace is later reused as trusted context. This is also why memory poisoning and multi-agent contagion are flagged as the emerging frontier — they are delayed and propagating forms of the same control-flow problem.
Why it matters
Three concrete takeaways.
The field is young and preprint-heavy, so calibrate confidence. The corpus grows from 3 papers in 2023 to 42 in 2024 to 121 in 2025, with 81 more by April 27, 2026 (32.79% of the total). arXiv accounts for 169 papers (68.42%). Terminology, threat models, and evaluation protocols are still moving. Treat individual claims as versioned observations, not settled results — which is exactly the discipline the survey itself recommends.
Single-agent evidence dominates, but multi-agent risk is rising. Single-agent systems make up 200 papers (80.97%); multi-agent systems 47 (19.03%). The multi-agent share climbs from 9.52% in 2024 to 23.97% in 2025. If your roadmap involves agents that delegate to or message other agents, you are moving into the part of the threat surface the current evidence base covers least — inter-agent channels, coordination failure, and cross-agent propagation of malicious instructions.
Defenses do not compose, and benchmarks miss the hard cases. The survey finds that current defenses are useful building blocks but weakly compositional, and that existing benchmarks underrepresent long-horizon, stateful, and deployment-sensitive risks. In practice that means a clean score on a single-turn injection benchmark tells you little about whether a stateful, tool-using, multi-step agent will hold up.
Defenses
The survey’s prescriptions translate directly into an architecture checklist. None of these are novel exploits to fear; they are boundaries to build.
-
Treat mediated channels as untrusted by default. Web content, tool outputs, and retrieved documents carry data, not instructions. Strip or quarantine instruction-like content from these surfaces before it reaches the planning context, and never let retrieved or browsed text silently re-enter the loop as a directive.
-
Enforce an explicit instruction hierarchy and source legitimacy. The structural failure is the agent treating low-authority observations as high-authority commands. Tag every span by provenance (user vs. tool vs. web vs. memory) and make the model’s policy condition on that tag, so source — not just content — governs what is actionable.
-
Put privilege control at the action boundary. Because the agent acts under delegated authority the attacker does not own, the durable check is at tool execution: per-action capability checks, least privilege per tool, and human confirmation for high-consequence actions. This constrains the dangerous plan-to-action transition rather than trying to perfectly clean every input.
-
Make persistent state provenance-aware. Memory and scratchpads are a delayed control-flow channel: poisoned content written today can be retrieved as “trusted” context tomorrow. Record where each memory came from, expire or re-validate it, and never auto-promote stored traces to trusted instructions.
-
Watch the lethal trifecta. The classic high-risk combination — access to private data, exposure to untrusted content, and an external-communication path — remains the configuration to avoid or tightly gate, as Simon Willison framed it. The 247-paper map is, in effect, a detailed account of how that trifecta gets exploited across surfaces.
-
Evaluate at the deployment shape you actually run. Single-turn injection scores do not predict multi-step, stateful, or multi-agent behavior. Test long-horizon trajectories, memory reuse, and inter-agent propagation explicitly, because that is where the survey says both defenses and benchmarks are currently weakest.
Status
| Item | Reference | Date | Notes |
|---|---|---|---|
| Toward Secure LLM Agents (SoK) | arXiv 2606.10749 | 2026-06 | 247 papers; lifecycle + systems framework |
| Corpus growth | same | 2023–2026 | 3 -> 42 -> 121 papers; 81 by 2026-04-27 |
| Threat-surface counts | same | 2026-06 | User Prompts 82, Web 55, Tool Outputs 54, Retrieved 37 |
| Attack-family counts | same | 2026-06 | Prompt Injection 142, Indirect 86 |
| Multi-agent share | same | 2024-2026 | 9.52% -> 23.97% -> 17.28% (partial) |
| Production corroboration | Help Net Security / OWASP | 2026-06-11 | Prompt injection still drives most agentic failures |
The practical lesson is not that agents are unusable. It is that the dangerous input is rarely the user’s prompt. Once a model browses, calls tools, retrieves, remembers, and talks to other agents, every one of those channels is an entry point — and the boundary that matters is not “is this text malicious” but “is the agent allowed to act on it.” Build for the transitions, not just the inputs.