system: OPERATIONAL
← back to all hacks
RESEARCH MEDIUM NEW

Toward Secure LLM Agents: a 247-paper SoK that reframes agent security as a systems problem

A June 9, 2026 arXiv survey of 247 papers maps LLM-agent security onto the agentic loop and finds defenses that work in isolation but barely compose — and benchmarks that miss long-horizon, stateful risk.

2026-06-18 // 6 min affects: llm-agents, tool-use, multi-agent-systems, coding-agents, web-agents

What is this?

Toward Secure LLM Agents: Threat Surfaces, Attacks, Defenses, and Evaluation is a systematization-of-knowledge (SoK) survey posted to arXiv on June 9, 2026 (arXiv:2606.10749) by Yuchen Ling, Shengcheng Yu, Zhenyu Chen and Chunrong Fang (Nanjing University and the Technical University of Munich), prepared for ACM TOSEM. It synthesizes 247 papers published between January 2023 and April 27, 2026 into a single, auditable picture of how the field actually looks.

Its central argument is one this site keeps returning to: once a language model is wired into a loop that plans, calls tools, keeps memory and acts on the outside world, security stops being about unsafe text and becomes a software-and-systems problem — trust boundaries, delegated authority, and persistent state. The survey is valuable less for any single attack than for the map it draws, and for the gaps that map makes visible.

How it works

The authors built the corpus through an auditable hybrid pipeline — database retrieval across six sources, a bounded LLM-assisted expansion step (used to widen recall, never as an inclusion oracle), and citation snowballing — then hand-coded every paper. A PRISMA-style flow took 275 audited records down to a normalized 247. Each paper was tagged against the stages of the agentic loop: input, planning, decision, tool execution, output, memory/state, monitoring, and multi-agent coordination.

That lifecycle lens is the contribution. Instead of cataloguing attacks in isolation, the survey traces how untrusted information becomes a control decision, how that decision meets delegated authority, and how state persistence changes the system’s security properties over time. Four research questions structure the synthesis: how agent security should be modeled (RQ1), which threat surfaces dominate (RQ2), what defenses exist and at what cost (RQ3), and how claims are evaluated (RQ4).

The corpus itself tells a story. It grows from 3 papers in 2023 to 42 in 2024 and 121 in 2025, with 81 more collected by April 27, 2026 — already a third of the total. And 68% of the corpus is arXiv preprints, with only a handful at NDSS, CCS or ICSE. The field is expanding fast but is still pre-standardized: terminology, threat models and evaluation protocols have not settled.

Why it matters

Three findings are worth carrying into design reviews.

First, the empirical center of gravity is still prompt injection and tool-mediated control-flow hijacking — the most studied, most benchmarked surfaces. But the survey flags persistent state corruption (poisoned memory and long-lived context) and multi-agent propagation as the fast-rising concerns that realistic deployments face and the literature under-synthesizes.

Second, defenses are weakly compositional. Individually, guardrails, privilege control, isolation and provenance tracking each work. Stacked together they do not cleanly add up: they target different assets, assume different trust models, and the survey finds no convergent, composable security stack you can simply assemble. A clean result on one defense says little about the whole.

Third, benchmarks measure the wrong window. Most still report immediate attack success in bounded, single-turn environments, leaving long-horizon behavior, stateful memory/coordination risk, and privilege-sensitive actions under-evaluated — and rarely measure safety, utility, latency and cost jointly. A defense that looks strong in a benchmark can still be brittle in a stateful deployment.

Defenses

The survey’s own prescription is architectural, and it maps cleanly to four engineering pillars you can hold a design against.

Make trust boundaries explicit. Treat tool outputs, retrieved documents, memory entries and inter-agent messages as untrusted data, not instructions. The model cannot reliably separate the two on its own, so the boundary has to live in the system, echoing the instruction-hierarchy and spotlighting lines of work.

Apply principled privilege control. Scope every tool call to least authority, deny-by-default, and bind capabilities to the task rather than the session. Control-flow hijacking only escalates to real harm when the hijacked step still holds broad privilege.

Manage state with provenance. Persistent memory and long context are now attack surfaces. Track where each stored item came from, gate writes, and treat a contaminated memory entry as capable of steering future decisions, not just the current turn.

Evaluate for deployment, not for the demo. Pick (or build) benchmarks that exercise long horizons, stateful memory and coordination, and that report utility and cost alongside attack-success rate. Because defenses don’t compose for free, test the stack you ship, end to end — not each control in isolation. The OWASP agentic-risk taxonomy is a useful cross-check on coverage.

Status

ItemReferenceDateNotes
Survey (SoK)arXiv:2606.10749v12026-06-09Lifecycle/systems framework, ACM TOSEM
Corpus size247 papers2023-01 → 2026-04-27275 audited → 251 retained → 247 normalized
Growth3 → 42 → 121 papers2023 / 2024 / 2025+81 by 2026-04-27 (~33%)
Venue mix68% arXiv preprintsField still pre-standardized
Dominant surfacesPrompt injection, control-flow hijackingMost studied / benchmarked
Emerging surfacesState corruption, multi-agent spreadUnder-synthesized
Companion siteLLMAgentSecuritySurvey2026Browsable corpus

The takeaway is not a new attack. It is a discipline: secure LLM agents need explicit trust boundaries, principled privilege control, provenance-aware state, and evaluation that matches how agents are actually deployed — and the survey is honest that the field does not yet have a stack that delivers all four together.

Sources