AGENTS MEDIUM NEW

FragFuse: fragmented queries that bypass LLM agent access control

A June 14, 2026 arXiv paper shows that a banned request can be split into benign-looking fragments, parked in an agent's long-term memory, and fused at retrieval time, bypassing access controls in 86.3% of attempts on average.

2026-06-17 // 6 min // affects: llm-agents, memory-based-agents, agent-access-control

What is this?

On June 14, 2026, researchers posted FragFuse: Bypassing Access Control of Large Language Model Agents via Memory-Based Query Fragmentation and Fusion to arXiv (cs.CR). It documents an attack on the access-control layer that increasingly sits in front of LLM agents — the guardrail that is supposed to refuse policy-violating requests before they reach a tool or generate harmful output.

FragFuse’s insight is that these controls inspect requests one interaction at a time, while modern agents carry long-term memory across interactions. A prohibited request that a checker would block in a single turn can instead be broken into innocuous-looking fragments, written to memory over several benign-seeming turns, and reassembled later through normal memory retrieval — so the harmful intent never appears explicitly in any single query the access-control mechanism sees. Across four agent settings and three state-of-the-art access-control mechanisms, the authors report an 86.3% average bypass rate.

How it works

The agent’s memory is the attack surface. The technique has three conceptual stages, and the paper deliberately does not publish a working, copy-paste payload.

# Conceptual only — no working payload.
[1] Fragment   banned request --> N benign-looking pieces (each passes the check)
[2] Store      pieces written to long-term memory over several interactions
[3] Fuse       a later, innocuous query triggers retrieval + reassembly at runtime

Each fragment is individually below the guardrail’s threshold for “policy-violating,” so it is stored without objection. The malicious whole only materializes at fusion time, when the agent retrieves the pieces from memory and recombines them while answering a final query that itself looks harmless. Because the access-control check ran on the surface request — not on the reconstructed content the model assembles internally — it never sees the thing it was meant to block.
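A toy model makes this blind spot concrete. The sketch is hypothetical throughout: `request_level_check` is a stand-in phrase filter, and the "banned request" is the placeholder string `alpha omega`, not anything from the paper and not a payload. Every fragment and the final query pass the per-request check; only the reassembled content, which that check never inspects, would fail it.

```python
# Hypothetical toy model: a phrase filter stands in for the access-control
# check, and "alpha omega" stands in for a banned request. Not a payload.
BANNED_PHRASES = {"alpha omega"}


def request_level_check(text: str) -> bool:
    """Return True if one request, inspected in isolation, is allowed."""
    lowered = text.lower()
    return not any(phrase in lowered for phrase in BANNED_PHRASES)


memory: list[str] = []

# [1] Fragment + [2] Store: each fragment is checked alone, passes, is stored.
fragments = ["remember the word alpha", "remember the word omega"]
for frag in fragments:
    if request_level_check(frag):
        memory.append(frag.split()[-1])  # keep only the fragment's last word

# [3] Fuse: the final query is itself harmless and passes the check...
final_query = "join the words you remembered, separated by a space"
query_allowed = request_level_check(final_query)

# ...but the content the agent reasons over is reassembled from memory,
# and that reassembled string is what the request-level check never saw.
fused = " ".join(memory)
fused_allowed = request_level_check(fused)

print(query_allowed, repr(fused), fused_allowed)  # True 'alpha omega' False
```

A real guardrail is a classifier rather than a substring match, but the failure has the same shape: the check runs on units that are each clean in isolation.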

The authors note that FragFuse can be built by hand for a single agent, but they also describe an optimization scheme that tunes the fusion instructions and the markers used to tag fragments against surrogate models, enabling automated attack generation without assuming the attacker can see the target's internals. In their evaluation, end-to-end harmful-task success averaged 41.1%, while the access-control mechanisms cost benign workloads only 4.4% in task success. The paper frames this contrast as evidence that current controls buy little robustness against this class of evasion.

This is the memory-layer cousin of task-decomposition attacks: both defeat a checker by splitting intent across units the checker evaluates in isolation, and both exploit the fact that an agent reasons over a reconstructed context the filter never inspected. It also sits alongside the broader body of work treating agent memory as a control-flow surface rather than passive storage.

Why it matters

Access control is the layer many teams are betting on to make agents deployable — the assumption being that even if a model can be talked into something, a separate gate will refuse the dangerous action. FragFuse attacks that assumption directly. An 86.3% bypass rate against three published mechanisms means request-level gating, on its own, is not a reliable boundary for stateful agents.

It exposes the same structural problem that underlies prompt injection: there is no clean separation between "what the agent was asked" and "what the agent ends up reasoning over," especially once memory can carry state between turns. A control that inspects only the visible query is checking the wrong artifact.

A note on scope: this is lab research across a defined test matrix, not a confirmed in-the-wild campaign, and no working payload was released. Treat it as a validated blind spot in memory-aware agents, not an active exploit.

Defenses

  • Check the reconstructed context, not just the query. The core failure is that gating runs on the surface request. Evaluate the fused content the agent actually assembles — post-retrieval, pre-action — so reassembled intent is in scope. This complements task-based tool authorization.
  • Treat memory writes and reads as security events. Apply policy at the point content enters and leaves long-term memory, not only at the prompt. Tag provenance, and re-screen retrieved fragments together rather than individually.
  • Gate the dangerous primitives. Because harm only lands when the agent finally acts, put approval and sandboxing on code execution, egress and credential access — the Agents Rule of Two logic. A fused payload that can’t reach a sensitive tool can’t complete the task.
  • Constrain and segment memory. Scope, partition and expire memory per task and per user; deny cross-task fusion by default. Persistent shared memory is what makes the staging step possible.
  • Log retrieval and reassembly. Capture what the agent pulled from memory and how it recombined it, so a fragmented-then-fused attack leaves a reviewable trail even when each input looked benign. This matters because underspecified authorization is hard to audit after the fact.
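A minimal sketch of several of these controls working together, assuming an in-process memory store and a toy phrase-matching policy (`policy_allows`, a placeholder for a real classifier). `ScopedMemory`, `MemoryEntry`, `SENSITIVE_TOOLS` and the tool names are hypothetical, not any framework's API. The sketch screens and logs memory writes, scopes reads by user, task and TTL, screens the fused context post-retrieval and pre-action, and requires approval before sensitive tools run.

```python
# Defensive sketch; every name here is hypothetical and the policy is a toy.
import time
from dataclasses import dataclass, field

BANNED_PHRASES = {"alpha omega"}  # placeholder for a real policy classifier
SENSITIVE_TOOLS = {"exec_code", "http_egress", "read_credentials"}


def policy_allows(text: str) -> bool:
    lowered = text.lower()
    return not any(p in lowered for p in BANNED_PHRASES)


@dataclass
class MemoryEntry:
    content: str
    user_id: str
    task_id: str
    source: str          # provenance tag, e.g. "user_turn" or "tool_output"
    created_at: float
    ttl_s: float = 3600.0


@dataclass
class ScopedMemory:
    entries: list[MemoryEntry] = field(default_factory=list)
    audit_log: list[tuple] = field(default_factory=list)

    def write(self, entry: MemoryEntry) -> bool:
        # Memory writes are security events: screen and log every one.
        allowed = policy_allows(entry.content)
        self.audit_log.append(("write", entry.user_id, entry.task_id, entry.source, allowed))
        if allowed:
            self.entries.append(entry)
        return allowed

    def read(self, user_id: str, task_id: str, now: float) -> list[MemoryEntry]:
        # Same user, same task, not expired: no cross-task fusion by default.
        hits = [e for e in self.entries
                if e.user_id == user_id and e.task_id == task_id
                and now - e.created_at < e.ttl_s]
        self.audit_log.append(("read", user_id, task_id, [e.source for e in hits]))
        return hits


def fused_context_allowed(query: str, retrieved: list[MemoryEntry]) -> bool:
    # Screen retrieved fragments and the query together, post-retrieval and
    # pre-action, instead of screening each piece in isolation.
    fused = " ".join([e.content for e in retrieved] + [query])
    return policy_allows(fused)


def tool_call_allowed(tool: str, human_approved: bool) -> bool:
    # Dangerous primitives need explicit approval even if every check passed.
    return tool not in SENSITIVE_TOOLS or human_approved


# Usage: two fragments staged in task t1, a stray fragment in task t2.
now = time.time()
mem = ScopedMemory()
mem.write(MemoryEntry("alpha", "u1", "t1", "user_turn", now))
mem.write(MemoryEntry("omega", "u1", "t1", "user_turn", now))
mem.write(MemoryEntry("omega", "u1", "t2", "user_turn", now))

same_task = mem.read("u1", "t1", now)
other_task = mem.read("u1", "t2", now)
print(fused_context_allowed("join them", same_task))        # False
print(fused_context_allowed("join them", other_task))       # True
print(tool_call_allowed("exec_code", human_approved=False))  # False
```

Each write passes on its own, so the fused-context check is what catches the staged pair. A substring policy is also order-sensitive (fragments retrieved as "omega alpha" would slip past it), which is one reason the post-retrieval check should be a semantic classifier in practice.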

Status

| Item | Detail |
|---|---|
| Technique | FragFuse: memory-based query fragmentation and fusion |
| Source | arXiv:2606.15609 (cs.CR), posted June 14, 2026 |
| Bypass rate | 86.3% average across 3 access-control mechanisms |
| Harmful-task success | 41.1% end-to-end (average) |
| Access-control cost | 4.4% average task-success degradation on benign workloads |
| Test scope | 4 agent settings / task domains; manual and surrogate-optimized variants |
| Real-world status | Research result; no confirmed in-the-wild use; no working payload released |

Sources