AGENTS MEDIUM NEW

Context-Fractured Decomposition: jailbreaks through artifact provenance gaps

A June 8, 2026 arXiv paper formalizes the 'provenance gap' in tool-using agents: harmful behavior assembled from individually innocuous tool actions across time, raising jailbreak success by up to 28.3 percentage points.

2026-06-11 // 6 min affects: llm-agents, tool-use, multi-agent-systems, agent-pipelines

What is this?

On June 8, 2026, Lin, Yang, Guo, Nale, Fleming and Cheng published Context-Fractured Decomposition Attacks on Tool-Using LLM Agents: Exploiting Artifact Provenance Gaps (arXiv:2606.09084, cs.CR / cs.AI). The paper names a deployment failure mode that most jailbreak research has missed: in a real agent, the dangerous request is never spoken in one place. It is spread across tool actions and across time, and the safety judge that watches any single step sees nothing alarming.

The authors call the gap itself the provenance gap, and the technique that exploits it Context-Fractured Decomposition (CFD). Reported result: CFD improves attack success by up to 28.3 percentage points over state-of-the-art baselines, “even against strong single-turn judges.” This is a research finding on benchmarks, not an exploit against a named production system.

How it works

Tool-using agents act on the world by writing artifacts — files in a workspace, rows in a log, entries in a memory store. Those artifacts persist, and a later step (sometimes a different agent instance, sometimes a later stage of the same workflow) reads them back as trusted state.

CFD weaponizes that persistence. The idea, at a conceptual level:

Stage A (looks benign)      Artifact written         Stage B (looks benign)
-------------------------   ----------------------   -----------------------------
Agent performs an           workspace file / log /   A later step reads the
innocuous early action      memory entry holding     artifact and performs another
that quietly preserves      a [BENIGN-LOOKING]       innocuous action — but the
a fragment of context       fragment                 COMPOSITION of A + B is harmful

No single tool call carries a recognizable harmful instruction. The harm emerges only from the delayed, artifact-mediated composition of steps that were each cleared on their own. The paper contrasts this with “multi-turn” jailbreaks such as Crescendo and Tree of Attacks with Pruning (TAP, arXiv:2312.02119), which still assume one contiguous conversation that a defender can see end to end. In a real pipeline, enforcement is fragmented across tools, modules and time, so that assumption breaks, and the lineage linking the fragments back to their origin is usually not tracked at all.
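To make the blind spot concrete, here is a toy model in Python. Everything in it is a hypothetical illustration, not the paper's method or data: the fragment labels are abstract placeholders, RISKY_COMBINATIONS stands in for whatever policy a real judge applies, and both judge functions are invented for this sketch. The per-step judge clears each step, while a judge that sees the union of the trajectory flags the composition.

```python
from dataclasses import dataclass

# Hypothetical, deliberately abstract fragment labels: neither is harmful alone.
# A real policy would be far richer; this only illustrates the structural blind spot.
RISKY_COMBINATIONS = {frozenset({"fragment_a", "fragment_b"})}


@dataclass(frozen=True)
class Step:
    stage: str            # which workflow stage ran the tool call
    fragments: frozenset  # abstract content labels carried by this step


def per_step_judge(step: Step) -> bool:
    """Return True if the step is allowed when judged in isolation."""
    # Flags the step only if its own fragments already contain a risky combination.
    return not any(combo <= step.fragments for combo in RISKY_COMBINATIONS)


def per_trajectory_judge(steps: list[Step]) -> bool:
    """Return True if the whole trajectory is allowed when judged as one unit."""
    seen: set = set()
    for step in steps:
        seen |= step.fragments  # accumulate everything the trajectory has touched
    return not any(combo <= seen for combo in RISKY_COMBINATIONS)


trajectory = [
    Step("stage_a", frozenset({"fragment_a"})),  # writes an artifact
    Step("stage_b", frozenset({"fragment_b"})),  # later reads it back and acts
]

print([per_step_judge(s) for s in trajectory])  # [True, True]
print(per_trajectory_judge(trajectory))         # False
```

Real composition risk is not a lookup table of labels. The point is structural: per_step_judge never receives both fragments at once, so making it more accurate per step cannot, on its own, close the gap.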

The authors deliberately do not hand over a turnkey recipe. They instrument the failure mode with trace-level diagnostics and sketch a verifiable mitigation rather than publishing reusable payloads.

Why it matters

This generalizes a worry the field already had. Provenance and the “data is not authority” principle have been recurring themes — see contextual integrity framings and provenance-graph defenses like ARGUS. CFD pushes the timeline out: the malicious “intent” can sit dormant in an artifact and detonate in a different agent instance or a later workflow stage.

That has three practical consequences. First, single-turn and single-conversation guardrails are structurally blind to it; a judge scoring each message or each tool call in isolation will pass every step. Second, the attack surface scales with shared state: the more agents read each other’s files, logs and memory, the more places a fractured payload can wait. Third, it falls squarely in the tool-misuse and memory/state-poisoning territory of the OWASP Top 10 for Agentic Applications for 2026, but with a temporal twist that audit pipelines rarely model.

Defenses

The paper’s own proposed direction is provenance lineage tagging, and it generalizes well:

Tag artifacts with lineage, not just content. Every file, log line or memory entry an agent writes should carry where it came from, which step produced it, and under what request. Reads then inherit that lineage, so a downstream judge can reason about the composition — “this action plus that artifact” — rather than the current step alone.
Move enforcement from per-turn to per-trajectory. Score the whole trace, not isolated messages. A cross-step judge that can see the chain A→artifact→B is the only kind that can catch risk existing solely in the join.
Treat agent-written artifacts as untrusted input on read-back. A file your own agent wrote three steps ago is still data, not instruction. Re-validate it when it re-enters context, especially across agent or session boundaries.
Isolate state across instances and stages. Default to scoping memory and workspace by task and by tenant. Cross-instance artifact sharing should be an explicit, audited grant — not an ambient capability.
Adopt design patterns with provable bounds. Design Patterns for Securing LLM Agents against Prompt Injections (arXiv:2506.08837) argues for constraining what an agent can do after touching untrusted content; combine that with lineage tagging so the constraints follow the data.
Add trace-level diagnostics to observability. Log the provenance chain so post-hoc review (and detection rules) can spot fragments that were composed later. You cannot defend a join you never recorded.
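Several of these recommendations can be sketched together. The Python below is a minimal, hypothetical illustration; the Lineage, LineageStore and context_for_judge names and the send_email action are invented here and are not taken from the paper. Each write records which task, session and step produced an artifact and under what request; reads are scoped to the writer's task unless an explicit grant exists; grants, reads and denials go to an audit log; and read-back content reaches the judge marked as untrusted data together with its lineage. It is in-memory only and assumes Python 3.9+.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Lineage:
    task_id: str     # task/tenant scope that produced the artifact
    session_id: str  # agent instance or session that wrote it
    step_id: str     # workflow step that produced it
    request: str     # originating request, recorded at write time


@dataclass(frozen=True)
class Artifact:
    content: str
    lineage: Lineage


class LineageStore:
    """Hypothetical in-memory store: writes record lineage, reads are scoped and audited."""

    def __init__(self) -> None:
        self._artifacts: dict[str, Artifact] = {}
        self._grants: set[tuple[str, str]] = set()  # explicit (owner_task, reader_task) grants
        self.audit_log: list[tuple] = []             # (event, key or grant, lineage or None)

    def grant(self, owner_task: str, reader_task: str) -> None:
        # Cross-task sharing is an explicit, logged decision, not an ambient capability.
        self._grants.add((owner_task, reader_task))
        self.audit_log.append(("grant", f"{owner_task}->{reader_task}", None))

    def write(self, key: str, content: str, lineage: Lineage) -> None:
        self._artifacts[key] = Artifact(content, lineage)
        self.audit_log.append(("write", key, lineage))

    def read(self, key: str, reader_task: str) -> Artifact:
        art = self._artifacts[key]
        owner = art.lineage.task_id
        if owner != reader_task and (owner, reader_task) not in self._grants:
            self.audit_log.append(("denied", key, art.lineage))
            raise PermissionError(f"{reader_task} has no grant to read {key} from {owner}")
        self.audit_log.append(("read", key, art.lineage))
        return art  # lineage travels with the content


def context_for_judge(action: str, reads: list[Artifact]) -> dict:
    """Bundle the proposed action with the lineage of every artifact it depends on."""
    return {
        "action": action,
        "inputs": [
            {
                "content": a.content,
                "trusted": False,  # read-back is data, never instruction
                "origin_task": a.lineage.task_id,
                "origin_session": a.lineage.session_id,
                "origin_step": a.lineage.step_id,
                "origin_request": a.lineage.request,
            }
            for a in reads
        ],
    }


store = LineageStore()
store.write("notes.txt", "summary draft",
            Lineage("task-1", "sess-A", "step-3", "summarize report"))

art = store.read("notes.txt", reader_task="task-1")
ctx = context_for_judge("send_email", [art])  # "send_email" is a hypothetical tool name
print(ctx["inputs"][0]["origin_step"])  # step-3

try:
    store.read("notes.txt", reader_task="task-2")  # no grant yet, so this is denied
except PermissionError as err:
    print("blocked:", err)
```

Marking content as untrusted does nothing by itself. It only helps if the downstream judge actually scores the proposed action against the inherited lineage, which is the per-trajectory check recommended above.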

Status

Item	Reference	Date	Notes
CFD paper (v1)	arXiv:2606.09084	2026-06-08	Defines “provenance gap”; CFD family of cross-context jailbreaks
Reported impact	arXiv:2606.09084	2026-06-08	Up to +28.3 pp success over SOTA, even vs strong single-turn judges
Mitigation direction	arXiv:2606.09084	2026-06-08	Provenance lineage tagging + trace-level diagnostics
Related baseline	TAP (arXiv:2312.02119)	2023-12	Multi-turn jailbreak assuming one visible conversation
Defensive framing	Design Patterns (arXiv:2506.08837)	2025-06	Provable-resistance patterns for tool-using agents

The takeaway is not a new payload — it is a new place to look. If your safety review reasons about messages, it is watching the wrong unit. The unit at risk is the trajectory, and the fragments that compose into harm may not arrive in the same conversation, the same session, or even the same agent.

This article covers published academic research for defensive purposes. The source paper deliberately withholds reusable attack payloads and proposes a verifiable mitigation.