AGENTS MEDIUM

Causality laundering: when a blocked tool call still leaks data

An April 2026 paper shows that denying an agent's tool call is not the end of the attack: the denial itself is an information channel. Flat taint tracking misses it.

2026-06-12 // 7 min affects: tool-calling-agents, mcp-agents, llm-agents

What is this?

In April 2026, independent researcher Mohammad Hossein Chinaei posted Causality Laundering: Denial-Feedback Leakage in Tool-Calling LLM Agents (arXiv 2604.04035). The paper names a blind spot shared by most runtime defenses for tool-calling agents: the implicit assumption that once a tool call is denied, the threat is over. Chinaei argues this is wrong. A denied action produces no data, but it is still an observable event, and what the agent infers from that denial can be exfiltrated later through an ordinary-looking tool call.

The author calls this pattern causality laundering and proposes a defense — the Agentic Reference Monitor (ARM) — that treats denied actions as first-class provenance events. The work is a controlled architecture study, not a report of an in-the-wild exploit, but it formalizes an attack surface that flat taint-tracking and data-flow defenses do not currently model.

How it works

Consider an agent with read_file and send_email tools. A prompt-injection payload hidden in untrusted content (a poisoned email, a scraped web page) tells the agent to exfiltrate /etc/shadow. The policy engine correctly denies the read_file("/etc/shadow") call — no sensitive bytes are returned.

A flat taint-tracking system marks the attack as blocked. But the agent can still infer that the file exists if the denial reason is “permission denied” rather than “file not found”. The payload then instructs it to call send_email(body="[REDACTED: a fact inferred from the denial]") to an external address. To a data-flow defense this second call looks benign: it carries no tainted bytes, matches no sensitive value any tool returned, and has no lineage connecting it to the denied read. Yet it conveys information derived from the block.
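A toy model of this scenario shows where the check fails. It is not the paper's code: the file table, the `read_file` stub, the `Denied` exception, and the content-matching `dlp_check` are hypothetical stand-ins for a policy engine and a data-flow filter. The filter only knows about values that successful tool calls returned, so it approves an email whose only payload is what the denial reason revealed.

```python
# Toy model of denial-feedback leakage. FILES, PROTECTED, Denied, read_file
# and dlp_check are hypothetical stand-ins, not the paper's code.

FILES = {"/etc/shadow": "root:<hash>:19000:0:99999:7:::"}
PROTECTED = {"/etc/shadow"}


class Denied(Exception):
    """Raised by the policy engine; the agent sees the reason string."""


def read_file(path: str) -> str:
    # The policy blocks protected paths before any bytes are returned, but
    # the denial reason differs depending on whether the file exists.
    if path not in FILES:
        raise Denied("file not found")
    if path in PROTECTED:
        raise Denied("permission denied")
    return FILES[path]


# Values returned by *successful* tool calls: all a flat filter can match on.
returned_values: list[str] = []


def dlp_check(outgoing: str) -> bool:
    """Allow an outgoing call unless it contains bytes some tool returned."""
    return not any(value in outgoing for value in returned_values)


# Injected instructions: probe the file, then report what the denial revealed.
try:
    returned_values.append(read_file("/etc/shadow"))
    inferred = "readable"
except Denied as err:
    inferred = "exists" if str(err) == "permission denied" else "absent"

email_body = f"shadow file {inferred}"
print(dlp_check(email_body), email_body)  # True shadow file exists
```

Returning one uniform denial message would remove this particular oracle, but the bare fact that the call was blocked still carries information, and that is the channel the paper is concerned with.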

The paper formalizes the pattern: a denied action a_d is followed by a side-effecting action a_s, no successful data-flow path connects them, but a_s is causally influenced by the denial — the agent would have behaved differently had the block not occurred. Two amplifications make it worse. In multi-probe fingerprinting, an attacker issues many probes and exfiltrates the vector of allow/deny outcomes to map an access-control configuration; each denial leaks little, the sequence leaks a lot. In laundered composition, an allowed intermediate tool (a calculator, a formatter) re-encodes the denial signal so it no longer correlates textually with the blocked call.
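Both amplifications can be sketched in a few lines. Everything here is hypothetical: `HIDDEN_DENY` stands in for an access-control configuration the attacker cannot read, the probe paths are arbitrary, and `calculator` is an allowed tool that evaluates only expressions of the form `a * b + c`. Each probe contributes one allow/deny bit, the bits are packed into an integer, and the arithmetic step re-encodes that integer so the exfiltrated value no longer resembles any blocked call.

```python
# Toy sketch of multi-probe fingerprinting plus laundered composition.
# HIDDEN_DENY, PROBES, policy_allows and calculator are hypothetical.

HIDDEN_DENY = {"/etc/shadow", "/root/.ssh/id_rsa", "/var/lib/secrets"}
PROBES = [
    "/etc/passwd",
    "/etc/shadow",
    "/root/.ssh/id_rsa",
    "/home/user/notes.txt",
    "/var/lib/secrets",
]


def policy_allows(path: str) -> bool:
    # The attacker never sees this set, only the per-call verdicts.
    return path not in HIDDEN_DENY


def fingerprint(probes: list[str]) -> int:
    """Pack one bit per probe: bit i is set if probe i was denied."""
    bits = 0
    for i, path in enumerate(probes):
        if not policy_allows(path):
            bits |= 1 << i
    return bits


def calculator(expr: str) -> str:
    """An allowed tool that only evaluates integer expressions 'a * b + c'."""
    a, rest = expr.split("*")
    b, c = rest.split("+")
    return str(int(a) * int(b) + int(c))


raw = fingerprint(PROBES)
# Arbitrary constants make the result look like an ordinary number.
laundered = calculator(f"{raw} * 7919 + 104729")
print(raw, laundered)  # 22 278947

# Attacker side: invert the arithmetic and read the deny map back out.
recovered = (int(laundered) - 104729) // 7919
denied = [p for i, p in enumerate(PROBES) if (recovered >> i) & 1]
print(denied)  # ['/etc/shadow', '/root/.ssh/id_rsa', '/var/lib/secrets']
```

Each verdict carries at most one bit, yet five probes already recover the deny set for those five paths, and the exfiltrated number shares no text with the denied calls or with the raw fingerprint.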

The inference step is not merely theoretical. The paper cites recent evidence (Jiralerspong, Kondrup & Bengio, March 2026; arXiv 2603.16928) that frontier agents can infer hidden monitoring purely from blocking feedback, even when never told that monitoring is present.

Why it matters

Three classes of runtime defense miss this by construction. Flat information-flow tracking (e.g., FIDES) propagates labels only through successful tool outputs — a denied call produces no output to label. Dependency graphs over successful executions record no node for a blocked action, so there is no edge from the denied read to the later send. Replay-based causal-attribution defenses ablate attacker-controlled content, but the triggering signal here is generated by the enforcement layer itself, not by external input.
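The dependency-graph gap can be made concrete with a hypothetical graph that, like the defenses described above, records only successful executions. The node IDs, the `sensitive` flag, and the `tainted_sources` query are illustrative assumptions, not any named system's API. Because the blocked read never becomes a node, no query over this graph can connect it to the later `send_email`; the contrast case shows the same graph catching an ordinary successful-read exfiltration.

```python
# Hypothetical success-only dependency graph: nodes are executed tool calls,
# an edge u -> v means an output of u fed an input of v. Blocked calls are
# never recorded, mirroring the defenses described above.
from collections import defaultdict, deque

edges: dict[str, list[str]] = defaultdict(list)
sensitive: dict[str, bool] = {}


def record_success(call_id: str, inputs: list[str], is_sensitive: bool = False) -> None:
    sensitive[call_id] = is_sensitive
    for src in inputs:
        edges[src].append(call_id)


def reaches(src: str, dst: str) -> bool:
    """Breadth-first reachability over recorded data-flow edges."""
    seen, queue = {src}, deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            return True
        for nxt in edges[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False


def tainted_sources(sink: str) -> list[str]:
    """Sensitive recorded calls with a data-flow path into `sink`."""
    return [n for n, flag in sensitive.items() if flag and reaches(n, sink)]


# Laundering trace: c1 fetch_url(poisoned page) succeeds; c2
# read_file("/etc/shadow") is DENIED and so never recorded; c3 send_email
# carries only what the agent inferred from that denial.
record_success("c1", inputs=[])
record_success("c3", inputs=["c1"])
laundered_case = tainted_sources("c3")

# Contrast: a successful sensitive read feeding the email is caught.
record_success("c4", inputs=[], is_sensitive=True)
record_success("c5", inputs=["c4"])
direct_case = tainted_sources("c5")

print(laundered_case, direct_case)  # [] ['c4']
```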

The practical takeaway: a “deny” verdict is not a clean stop. If your threat model treats blocked actions as neutralized, you have an unmeasured side channel. As agents run longer and more autonomously, the number of denial events — and the information they encode — grows.

Defenses

ARM is a runtime enforcement layer that interposes at the tool-call boundary (described as an MCP proxy, but applicable to any centralized tool-calling runtime). Its design points generalize beyond the prototype:

Make denied actions first-class. Record every DENY as a node in the provenance graph and add counterfactual edges to subsequent actions that could have been influenced by it. A later sink-bound call reachable from a denial node inherits that security context and can itself be blocked. This is the core mitigation for causality laundering.
Propagate trust through an integrity lattice. ARM orders sources ToolDesc < ToolUntrusted < ToolTrusted < UserInput < SysInstr and assigns derived data the minimum trust of its inputs. Combined with field-level provenance, this stops mixed-provenance misuse where a trusted object smuggles one untrusted field (e.g., an attacker-controlled email address inside an otherwise trusted contact record).
Keep enforcement deterministic. All decisions are graph traversals and explicit rules; no additional LLM call is used to judge the agent under scrutiny. The prototype, about 910 lines of Python, runs the graph on a rustworkx backend with sub-millisecond latency, negligible against the 100 ms–10 s that inference takes.
Layer it and log it. ARM stacks a hard-boundary layer, the provenance layer, a schema layer, and an immutable capability token (frozen at session start, attenuable but never amplifiable). Every verdict goes to a hash-chained audit log so tampering is detectable.
Combine with prevention. ARM limits damage once injection succeeds; it complements, rather than replaces, design-time defenses like CaMeL, instruction hierarchy, and the lethal-trifecta discipline of never co-locating private data, untrusted input, and an exfiltration sink. Treat denial-aware provenance as one more layer, not a silver bullet.
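The first two design points can be sketched together. This is not ARM's code: the prototype uses rustworkx while this uses plain dictionaries, and the class and method names, the trust floor for sinks, and the counterfactual rule are assumptions. In particular, the sketch adds a counterfactual edge from every earlier denial to every later action, a deliberately blunt stand-in for “actions that could have been influenced”. Only the lattice order and the min-trust rule come from the description above.

```python
# Denial-aware provenance with a min-trust lattice: an illustrative sketch,
# not ARM's implementation. Names, the sink floor, and the "every later
# action" counterfactual rule are assumptions.
from dataclasses import dataclass, field
from enum import IntEnum


class Trust(IntEnum):
    # Lattice order from the article: ToolDesc < ToolUntrusted < ToolTrusted
    # < UserInput < SysInstr.
    TOOL_DESC = 0
    TOOL_UNTRUSTED = 1
    TOOL_TRUSTED = 2
    USER_INPUT = 3
    SYS_INSTR = 4


@dataclass
class Provenance:
    parents: dict[str, set[str]] = field(default_factory=dict)
    trust: dict[str, Trust] = field(default_factory=dict)
    denied: set[str] = field(default_factory=set)

    def add_source(self, node: str, level: Trust) -> None:
        self.parents[node] = set()
        self.trust[node] = level

    def add_action(self, node: str, inputs: list[str], denied: bool = False) -> None:
        # Data-flow edges from the inputs, plus counterfactual edges from every
        # earlier DENY node (assumed rule: anything after a denial may depend on it).
        self.parents[node] = set(inputs) | self.denied
        # Derived data gets the minimum trust of its data inputs; with no
        # inputs, fall back to the lowest level (a conservative choice).
        self.trust[node] = min((self.trust[i] for i in inputs), default=Trust.TOOL_DESC)
        if denied:
            self.denied.add(node)

    def ancestors(self, node: str) -> set[str]:
        seen: set[str] = set()
        stack = [node]
        while stack:
            for parent in self.parents[stack.pop()]:
                if parent not in seen:
                    seen.add(parent)
                    stack.append(parent)
        return seen

    def sink_allowed(self, node: str, floor: Trust = Trust.TOOL_TRUSTED) -> bool:
        # Block a sink-bound call that is downstream of any denial or whose
        # derived trust falls below the floor.
        downstream_of_deny = bool(self.ancestors(node) & self.denied)
        return not downstream_of_deny and self.trust[node] >= floor


# Session 1: causality laundering. The read is denied; the email's data
# comes only from the user, so trust alone would let it through.
s1 = Provenance()
s1.add_source("user_msg", Trust.USER_INPUT)
s1.add_source("web_page", Trust.TOOL_UNTRUSTED)
s1.add_action("read_shadow", inputs=["web_page"], denied=True)
s1.add_action("send_email", inputs=["user_msg"])
print(s1.trust["send_email"].name, s1.sink_allowed("send_email"))  # USER_INPUT False

# Session 2: mixed provenance. Field-level labels expose the one
# attacker-controlled field inside an otherwise trusted contact record.
s2 = Provenance()
s2.add_source("user_msg", Trust.USER_INPUT)
s2.add_source("contact.name", Trust.TOOL_TRUSTED)
s2.add_source("contact.email", Trust.TOOL_UNTRUSTED)
s2.add_action("send_to_contact", inputs=["user_msg", "contact.email"])
s2.add_action("post_note", inputs=["user_msg", "contact.name"])
print(s2.sink_allowed("send_to_contact"), s2.sink_allowed("post_note"))  # False True
```

In the first session a trust-only check would pass `send_email`, since its data came from the user; the denial edge is what blocks it. In the second, labeling the contact record as a whole would have hidden the untrusted `email` field.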

Status

Item	Reference	Date	Notes
Causality Laundering paper (arXiv 2604.04035)	arXiv	2026-04	Names the attack class; proposes ARM defense
ARM evaluation	arXiv 2604.04035	2026-04	Controlled differential test on 3 scenarios; the graph-aware layer blocks all three, which a flat baseline misses
Corroborating evidence (arXiv 2603.16928)	arXiv	2026-03	Agents infer hidden monitoring from blocking feedback alone
Maturity	—	—	Architecture study; manual scenarios, no frontier-LLM-in-loop benchmark yet

The right reading is not “deny verdicts are useless”. It is that a block is an event, not an erasure — and any agent architecture that forgets the denial it just issued leaves the inference channel open behind it.