DEFENSE MEDIUM NEW

Agent privacy is a trajectory problem: OCELOT budgets inference leakage at runtime

An arXiv paper dated June 10, 2026 reframes LLM-agent privacy as posterior-risk control: not filtering each output, but budgeting how much an adversary's belief about a secret may improve across a whole trajectory.

2026-06-16 // 6 min affects: llm-agents, tool-calling-agents, rag-agents, multi-agent-systems

What is this?

A paper posted on June 10, 2026, OCELOT: Inference-Leakage Budgets for Privacy-Preserving LLM Agents (arXiv:2606.12341, cs.CR), makes a precise argument about why per-output privacy filters keep failing on agents that read your files, call tools, and transact with external services. The claim: privacy for an agent is a property not of a single output but of the entire trajectory, and the defenses most teams deploy are scoped to the wrong unit.

The paper identifies three properties that make agent privacy hard. Leakage is cumulative: individually innocuous releases add up, across honest-but-curious or colluding recipients (“sinks”), into an inference about a protected secret. It is bidirectional: a malicious observation can inject instructions that turn the agent’s own reasoning model against its user — the lethal trifecta seen from the privacy side. And it is task-dependent: the same field is necessary for one recipient and gratuitous for another.

How it works

The insight is that a filter deciding “is this single release OK?” cannot see accumulation. Per-release contextual-integrity filters such as AirGapAgent (CCS’24), classic information-flow control, and posterior-leakage monitors each cover part of the problem, but none controls cumulative, inference-based leakage at runtime.

OCELOT recasts the problem as posterior-risk control: a runtime mediator budgets how much an adversary’s belief about a secret is allowed to improve over the course of a trajectory, instead of inspecting outputs in isolation.

per-output filter:   release_i  ->  "OK in isolation?"  ->  allow   (blind to accumulation)

posterior-risk:      belief(secret | releases_1..i)  <=  budget   ->  authorize least-disclosing variant
                     each release charges a certified min-entropy cost against a sink-weighted budget
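The contrast in the schematic can be made concrete with a minimal sketch. It is not the paper's implementation: the `Release` type, the bit costs, and both thresholds are hypothetical, and it assumes costs simply add up, which is a simplification of certified min-entropy accounting. It also denies an over-budget release outright instead of searching for a less-disclosing variant.

```python
from dataclasses import dataclass


@dataclass
class Release:
    field: str
    cost_bits: float  # assumed leakage estimate about the secret, in bits


PER_OUTPUT_LIMIT = 3.0   # hypothetical per-release threshold
TRAJECTORY_BUDGET = 5.0  # hypothetical cumulative ceiling


def per_output_filter(releases):
    """Allow each release if it looks acceptable in isolation."""
    return [r.cost_bits <= PER_OUTPUT_LIMIT for r in releases]


def trajectory_budget(releases, budget=TRAJECTORY_BUDGET):
    """Allow a release only while the running total stays within the budget."""
    spent = 0.0
    decisions = []
    for r in releases:
        if spent + r.cost_bits <= budget:
            spent += r.cost_bits
            decisions.append(True)
        else:
            decisions.append(False)
    return decisions, spent


releases = [
    Release("first_name", 2.0),
    Release("birth_month", 2.5),
    Release("zip_prefix", 2.5),
]
print(per_output_filter(releases))   # [True, True, True]
print(trajectory_budget(releases))   # ([True, True, False], 4.5)
```

Every release passes the isolated check, but the third one would push the running total to 7.0 bits, so the trajectory-level check refuses it.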

Its mechanism, Witness-Verified Declassification, deliberately separates judgment from trust. An untrusted, locally fine-tuned “defender” model inspects each candidate release and emits structured evidence — labeled atoms and proposed declassification operators. A deterministic verifier then audits that evidence, charges a certified min-entropy cost for the chosen variant, and authorizes the least-disclosing useful release under a sink-trust-weighted budget recorded on a tamper-evident ledger. Because the verifier is deterministic and the budget is the thing being enforced, a compromised or manipulated defender model can lower utility but cannot quietly overspend the privacy budget — which is what makes the design resistant to the bidirectional case where untrusted content tries to subvert the agent’s own reasoning.
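A rough sketch of the judgment/trust split described above, under heavy assumptions: releases are structured as field-to-value dictionaries, the `CERTIFIED_COST` table and its values are invented for illustration, and the verifier certifies cost purely from field names. Auditing free-text evidence, which the paper's verifier has to handle, is not attempted here.

```python
# Hypothetical certified cost per field, in bits. The verifier trusts only
# this table, never a cost claimed by the defender model.
CERTIFIED_COST = {"city": 1.0, "street": 4.0, "unit": 3.0}


def verify(candidates, required_fields, remaining_budget):
    """Deterministically pick the cheapest useful candidate that fits the budget.

    Returns (candidate, cost) or None, where None means "release nothing".
    """
    best = None
    for cand in candidates:
        fields = set(cand["fields"])
        if not fields.issubset(CERTIFIED_COST):
            continue  # unknown field: reject rather than guess a cost
        if not set(required_fields).issubset(fields):
            continue  # not useful for the task
        # Cost is recomputed here; cand["claimed_cost"] is ignored on purpose.
        cost = sum(CERTIFIED_COST[f] for f in sorted(fields))
        if cost > remaining_budget:
            continue
        if best is None or cost < best[1]:
            best = (cand, cost)
    return best


# Candidates as an untrusted defender might propose them (placeholder values).
candidates = [
    {"fields": {"city": "Lyon", "street": "Example St", "unit": "4B"},
     "claimed_cost": 0.1},  # understated claim
    {"fields": {"city": "Lyon", "street": "Example St"}, "claimed_cost": 5.0},
    {"fields": {"city": "Lyon"}, "claimed_cost": 1.0},
]

choice = verify(candidates, required_fields={"city"}, remaining_budget=6.0)
print(choice[1], sorted(choice[0]["fields"]))  # 1.0 ['city']

# A subverted defender that offers only the over-disclosing candidate gets
# nothing released: utility drops, but the budget is not exceeded.
print(verify(candidates[:1], {"city"}, 6.0))  # None
```

The design choice worth noting is that the defender's claimed cost never enters the decision; a dishonest claim can at worst cause a refusal.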

The authors report that, across diverse agent benchmarks and against recent defenses, OCELOT attains lower leakage at higher task utility, and resists adaptive injection, jailbreak, cumulative inference, and sink collusion, while adding only modest overhead. (Specific numbers are in the paper; treat the comparative framing — trajectory budget vs. per-output filter — as the durable takeaway.)

Why it matters

This is an architecture argument, not a single-product bug. As agents move onto MCP and agent-to-agent plumbing, the surface for slow, distributed leakage grows: an agent can disclose a name to one service, a date to another, a location to a third, none individually alarming, yet jointly enough to reconstruct a secret. If your privacy control is a per-message classifier, you can pass every check and still leak the secret across the trajectory. The risk concentrates exactly where agents are most useful: long-running assistants with memory, multi-tool workflows, and multi-agent systems where outputs flow between models you do not fully trust.
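A toy calculation shows how three unremarkable disclosures combine. The eight-person population and its attributes are made up, the prior is assumed uniform, and leakage is measured for one specific observation as the log-ratio of the adversary's best-guess probability after and before seeing it.

```python
import math
from itertools import product

# Hypothetical population: 2 names x 2 birth months x 2 cities = 8 people.
# The secret is which of them is the user.
population = [
    {"name": n, "month": m, "city": c}
    for n, m, c in product(["Ana", "Ben"], ["Mar", "Jul"], ["Oslo", "Rome"])
]


def leakage_bits(observed):
    """Bits gained by an adversary who sees `observed`, under a uniform prior."""
    matches = [p for p in population
               if all(p[k] == v for k, v in observed.items())]
    if not matches:
        raise ValueError("observation is inconsistent with the population")
    prior_guess = 1 / len(population)   # chance of guessing right beforehand
    posterior_guess = 1 / len(matches)  # chance after narrowing the candidates
    return math.log2(posterior_guess / prior_guess)


print(leakage_bits({"name": "Ana"}))                                   # 1.0
print(leakage_bits({"month": "Jul"}))                                  # 1.0
print(leakage_bits({"city": "Rome"}))                                  # 1.0
print(leakage_bits({"name": "Ana", "month": "Jul", "city": "Rome"}))   # 3.0
```

Each disclosure alone leaks one bit. Pooled by colluding recipients, they leak three bits, which in a population of eight identifies the user exactly.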

Defenses

OCELOT itself is the defense; what transfers to other systems is the set of engineering lessons behind it.

  1. Budget across the trajectory, not the message. Track cumulative disclosure about each protected secret and enforce a ceiling, rather than scoring each output independently. This is the one change that closes the cumulative-leakage channel.
  2. Separate judgment from trust. Let an (untrusted) model propose what to release and a deterministic verifier decide and meter the cost. A subverted judge should be able to reduce usefulness, never silently exceed the privacy budget.
  3. Weight the budget by sink trust. A field sent to a first-party service is not the same disclosure as the same field sent to an unknown third party. Make recipient trust an explicit term, and assume sinks may collude.
  4. Keep an append-only ledger of disclosures. A tamper-evident record of what was released, to whom, and at what certified cost makes the contextual-integrity decision auditable after the fact, and supports the “at most two of three” budgeting logic of the Agents Rule of Two.
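Lessons 3 and 4 can be combined in one small sketch. Everything named here is hypothetical: the `SINK_WEIGHT` classes and multipliers, the `Ledger` class, and the host names. It assumes the worst case for collusion by charging all sinks against one shared budget, and it uses a plain SHA-256 hash chain, which is tamper-evident only if the latest hash is also anchored somewhere the writer cannot alter.

```python
import hashlib
import json

# Hypothetical trust weights: the less trusted the sink, the more each bit costs.
SINK_WEIGHT = {"first_party": 1.0, "partner": 2.0, "unknown": 4.0}
GENESIS = "0" * 64


def _digest(body):
    """Hash an entry body with a stable key order."""
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()


class Ledger:
    def __init__(self, budget_bits):
        self.budget_bits = budget_bits
        self.spent = 0.0
        self.entries = []

    def record(self, sink, sink_class, field, cost_bits):
        """Charge a trust-weighted cost; refuse if it would exceed the budget."""
        charge = cost_bits * SINK_WEIGHT[sink_class]
        if self.spent + charge > self.budget_bits:
            return False
        prev = self.entries[-1]["hash"] if self.entries else GENESIS
        body = {"sink": sink, "field": field, "charge": charge, "prev": prev}
        self.entries.append({**body, "hash": _digest(body)})
        self.spent += charge
        return True

    def verify_chain(self):
        """Recompute every hash and check that each entry links to the previous one."""
        prev = GENESIS
        for e in self.entries:
            body = {k: e[k] for k in ("sink", "field", "charge", "prev")}
            if e["prev"] != prev or e["hash"] != _digest(body):
                return False
            prev = e["hash"]
        return True


ledger = Ledger(budget_bits=10.0)
print(ledger.record("calendar.internal", "first_party", "date", 2.0))  # True
print(ledger.record("maps.example", "unknown", "city", 1.5))           # True
print(ledger.record("ads.example", "unknown", "name", 1.0))            # False
print(ledger.verify_chain())                                           # True

ledger.entries[0]["charge"] = 0.0  # simulate after-the-fact tampering
print(ledger.verify_chain())       # False
```

The same 1.0-bit field that a first-party service could receive for 1.0 is refused for the unknown sink, because its weighted charge of 4.0 would take the shared total from 8.0 to 12.0.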

Status

Item | Reference | Date | Notes
OCELOT: Inference-Leakage Budgets for Privacy-Preserving LLM Agents | arXiv:2606.12341 | 2026-06-10 | Runtime posterior-risk mediator; Witness-Verified Declassification; min-entropy budget on a tamper-evident ledger
Privacy in Action (PrivacyChecker / PrivacyLens-Live) | arXiv:2509.17488 | 2025-09-22 | CI-based per-release mitigation; dynamic MCP/A2A evaluation (EMNLP 2025 Findings)
AirGapAgent | arXiv:2405.05175 | 2024-05-08 | Contextual-integrity minimization against context hijacking (CCS’24)

The takeaway is not that contextual-integrity filters are useless — it is that the unit of enforcement is wrong. Privacy for an agent is decided over a trajectory, and a budget that meters cumulative inference is the checkpoint a per-output filter can never be.

Sources