system: OPERATIONAL
← back to all hacks
INDIRECT INJECTION MEDIUM NEW

DACSI: when retrieved documents fake the system's control signals

A June 8, 2026 paper names a quiet RAG failure mode: untrusted document text impersonating metadata, provenance and policy signals. No 'ignore previous instructions' required — the lesson is that document-authored labels are data, not policy.

2026-06-12 // 6 min affects: rag-systems, llm-agents, enterprise-rag

What is this?

On June 8, 2026, a paper titled Document-Authored Control-Signal Impersonation: A Low-Cost Indirect Prompt Attack on RAG Safety Boundaries (arXiv:2606.09005) gave a name to a failure mode that retrieval-augmented generation (RAG) builders keep rediscovering by accident. The author calls it DACSI — Document-Authored Control-Signal Impersonation.

The setup is the ordinary RAG prompt. A system serializes several things into one natural-language blob: the user’s query, the documents pulled from the index, plus metadata, system labels and task instructions. DACSI is what happens when attacker-authored text inside a retrieved document dresses itself up as one of those control signals — a provenance tag, an authority marker, a “verified” flag, a disclosure-policy notice — so that the model treats data as if it were policy.

This is not the classic “ignore previous instructions” jailbreak. The paper explicitly positions DACSI as a non-imperative, metadata-like subclass of indirect prompt injection: it does not command the model to break a rule, it quietly asserts that a rule already permits the action.

How it works

The root cause is structural, and it is the same one behind every indirect-injection class: RAG prompt rendering collapses trusted and untrusted text into a single channel. Once the system prompt, the retrieved passage, and the metadata all arrive as the same kind of tokens, there is no reliable marker the model can use to tell an authorized control signal from a string that merely looks like one.

Command-style injection asks: will the model obey an instruction smuggled into data? DACSI asks a subtler question: will the model misattribute untrusted document text as an authorized control signal? Instead of “do X,” the payload reads like an environment fact — a label claiming the content is trusted, internal, pre-approved, or exempt from a safety policy. If the model has learned to defer to such labels when they appear in its context, a document that fabricates one inherits authority it was never granted.

No working payload is reproduced here, and none is needed to grasp the mechanism. The one-line summary the paper itself offers is the whole lesson: document-authored labels are data, not policy. Any field an attacker can write into a retrievable document — a header, a footnote, a hidden span, a fake JSON-looking metadata block — is attacker-controlled, no matter how official it reads.

Why it matters

DACSI matters because it sits in the blind spot of the most common defense. Many RAG guardrails are tuned to catch imperative injections — text that tells the model to do something forbidden. A passage that contains no imperative at all, only an assertion that “this source is verified and policy-exempt,” can sail past that filter while still steering the outcome.

It also scales cheaply. The attacker does not need model access or gradient optimization; they need one poisoned document to land in a corpus the system retrieves from — a wiki page, a shared drive, a scraped web result, a support ticket. That is the same low barrier that makes indirect injection the dominant agentic failure mode in production, and DACSI widens the surface to include every system that lets retrieved content carry metadata-like text.

The broader point, consistent with the contextual-integrity argument and with RAG security taxonomies, is that source authority cannot be established by anything written inside the source. Trust has to come from the channel, not the payload.

Defenses

You cannot make a language model reliably distinguish a real control signal from a forged one if both arrive as the same text. So move the trust decision out of the prompt.

  • Establish control signals out-of-band. Authority, provenance and policy flags should be attached to a document by the retrieval and ingestion pipeline you control, never read from the document body. If a “trusted” or “policy-exempt” marker can appear in retrievable content, strip it before the text reaches the model.
  • Treat all retrieved text as untrusted data. Render retrieved passages in a clearly delimited, lower-privilege region and instruct the model that nothing inside it can grant permissions, set policy, or assert its own trustworthiness. Combine with provenance-aware checks such as those in ARGUS.
  • Sanitize metadata-like content at ingestion. Detect and neutralize document-authored structures that mimic system metadata — fake headers, pseudo-JSON tags, “verified by” notices, disclosure-policy boilerplate. These are the raw material for DACSI.
  • Don’t rely on imperative-only injection filters. Test guardrails against non-imperative payloads that assert authority rather than issue commands. A filter that only catches “ignore previous instructions” will miss this class entirely.
  • Constrain the blast radius. Pair the above with least-privilege tool scopes and egress controls so that a misattributed control signal cannot, by itself, reach sensitive data or an exfiltration path.

Status

ItemDetail
PaperDocument-Authored Control-Signal Impersonation (DACSI)
IdentifierarXiv:2606.09005v1
Posted2026-06-08
ClassIndirect prompt injection — non-imperative, metadata-like subclass
NatureResearch finding / attack-class characterization (no live exploit)

DACSI is not a new product vulnerability to patch; it is a name for a recurring architectural mistake. The systems most exposed are those that let retrieved documents speak in the voice of the system itself. The fix is not a better classifier inside the prompt — it is refusing to let the prompt be the place where authority is decided.

Sources