DATA POISONING · MEDIUM

Oracle poisoning: corrupting the knowledge graph an agent reasons over

A paper published on arXiv on May 10, 2026, defines Oracle Poisoning: corrupt the knowledge graph an agent queries at runtime, and the agent reaches wrong conclusions through correct reasoning. Across nine models, trust in poisoned data hit 100% under directed agentic queries.

2026-06-19 // 6 min read // affects: llm-agents, knowledge-graph-rag, tool-use-agents, gpt-5.1

What is this?

On May 10, 2026, researchers Ben Kereopa-Yorke, Guillermo Diaz, Holly Wright, Reagan Johnston, Ron F. Del Rosario and Timothy Lynar published Oracle Poisoning: Corrupting Knowledge Graphs to Weaponise AI Agent Reasoning (arXiv:2605.09822, cs.CR/cs.AI). They define Oracle Poisoning as an attack class in which an adversary corrupts a structured knowledge graph that an AI agent queries at runtime through tool-use, so the agent reaches incorrect conclusions through correct reasoning.

The distinction from prompt injection is the whole point. Prompt injection tampers with an agent’s instructions; Oracle Poisoning tampers with the data the agent reasons over. The model is never tricked into misbehaving: it faithfully retrieves a fact from a trusted tool and reasons soundly from it. The fact is simply false. This belongs to the same family of integrity problems studied for graph-based retrieval in work such as KEPo (knowledge-evolution poisoning of graph RAG, ACM Web Conference 2026); the difference is that this paper demonstrates it against a production-scale agentic system rather than a benchmark.

How it works

Many agents now treat a knowledge graph as an authoritative oracle: a tool call returns nodes and edges (entities, relationships, claims), and the agent folds those results into its answer. The paper studies a 42-million-node production code knowledge graph and shows six attack scenarios where an adversary alters graph contents — for example, injecting a fabricated claim that a component is secure.
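To make that data flow concrete, here is a toy sketch in Python. Everything in it is hypothetical: the in-memory `GRAPH`, the `query_graph` tool and the `naive_answer` step are illustrative stand-ins, not the paper's 42-million-node graph or any real SDK. The last edge plays the role of a fabricated "component is secure" claim. The agent's logic is sound, yet its answer is wrong because the fact it rests on is false.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    subject: str
    predicate: str
    obj: str


# Hypothetical toy graph. The last edge stands in for an attacker-inserted
# claim; nothing about its shape distinguishes it from legitimate data.
GRAPH = [
    Edge("payments-api", "owned_by", "team-payments"),
    Edge("payments-api", "depends_on", "legacy-parser"),
    Edge("legacy-parser", "security_status", "audited_secure"),  # fabricated
]


def query_graph(subject: str, predicate: str) -> list[str]:
    """The 'tool call': return the objects of all edges matching subject/predicate."""
    return [e.obj for e in GRAPH if e.subject == subject and e.predicate == predicate]


def naive_answer(component: str) -> str:
    """Sound reasoning that treats every retrieved edge as ground truth."""
    for dep in query_graph(component, "depends_on"):
        if query_graph(dep, "security_status") != ["audited_secure"]:
            return f"{component}: needs review ({dep} not confirmed secure)"
    return f"{component}: safe to deploy (all dependencies audited secure)"


print(naive_answer("payments-api"))
```

Removing that single edge flips the answer to "needs review" without changing a line of the agent's logic, which is the sense in which the reasoning is correct and the conclusion is not.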

The evaluation uses real SDK tool-use across nine models from three providers (N=30 per model): the model autonomously invokes a graph query tool and reasons from the results. The findings:

  • Directed queries: every tested model accepted poisoned data at 100% once the attacker reached “moderate sophistication” (level L2). In 269 of 270 valid trials the models accepted fabricated security claims.
  • Open-ended prompts: trust dropped to 3–55%. The authors flag this gap as a prompt-framing confound and report both conditions rather than cherry-picking the stronger one.
  • A sophistication gradient with break points: trust flips from 0% to 100% once the attacker passes a minimum skill level, which reframes the question from whether the attack works to how much effort it takes.
  • Delivery mode is a first-order confound. Evaluating the same payload inline can produce false negatives: GPT-5.1 showed 0% trust when content was tested inline, but 100% under both simulated and real agentic tool-use. Testing a model in a chat box does not tell you how its agent will behave.
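One practical consequence of the delivery-mode confound is that trust rates should never be pooled across modes. The Python sketch below assumes a hypothetical trial record with `model`, `mode` and `accepted` fields; it computes acceptance rates per (model, mode) pair and flags models whose inline and tool-use rates diverge. The toy records for `model-a` are invented to mirror the shape of the GPT-5.1 result and are not the paper's data; the 0.5 divergence threshold is likewise an arbitrary choice.

```python
from collections import defaultdict
from typing import NamedTuple


class Trial(NamedTuple):
    model: str
    mode: str       # hypothetical labels: "inline" or "tool_use"
    accepted: bool  # did the model accept the poisoned claim?


def trust_rates(trials: list[Trial]) -> dict[tuple[str, str], float]:
    """Acceptance rate per (model, mode); modes are never pooled together."""
    counts: dict[tuple[str, str], list[int]] = defaultdict(lambda: [0, 0])
    for t in trials:
        c = counts[(t.model, t.mode)]  # [accepted, total]
        c[0] += int(t.accepted)
        c[1] += 1
    return {key: acc / total for key, (acc, total) in counts.items()}


def mode_disagreements(rates: dict[tuple[str, str], float], threshold: float = 0.5):
    """Models whose inline and tool-use trust rates differ by at least `threshold`."""
    flagged = []
    for model in sorted({m for m, _ in rates}):
        inline = rates.get((model, "inline"))
        tool = rates.get((model, "tool_use"))
        if inline is not None and tool is not None and abs(tool - inline) >= threshold:
            flagged.append((model, inline, tool))
    return flagged


# Toy records for a hypothetical "model-a": rejected when tested inline,
# accepted when the same content arrives through a tool call.
trials = [Trial("model-a", "inline", False)] * 10 + [Trial("model-a", "tool_use", True)] * 10
rates = trust_rates(trials)
print(rates)
print(mode_disagreements(rates))
```

For `model-a` this reports a 0% inline rate against a 100% tool-use rate and flags the model, meaning its inline results cannot stand in for its agentic behavior.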

No exploit string is needed to understand the lesson, and none is reproduced here: the mechanism is data integrity, not a clever prompt.

Why it matters

Agentic systems increasingly outsource “ground truth” to retrieval layers — knowledge graphs, vector stores, internal wikis — on the assumption that retrieved data is trustworthy. Oracle Poisoning shows that assumption is load-bearing and largely unguarded. If an attacker can write to the oracle, the agent becomes a confident, well-reasoned conduit for the attacker’s claims, and the usual defenses (alignment, instruction hierarchies, prompt-injection filters) never engage because no instruction was injected.

Based on an analysis of four additional platforms, the authors note that the attack appears to generalize across the knowledge-graph ecosystem. The practical exposure surface is anywhere an agent has, or can be steered toward, a mutable shared knowledge store: code-intelligence graphs, CMDBs, threat-intel graphs, RAG corpora with write paths.

Defenses

The paper evaluates five defenses and is candid that only one is decisive:

  • Read-only access control eliminates the direct mutation vector — if agents and untrusted writers cannot modify the oracle, the cleanest attack path closes. Treat the knowledge graph as a privileged datastore with strict write authorization and audit logging.

The other four defenses the paper evaluates are partial and model-dependent, so no single one should be relied on. Layer controls, for example:

  • Provenance and integrity on graph contents: sign or attribute claims, track who wrote each node/edge, and surface confidence/source to the reasoning step rather than presenting retrieved facts as unconditioned truth.
  • Test under real tool-use, not inline. Because delivery mode flips results, security evaluations and red-team exercises must exercise the actual agentic path, or they will report false negatives.
  • Constrain trust in retrieved claims: require corroboration for high-impact assertions (e.g., “X is secure”), and keep humans in the loop for decisions that hinge on a single retrieved fact.
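The access-control, provenance and corroboration advice can be combined in one small wrapper around the graph store. The following Python sketch is an illustration under stated assumptions, not the paper's implementation: `GuardedGraph`, `gate`, the writer names, the HMAC keys and the two-source threshold are all hypothetical. Writes are allowed only for registered principals and every attempt is audit-logged; each claim carries an HMAC-SHA256 signature over its content and writer; reads drop claims whose signature no longer verifies; and high-impact predicates are accepted only when at least two distinct writers agree on a single value, otherwise the decision goes to a human. Signatures only help if the keys live somewhere an attacker who can write to the store cannot reach.

```python
import hashlib
import hmac
import json
import time
from dataclasses import dataclass, field

# Hypothetical policy. Real keys belong in a secrets manager that graph
# writers cannot read, not in source code.
WRITER_KEYS = {"ingest-pipeline": b"demo-key-1", "sast-scanner": b"demo-key-2"}
HIGH_IMPACT_PREDICATES = {"security_status"}
MIN_INDEPENDENT_SOURCES = 2


def _sign(key: bytes, subject: str, predicate: str, obj: str, writer: str) -> str:
    """HMAC-SHA256 over the claim's content and its writer."""
    payload = json.dumps([subject, predicate, obj, writer]).encode()
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


@dataclass
class Claim:
    subject: str
    predicate: str
    obj: str
    writer: str
    signature: str


@dataclass
class GuardedGraph:
    claims: list = field(default_factory=list)
    audit_log: list = field(default_factory=list)

    def write(self, principal: str, subject: str, predicate: str, obj: str) -> None:
        """Only registered writers may mutate the graph; every attempt is logged."""
        allowed = principal in WRITER_KEYS
        self.audit_log.append((time.time(), principal, subject, predicate, obj, allowed))
        if not allowed:
            raise PermissionError(f"{principal} has read-only access")
        sig = _sign(WRITER_KEYS[principal], subject, predicate, obj, principal)
        self.claims.append(Claim(subject, predicate, obj, principal, sig))

    def read(self, subject: str, predicate: str) -> list:
        """Return matching claims with their source; drop any that fail verification."""
        out = []
        for c in self.claims:
            if (c.subject, c.predicate) != (subject, predicate):
                continue
            key = WRITER_KEYS.get(c.writer)
            if key is None:
                continue  # unknown writer: treat as untrusted
            expected = _sign(key, c.subject, c.predicate, c.obj, c.writer)
            if hmac.compare_digest(expected, c.signature):
                out.append({"value": c.obj, "source": c.writer})
        return out


def gate(graph: GuardedGraph, subject: str, predicate: str):
    """Accept high-impact claims only when enough distinct writers agree on one value."""
    results = graph.read(subject, predicate)
    if predicate not in HIGH_IMPACT_PREDICATES:
        return ("accept", results)
    by_value: dict[str, set[str]] = {}
    for r in results:
        by_value.setdefault(r["value"], set()).add(r["source"])
    if len(by_value) == 1:
        value, sources = next(iter(by_value.items()))
        if len(sources) >= MIN_INDEPENDENT_SOURCES:
            return ("accept", [{"value": value, "sources": sorted(sources)}])
    # Missing, conflicting or single-source evidence: keep a human in the loop.
    return ("human_review", results)


g = GuardedGraph()
g.write("ingest-pipeline", "legacy-parser", "security_status", "audited_secure")
try:
    g.write("agent", "legacy-parser", "security_status", "audited_secure")
except PermissionError as err:
    print("blocked:", err)                              # agents are read-only
print(gate(g, "legacy-parser", "security_status")[0])    # one source -> human_review
g.write("sast-scanner", "legacy-parser", "security_status", "audited_secure")
print(gate(g, "legacy-parser", "security_status")[0])    # two writers agree -> accept
```

In this toy run the agent's write is refused, a single source for `security_status` is routed to human review, and a second, independent writer agreeing on the value lets the claim through. A claim edited directly in storage, bypassing `write`, fails verification and is silently dropped from reads.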

Status

Item               Value
Paper              Oracle Poisoning (arXiv:2605.09822)
Published          May 10, 2026
Class              Knowledge-graph / oracle data poisoning (distinct from prompt injection)
Tested             9 models, 3 providers, 42M-node production code graph
Headline result    100% trust in poisoned data under directed agentic queries (sophistication level L2)
Strongest defense  Read-only access control (others partial, model-dependent)

Key date: May 10, 2026 — first empirical demonstration of knowledge-graph poisoning against a production-scale agentic system. This article summarizes publicly available research for defensive purposes and reproduces no attack payload.

Sources