AGENTS MEDIUM NEW

ShadowMerge: poisoning graph-based agent memory by colliding relations

A May 2026 paper poisons graph-based agent memory with relations that share a real anchor and channel but carry a conflicting value — reaching 93.8% attack success on Mem0 while input-side filters miss it.

2026-06-18 // 5 min // affects: mem0, graph-based agent memory, rag agents

What is this?

ShadowMerge (arXiv 2605.09033, Luo et al., first posted 9 May 2026, revised 15 May) is a poisoning attack against graph-based agent memory. Instead of storing past interactions as flat text, a growing number of agent stacks keep them as a knowledge graph of entities and relations — this is what frameworks like Mem0 do to support structured long-term recall and multi-hop reasoning. ShadowMerge’s point is that the graph structure is not just a feature; it is a new poisoning surface. The authors evaluated the attack on Mem0, report that they responsibly disclosed it to the affected graph-memory vendors, and open-sourced their code.

How it works

Earlier memory-poisoning work such as AgentPoison (Chen et al., NeurIPS 2024) targets flat records: inject a malicious instance, get it retrieved. That approach often fails against a graph memory, because a hostile relation has to clear three gates before it can influence the agent. ShadowMerge shows why those gates can be slipped; there is no exploit payload to run, only ordinary-looking interactions.

Gate              What a poisoned relation must do
----------------  --------------------------------------------------------
Extraction        Be parsed by the memory pipeline into a stored relation
Merge             Land in the target anchor's neighborhood (not a stray node)
Retrieval         Be selected as evidence for the victim's later query

The key insight is a relation-channel conflict. A poisoned relation can share the same query-activated anchor (the entity the query lights up) and the same canonicalized relation channel (the normalized relation type the system merges on) as legitimate evidence — while carrying a conflicting value. The authors’ AIR pipeline phrases this conflict as an ordinary interaction, so the memory system itself extracts it, merges it next to the real evidence, and later retrieves it. Crucially, this needs only query-only, ordinary-interaction access: no inserting corpus documents, no editing the graph index.
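To make the collision concrete, here is a minimal toy sketch in Python. It is not Mem0's API and not the authors' AIR pipeline: `ToyGraphMemory`, `CHANNEL_ALIASES`, `canonicalize`, and the store example are all hypothetical names invented for illustration. It only shows how two differently phrased interactions can canonicalize to the same anchor and channel, land in the same neighborhood, and come back together at retrieval time with conflicting values.

```python
from dataclasses import dataclass

# Hypothetical alias table: surface phrases -> canonical relation channel.
# Real systems canonicalize differently; this is an assumption of the sketch.
CHANNEL_ALIASES = {
    "return window": "return_window",
    "returns accepted within": "return_window",
    "refund period": "return_window",
}


@dataclass(frozen=True)
class Relation:
    anchor: str   # entity a later query activates
    channel: str  # canonicalized relation type the store merges on
    value: str
    source: str   # interaction that produced the relation


def canonicalize(phrase):
    """Map a surface phrase to its canonical channel name."""
    key = phrase.strip().lower()
    return CHANNEL_ALIASES.get(key, key.replace(" ", "_"))


class ToyGraphMemory:
    def __init__(self):
        self.edges = {}  # (anchor, channel) -> list of Relation

    def merge(self, anchor, phrase, value, source):
        """Store a relation under its (anchor, channel) key, no conflict check."""
        rel = Relation(anchor.lower(), canonicalize(phrase), value, source)
        self.edges.setdefault((rel.anchor, rel.channel), []).append(rel)
        return rel

    def retrieve(self, anchor, phrase):
        """Return every relation on the queried anchor and channel."""
        return list(self.edges.get((anchor.lower(), canonicalize(phrase)), []))


memory = ToyGraphMemory()
# Legitimate evidence from an earlier session.
memory.merge("Acme Store", "return window", "30 days", source="session-1")
# Ordinary-looking interaction that carries a conflicting value.
memory.merge("acme store", "refund period", "no returns", source="session-2")

evidence = memory.retrieve("Acme Store", "returns accepted within")
values = [rel.value for rel in evidence]
```

In this toy store, `values` holds both the legitimate and the conflicting value, because nothing in `merge` distinguishes them: they share a key. Whether a real graph memory behaves this way depends on its extraction and merge logic, which is what the paper evaluates.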

On Mem0 across PubMedQA, WebShop, and ToolEmu, the authors report a 93.8% average attack success rate, a 50.3-point absolute gain over the best baseline, with negligible impact on unrelated benign tasks. Their defense analysis finds that representative input-side defenses are not enough to stop it.

Why it matters

Graph memory is adopted precisely for high-value, long-horizon reasoning, which is also where a quietly corrupted “fact” does the most damage. Two properties make this attack awkward to defend. First, the access model is weak: an attacker who can merely interact normally can plant the poison, rather than needing write access to a corpus or index. Second, because the poison rides on the same anchor and channel as genuine evidence, auditing memory entries one at a time tends to miss it — the malicious relation only looks wrong next to the legitimate one it contradicts.
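The second property can be illustrated with a small sketch that contrasts a per-entry audit with a neighborhood-level one. Everything here is assumed for illustration: the `stored` relations, the `SUSPICIOUS` marker list, and both audit functions are hypothetical, and the conflict check assumes the channel is single-valued.

```python
from collections import defaultdict

# Hypothetical stored relations as (anchor, channel, value) tuples.
stored = [
    ("acme_store", "return_window", "30 days"),
    ("acme_store", "return_window", "no returns"),
    ("acme_store", "support_email", "help@acme.example"),
]

# Hypothetical markers a naive per-entry filter might look for.
SUSPICIOUS = ("ignore previous", "system prompt", "<script")


def entry_looks_clean(rel):
    """Per-entry audit: inspects one relation with no neighbours in view."""
    return not any(marker in part.lower() for part in rel for marker in SUSPICIOUS)


def conflicting_groups(relations):
    """Neighbourhood audit: group by (anchor, channel), keep disagreeing values."""
    groups = defaultdict(set)
    for anchor, channel, value in relations:
        groups[(anchor, channel)].add(value)
    return {key: vals for key, vals in groups.items() if len(vals) > 1}


per_entry_flags = [rel for rel in stored if not entry_looks_clean(rel)]
conflicts = conflicting_groups(stored)
```

Each relation passes the per-entry check on its own, so `per_entry_flags` is empty, while `conflicts` surfaces the `return_window` pair. A channel that legitimately holds several values at once would need a different rule than "more than one distinct value".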

The honest caveats: these are single-paper numbers, measured on one framework (Mem0) and three research benchmarks, and the success rate will depend on configuration. The authors also report notifying the vendors under responsible disclosure. Treat the specific figures as a research result, not a universal constant for your deployment.

Defenses

  1. Stop auditing memory entries in isolation. A-MemGuard (Wei et al., arXiv 2510.02373) makes this concrete: consensus-based validation compares reasoning paths derived from multiple related memories, and a dual-memory structure distills detected failures into “lessons” consulted before future actions. The A-MemGuard authors report cutting attack success by over 95% with minimal utility cost.
  2. Treat the merge step as a trust boundary. When a new relation conflicts with existing high-confidence evidence on the same anchor and channel, flag it for review instead of silently merging or overwriting.
  3. Keep provenance on every relation. Record which interaction or source produced each edge, weight by source trust, and prefer corroborated relations at retrieval time.
  4. Raise the bar for writing facts. Don’t let a single ordinary interaction establish a durable graph fact; require corroboration before a relation becomes high-confidence long-term memory.
  5. Re-test on your own stack. Input-side filtering is shown here to be insufficient against relation-channel conflicts — measure against this attack class specifically before relying on a single layer.
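Defenses 2 through 4 can be sketched together as a guarded write path. This is a minimal illustration under stated assumptions, not a vetted defense and not any framework's API: `GuardedMemory`, `Fact`, `min_sources`, and the status strings are hypothetical. It counts distinct sources as corroboration, which is a simplification; it does not model source trust weighting, so an attacker who controls several sessions could still corroborate their own claim.

```python
from dataclasses import dataclass, field


@dataclass
class Fact:
    value: str
    sources: set = field(default_factory=set)  # provenance: who asserted this


class GuardedMemory:
    def __init__(self, min_sources=2):
        self.min_sources = min_sources  # corroboration threshold (assumed)
        self.facts = {}                 # (anchor, channel) -> list of Fact
        self.review_queue = []          # conflicting writes held for review

    def trusted(self, anchor, channel):
        """Facts on this anchor and channel that enough sources corroborate."""
        return [
            fact
            for fact in self.facts.get((anchor, channel), [])
            if len(fact.sources) >= self.min_sources
        ]

    def write(self, anchor, channel, value, source):
        """Treat merge as a trust boundary: flag conflicts, never overwrite."""
        if any(fact.value != value for fact in self.trusted(anchor, channel)):
            self.review_queue.append((anchor, channel, value, source))
            return "flagged"
        facts = self.facts.setdefault((anchor, channel), [])
        fact = next((f for f in facts if f.value == value), None)
        if fact is None:
            fact = Fact(value)
            facts.append(fact)
        fact.sources.add(source)
        return "trusted" if len(fact.sources) >= self.min_sources else "pending"


mem = GuardedMemory()
r1 = mem.write("acme_store", "return_window", "30 days", "session-1")
r2 = mem.write("acme_store", "return_window", "30 days", "session-2")
r3 = mem.write("acme_store", "return_window", "no returns", "session-3")
answer = [fact.value for fact in mem.trusted("acme_store", "return_window")]
```

In this sketch a single interaction only creates a pending fact, a second independent source promotes it, and the later conflicting write is queued for review instead of being merged. Retrieval reads from `trusted`, so pending or flagged values never reach the agent as evidence.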

Status

Item         Reference         Date                     Notes
-----------  ----------------  -----------------------  --------------------------------------------------------
ShadowMerge  arXiv 2605.09033  2026-05-09 (rev. 05-15)  Graph-memory poisoning via relation-channel conflict; evaluated on Mem0; responsibly disclosed, open-sourced
A-MemGuard   arXiv 2510.02373  2025-10                  Proactive memory defense: consensus validation + dual-memory lessons
AgentPoison  project page      NeurIPS 2024             Prior art: backdoor poisoning of flat agent memory / RAG knowledge bases

The shift here is conceptual: graph memory was supposed to make recall more structured and therefore safer. ShadowMerge shows the structure can be turned against itself — a fact that lies most convincingly when it sits right next to the truth it contradicts.

Sources