MemMark: attributing a poisoned agent memory from the snapshot alone
A May 26, 2026 arXiv paper embeds ownership into an agent's latent memory-write decisions, so provenance survives even when logs are erased and only the final memory snapshot remains.
What is this?
On May 26, 2026, a multi-institution team led by Haobo Zhang (Zhejiang University of Technology) published MemMark: State-Evolution Attribution Watermarking for Agent Long-Term Memory Systems (arXiv:2605.25002, cs.CR). It tackles a forensic question that becomes urgent once agents keep persistent memory: after a memory store has been tampered with, can you still prove who actually wrote a given entry, using only the final snapshot and no trustworthy logs?
The motivation is that long-term memory has become part of the agent’s security boundary. Systems like A-Mem, Graphiti, Mem0 and MemOS maintain state through extraction, updates, consolidation, linking and deletion. The usual answer to “who wrote this?” is provenance metadata — source anchors, versions, lifecycle traces. MemMark’s point is that those fields have a circular failure mode: the same untrusted snapshot holds both the contested memory and the mutable fields that are supposed to certify it. An attacker who controls the store can rewrite ownership, erase identifiers, fabricate provenance chains, or edit backend-native histories such as A-Mem evolution logs and Graphiti fact-invalidation traces.
How it works
Instead of trusting self-reported fields, MemMark embeds attribution into the latent decisions the agent makes when it writes memory. Each is a choice among several acceptable options that is normally invisible, so steering it need not cost utility:
- update-target — which existing item to update
- link-target — which related item to link
- semantic-realization — which of several equivalent phrasings to store
At each internal LLM call, MemMark enumerates the admissible candidates and uses a secret-keyed, distribution-preserving sampler to pick one. Because the sampler preserves the backend’s own preference distribution, output quality is essentially unchanged, but the pattern of choices now carries an owner-controlled signal. Each decision is bound to a cryptographic commitment, recorded in a per-session Merkle tree with a signed anchor, and the reveal data is stored alongside the memory record.
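The keyed sampling step can be illustrated with a generic construction: derive a pseudorandom number from a secret key and use it for inverse-CDF sampling over the backend's own candidate probabilities. This is a minimal sketch, not MemMark's published sampler. It assumes HMAC-SHA256 as the PRF, mixes one payload bit into each decision's PRF input, and assumes the verifier knows each decision's index and candidate distribution. All function names are hypothetical.

```python
import hashlib
import hmac
from collections import Counter


def keyed_uniform(key: bytes, session: str, index: int, bit: int) -> float:
    """Pseudorandom u in [0, 1) from HMAC-SHA256 keyed with the owner's secret."""
    msg = f"{session}|{index}|{bit}".encode()
    digest = hmac.new(key, msg, hashlib.sha256).digest()
    # Keep the top 53 bits so the float is exact and strictly below 1.
    return (int.from_bytes(digest[:8], "big") >> 11) / 2**53


def pick(candidates: list[str], probs: list[float], u: float) -> str:
    """Inverse-CDF sampling: if u is uniform, the pick follows `probs` exactly."""
    cumulative = 0.0
    for cand, p in zip(candidates, probs):
        cumulative += p
        if u < cumulative:
            return cand
    return candidates[-1]  # guard against floating-point round-off in the sum


def watermarked_choice(key: bytes, session: str, index: int, payload: list[int],
                       candidates: list[str], probs: list[float]) -> str:
    """Pick a candidate using a PRF input that mixes in one payload bit (hypothetical scheme)."""
    bit = payload[index % len(payload)]
    return pick(candidates, probs, keyed_uniform(key, session, index, bit))


def decode_bits(key: bytes, session: str, decisions, n_bits: int) -> list:
    """Recover payload bits by majority vote over decisions that separate bit 0 from bit 1."""
    votes = [Counter() for _ in range(n_bits)]
    for index, (candidates, probs, observed) in enumerate(decisions):
        matches = [b for b in (0, 1)
                   if pick(candidates, probs, keyed_uniform(key, session, index, b)) == observed]
        # Skip decisions where both bit values (or neither) reproduce the observed choice.
        if len(matches) == 1:
            votes[index % n_bits][matches[0]] += 1
    return [v.most_common(1)[0][0] if v else None for v in votes]
```

Without the key, each `u` looks uniform, so every choice keeps the backend's distribution. With the key, a decision only votes when the two bit values would have picked different candidates. Low-entropy carriers often produce the same pick either way, which is why many decisions are needed.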
Crucially, verification degrades gracefully across three regimes: R1 (full external log), R2 (partial log), and R3 (snapshot-only). In the snapshot-only R3 case, the realistic post-compromise scenario, MemMark recovers the full 40-bit payload, versus no recovery for a signed-metadata-only baseline and ~15% for a wrong key. Across six model–backend settings on the LoCoMo benchmark it kept 99.6% of the unwatermarked Overall F1 (BLEU-1 changed by +0.2%), and under nine memory-lifecycle attacks at three strengths its verifier could still tell tampering, evidence deletion, and partial-payload recovery apart.
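The integrity side of snapshot-only (R3) checking can be illustrated with a sketch built on per-decision commitments and a per-session Merkle tree. It makes several assumptions. Commitments are SHA-256 hashes with a random nonce. Each session gets one binary Merkle tree. An HMAC tag stands in for the signed anchor, although a real deployment would more plausibly use an asymmetric signature so verifiers need not hold a secret. The record layout and every function name are hypothetical, not MemMark's format. The sketch only shows which reveals still match the anchored root, which fail, and roughly how many anchored leaves have no record left. Attributing the choices to an owner is the separate keyed decoding step.

```python
import hashlib
import hmac
import os


def h(tag: bytes, *parts: bytes) -> bytes:
    """Domain-separated SHA-256 over a tag and concatenated parts."""
    return hashlib.sha256(tag + b"|" + b"".join(parts)).digest()


def build_tree(leaves: list[bytes]) -> list[list[bytes]]:
    """All Merkle levels, leaf level first; an odd last node is paired with itself."""
    levels = [leaves]
    while len(levels[-1]) > 1:
        cur = levels[-1]
        padded = cur + [cur[-1]] if len(cur) % 2 else cur
        levels.append([h(b"node", padded[i], padded[i + 1]) for i in range(0, len(padded), 2)])
    return levels


def proof_for(levels, index: int) -> list[tuple[bytes, int]]:
    """Sibling hashes from leaf to root; the int is 1 when the current node is a right child."""
    path = []
    for level in levels[:-1]:
        sibling = index ^ 1
        path.append((level[sibling] if sibling < len(level) else level[index], index % 2))
        index //= 2
    return path


def root_from(leaf: bytes, path) -> bytes:
    """Recompute the root implied by a leaf and its proof path."""
    node = leaf
    for sibling, is_right in path:
        node = h(b"node", sibling, node) if is_right else h(b"node", node, sibling)
    return node


def anchor_msg(session: str, n: int, root: bytes) -> bytes:
    return session.encode() + b"|" + n.to_bytes(4, "big") + root


def write_session(anchor_key: bytes, session: str, decisions: list[bytes]):
    """Commit to each decision, build the session tree, and tag its root."""
    nonces = [os.urandom(16) for _ in decisions]
    leaves = [h(b"commit", nonce, dec) for nonce, dec in zip(nonces, decisions)]
    levels = build_tree(leaves)
    root = levels[-1][0]
    # Stand-in for a real signature: an HMAC over (session, leaf count, root).
    tag = hmac.new(anchor_key, anchor_msg(session, len(leaves), root), hashlib.sha256).digest()
    records = [{"decision": dec, "nonce": nonce, "proof": proof_for(levels, i)}
               for i, (dec, nonce) in enumerate(zip(decisions, nonces))]
    return records, {"session": session, "n": len(leaves), "root": root, "tag": tag}


def verify_snapshot(anchor_key: bytes, records, anchor) -> dict:
    """Snapshot-only check of the anchor, every record's reveal, and unaccounted leaves."""
    expected = hmac.new(anchor_key, anchor_msg(anchor["session"], anchor["n"], anchor["root"]),
                        hashlib.sha256).digest()
    if not hmac.compare_digest(expected, anchor["tag"]):
        return {"anchor_ok": False}
    verified, tampered = set(), 0
    for rec in records:
        leaf = h(b"commit", rec["nonce"], rec["decision"])
        pos = sum(bit << i for i, (_, bit) in enumerate(rec["proof"]))  # leaf index implied by path
        if root_from(leaf, rec["proof"]) == anchor["root"] and pos < anchor["n"]:
            verified.add(pos)
        else:
            tampered += 1
    # Rough deletion count: anchored leaves covered by neither a valid nor a failed record.
    missing = max(0, anchor["n"] - len(verified) - tampered)
    return {"anchor_ok": True, "verified": len(verified), "tampered": tampered, "missing": missing}
```

In a five-decision session, an intact snapshot reports all five leaves verified. If one stored decision is edited and another record is deleted, the result instead shows one tampered and one missing. That is a miniature version of the tampering-versus-deletion distinction described above.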
Why it matters
Most agent-memory security work so far has been about preventing poisoning (AgentPoison and related work; see our notes on memory poisoning and dormant memory exfiltration). MemMark addresses the step after a breach: attribution and accountability. That matters for incident response, IP disputes, multi-tenant deployments, and regulatory provenance, where “the logs say X” is worthless if the attacker also controlled the logs.
It also moves provenance from editable claims to a reproducible behavioral trace. Prior watermarks live in generated text, protected corpora, visible tool use, or action trajectories — all evidence channels that may simply be missing in memory forensics. MemMark targets the one durable artifact that usually survives: the memory snapshot itself. This fits the broader “mnemonic sovereignty” framing of memory as a first-class asset to govern across its whole lifecycle.
Defenses
MemMark is a building block, not a turnkey product. For teams running memory-backed agents:
- Keep trusted write-time logging as the primary control. MemMark is explicitly a fallback for when logs are lost, withheld, or suspect — not a replacement. Pair it with tamper-evident audit trails and execution provenance.
- Don’t rely on self-reported provenance fields alone. Treat ownership/version metadata inside a snapshot as attacker-controllable; design verification that does not depend on the same store certifying itself.
- Protect the key. Snapshot-only attribution rests on a secret key and signed anchors; key compromise collapses the guarantee, so manage it like any signing key (HSM, rotation, separation from the agent runtime).
- Scope expectations. The demonstrated payload is 40 bits, and each memory-write decision carries little entropy (~1.1–1.3 bits), so attribution needs enough decisions to accumulate signal; very short-lived sessions carry less.
- Validate on your backend. Results cover A-Mem and Graphiti on LoCoMo; carrier availability depends on how your memory system makes update/link/realization choices.
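The capacity point under "Scope expectations" can be made concrete with a back-of-the-envelope calculation. A distribution-preserving choice cannot carry more payload information than its own entropy. Dividing the 40-bit payload by ~1.1 to 1.3 bits per decision therefore gives a floor on the number of write decisions. This is an illustrative information-theoretic bound, not a figure from the paper. Practical decoding also needs redundancy, so real requirements are higher. The function names are hypothetical.

```python
import math


def decision_entropy(probs: list[float]) -> float:
    """Shannon entropy, in bits, of one write decision's candidate distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def min_decisions(payload_bits: int, bits_per_decision: float) -> int:
    """Floor on decisions: each choice can carry at most its entropy in payload bits."""
    return math.ceil(payload_bits / bits_per_decision)


# A three-way choice weighted 0.5/0.3/0.2 carries about 1.49 bits.
print(round(decision_entropy([0.5, 0.3, 0.2]), 2))

for bits in (1.1, 1.3):
    print(f"{bits} bits/decision -> at least {min_decisions(40, bits)} decisions")
```

Under these assumptions, a 40-bit payload needs at least 31 to 37 write decisions even in the ideal case. A session with fewer update, link or realization choices cannot carry the full payload.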
Status
| Item | Reference | Date | Notes |
|---|---|---|---|
| MemMark preprint | arXiv:2605.25002 | 2026-05-26 | State-evolution attribution watermark for agent memory |
| Snapshot-only result | §5.4 (R3) | 2026-05-26 | Full 40-bit recovery vs none for metadata-only baseline |
| Utility | §5.2 | 2026-05-26 | 99.6% of unwatermarked Overall F1; BLEU-1 +0.2% |
| Robustness | §5.5 | 2026-05-26 | Diagnostic under nine memory-lifecycle attacks |
| Threat context | mnemonic-sovereignty survey (arXiv:2604.16548); AgentPoison (arXiv:2407.12784) | 2024–2026 | Agent memory poisoning and lifecycle attacks |
The takeaway: as agents move from single-session responders to persistent actors, memory provenance becomes a security problem in its own right — and MemMark shows that attribution can be made to survive an untrusted snapshot, provided you keep the key safe and treat it as a complement to trusted logging rather than a substitute.