Mnemonic sovereignty: securing the whole memory lifecycle of agents
An April 2026 survey reframes LLM-agent memory security as a six-phase lifecycle and shows that the field has largely neglected forgetting, confidentiality and non-adversarial drift.
What is this?
On April 17, 2026, Zehao Lin, Chunyu Li and Kai Chen (MemTensor, Shanghai) posted “A Survey on the Security of Long-Term Memory in LLM Agents: Toward Mnemonic Sovereignty” to arXiv (cs.CR). It introduces no new attack. It is a systematization that argues something the field has been slow to accept: the security of an agent’s long-term memory is an independent class of problem, not a sub-branch of prompt injection or RAG security.
The shift the paper names is concrete. Earlier LLM security asked “will the model leak its training data?” The consequential question for agentic systems is different: can an agent’s persistent, writable memory be continuously shaped, poisoned across sessions, read without authorization, and propagated across shared organizational state? Drawing on cognitive neuroscience and the philosophy of memory, the authors characterize agent memory as malleable, rewritable and socially propagating, and build a framework to reason about it end to end.
How it works
The survey’s core contribution is a memory-lifecycle framework: six phases, cross-tabulated against four security objectives. This summary reproduces no payloads; the canonical reference is the paper’s arXiv HTML version.
SIX LIFECYCLE PHASES
1. Write — untrusted content enters durable memory
2. Store & Manage — retention, compression, versioning
3. Retrieve — selection of memory into the live context
4. Execute — retrieved memory shapes planning and tool use
5. Share & Propagate — memory crosses agents, users, sessions
6. Forget / Rollback — deletion, revocation, recovery
FOUR SECURITY OBJECTIVES (cross-cut every phase)
integrity · confidentiality · availability · governance
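The six-by-four grid above can be written down as a small coverage matrix that a team fills in with its own controls. The following is a minimal sketch, not the paper's artifact: the enum names, the `coverage` dictionary and the single example entry are all assumptions made for illustration.

```python
from enum import Enum
from itertools import product


class Phase(Enum):
    WRITE = "write"
    STORE = "store_manage"
    RETRIEVE = "retrieve"
    EXECUTE = "execute"
    SHARE = "share_propagate"
    FORGET = "forget_rollback"


class Objective(Enum):
    INTEGRITY = "integrity"
    CONFIDENTIALITY = "confidentiality"
    AVAILABILITY = "availability"
    GOVERNANCE = "governance"


# One cell per (phase, objective) pair. None means "no control recorded yet".
coverage = {cell: None for cell in product(Phase, Objective)}

# Hypothetical example entry: a team whose only control is an input-side filter.
coverage[(Phase.WRITE, Objective.INTEGRITY)] = "input-side filter"


def uncovered(matrix):
    """Return the cells that have no control recorded."""
    return [cell for cell, control in matrix.items() if control is None]


print(len(coverage))             # 24 cells in total
print(len(uncovered(coverage)))  # 23 cells still blank
```

Filling in such a grid makes the blank regions explicit before any deeper analysis starts.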
The framework rests on three properties that make long-term memory genuinely new:
- Persistence: a single malicious write can be recalled across hundreds of later tasks, long after the conversation that planted it ended. A one-shot prompt injection, by contrast, loses its effect when the context window closes.
- Statefulness: the question is no longer “is this input harmful?” but “what memory state is the system in?” An agent can drift behaviorally from a cluster of subtly biased episodic memories before any single entry trips a safety classifier.
- Propagation: in multi-agent and shared-state systems, contamination spreads through internal channels (inter-agent messages, shared stores, tool arguments) across session, role and user boundaries.
A fourth property is quieter but probably more common in practice: an adversary is not always required. Silent cross-user contamination of shared stores, profile facts over-applied to contexts where they no longer hold, and memory-induced sycophancy all arise from ordinary operation. The paper therefore treats memory security as a superset of memory safety — the adversarial and benign-persistence axes share a lifecycle and share mitigations.
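The statefulness point, that a memory state can be problematic while every individual entry passes inspection, can be illustrated with a toy calculation. This is only a sketch under invented assumptions: the bias scores, both thresholds and the use of a simple mean are hypothetical and do not come from the survey.

```python
from statistics import mean

ENTRY_THRESHOLD = 0.5  # hypothetical cutoff for a per-entry safety classifier
STATE_THRESHOLD = 0.3  # hypothetical cutoff on the aggregate memory state

# Hypothetical bias scores in [0, 1] for a cluster of episodic memories.
scores = [0.35, 0.40, 0.38, 0.42, 0.36]

# Per-entry check: does any single memory trip the classifier?
flagged_entries = [s for s in scores if s >= ENTRY_THRESHOLD]

# State-level check: is the memory as a whole skewed?
state_score = mean(scores)
state_flagged = state_score > STATE_THRESHOLD

print(flagged_entries)  # [] : no single entry is flagged
print(state_flagged)    # True : the aggregate state is flagged
```

A per-input filter sees nothing here, which is why the question has to be asked of the memory state and not only of each write.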
Why it matters
Three findings stand out, and each is uncomfortable for teams shipping memory features today.
First, the research literature concentrates on write-time and retrieve-time integrity (the poisoning attacks that grab headlines), while confidentiality, availability, the store and forget phases, and benign-persistence failures remain sparsely studied. The map has large blank regions.
Second, no published memory architecture covers all nine governance primitives the authors identify; write-gate validation and post-deletion verification are shared blind spots across every system examined. In plain terms: most agents cannot prove that what entered their memory was authorized, and cannot prove that a deleted memory is actually gone.
Third, using LLMs themselves as memory-security tools (automated red-teaming, defender-side verification, counterfactual stress-testing) is essential yet barely explored; a defense that has never been stress-tested by an adaptive attacker cannot claim the rigor mature security fields demand.
The unifying idea is mnemonic sovereignty: a system’s verifiable, recoverable governance over what may be written, who may read, when updates are authorized, and which states may be forgotten. The authors argue that future secure agents will be differentiated not by recall capacity but by the quality of their memory governance.
Defenses
The survey is structured so that each lifecycle phase implies a control. Treat memory as a governed boundary, not a trusted cache.
- Write-time: validate before consolidation. Gate the moment content becomes durable. Don’t let a tool- or document-sourced note be committed with the same authority as a verified instruction. This is the blind spot the paper flags most sharply.
- Store-time: version and record provenance. Keep snapshots and a chain of custody for each entry, and audit compression/summarization steps — they silently rewrite what the agent “remembers.”
- Retrieve-time: move from filtering to consensus. Pair trust-aware retrieval, activation-based detection and consensus validation so a single poisoned entry can’t dominate the retrieved context. See our note on hybrid-retrieval defenses against RAG poisoning.
- Execute-time: enforce information-flow control. Constrain what retrieved memory is allowed to do — which tools and authorizations it can reach — so a corrupted note cannot escalate.
- Share-time: principal-scoped policy. In multi-agent systems, scope memory to principals and govern internal channels, where privacy leakage concentrates.
- Forget-time: verify deletion, plan for post-breach. Rollback presupposes versioning; deletion must be verifiable across substrates. Keep audit trails you can actually trust for forensics after an incident.
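Several of the controls listed above (a write gate, principal-scoped reads, an audit trail and a deletion check) can be sketched together in a few dozen lines. This is an illustrative toy and not an implementation from the survey: `GovernedMemory`, `Entry`, the `TRUSTED_SOURCES` set and the source labels are all hypothetical names. It also assumes a single in-memory store, whereas real deletion verification would have to cover every substrate that may hold a copy (vector indexes, caches, summaries).

```python
import hashlib
from dataclasses import dataclass, field

# Hypothetical trust labels; a real system would derive these from authenticated provenance.
TRUSTED_SOURCES = {"verified_instruction"}


@dataclass(frozen=True)
class Entry:
    entry_id: str
    content: str
    source: str     # e.g. "verified_instruction", "tool_output", "document"
    principal: str  # the user or agent this memory is scoped to


@dataclass
class GovernedMemory:
    entries: dict = field(default_factory=dict)
    quarantine: list = field(default_factory=list)
    audit_log: list = field(default_factory=list)

    def _log(self, action, entry_id):
        # Chain each record to the previous digest so later tampering is detectable.
        prev = self.audit_log[-1]["digest"] if self.audit_log else ""
        digest = hashlib.sha256(f"{prev}|{action}|{entry_id}".encode()).hexdigest()
        self.audit_log.append({"action": action, "entry_id": entry_id, "digest": digest})

    def write(self, entry):
        # Write gate: only trusted sources become durable; everything else is held back.
        if entry.source not in TRUSTED_SOURCES:
            self.quarantine.append(entry)
            self._log("quarantine", entry.entry_id)
            return False
        self.entries[entry.entry_id] = entry
        self._log("commit", entry.entry_id)
        return True

    def retrieve(self, principal):
        # Principal-scoped read: a caller only sees memories scoped to it.
        return [e for e in self.entries.values() if e.principal == principal]

    def verify_forgotten(self, entry_id):
        # Deletion check over every place this toy store can hold an entry.
        in_store = entry_id in self.entries
        in_quarantine = any(e.entry_id == entry_id for e in self.quarantine)
        return not (in_store or in_quarantine)

    def forget(self, entry_id):
        self.entries.pop(entry_id, None)
        self.quarantine = [e for e in self.quarantine if e.entry_id != entry_id]
        self._log("forget", entry_id)
        return self.verify_forgotten(entry_id)


mem = GovernedMemory()
committed = mem.write(Entry("m1", "User prefers metric units", "verified_instruction", "alice"))
gated = mem.write(Entry("m2", "Note scraped from a web page", "document", "alice"))
visible_to_bob = mem.retrieve("bob")  # empty: nothing is scoped to bob
gone = mem.forget("m1")               # True only if the deletion check passes
```

The design choice worth noting is that `forget` returns the result of a separate verification step, so deletion is an observable claim and not just a side effect; the hash-chained log gives a later audit something to check against.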
This complements attack-side work the community has already documented (the MPBench poisoning taxonomy, OWASP’s ASI06 memory-poisoning category and temporal memory contamination) by supplying the governance scaffolding around them.
Status
| Item | Reference | Date | Notes |
|---|---|---|---|
| arXiv 2604.16548 v1 | arXiv (cs.CR) | 2026-04-17 | Survey + memory-lifecycle framework |
| Six phases × four objectives | Paper framework | 2026-04-17 | Write/Store/Retrieve/Execute/Share/Forget |
| “No architecture covers all 9 governance primitives” | Paper finding | 2026-04-17 | Write-gate + post-deletion verification = blind spots |
| “An adversary is not always required” | Paper finding | 2026-04-17 | Benign-persistence axis (drift, compression, sycophancy) |
| Mnemonic sovereignty | Paper concept | 2026-04-17 | Verifiable, recoverable memory governance |
The takeaway is not that memory poisoning is new — it isn’t. It is that the field finally has a lifecycle-wide map and a normative goal. If your agent has persistent memory and your governance story stops at an input-side filter, this survey is the documented argument that you are governing one phase out of six.
This article summarizes publicly available research for defensive and educational purposes. It reproduces no exploit code.