Dynamic separators: hardening Polymorphic Prompt Assembling against injection
A May 28, 2026 arXiv paper fixes a blast-radius flaw in Polymorphic Prompt Assembling by generating a unique SHA-256 separator per request, cutting one payload's attack success rate from 0.88 to 0.38.
What is this?
On May 28, 2026, Nima Dorzhiev and Peng Liu published Strengthening Polymorphic Prompt Assembling: Dynamic Separator Generation Against Emerging Prompt Injection Attacks (arXiv:2605.30534). It is a short, focused defensive paper — five pages — that patches a weakness in an earlier prompt-injection defense rather than announcing a new attack.
The earlier defense is Polymorphic Prompt Assembling (PPA), introduced in arXiv:2506.05739 in June 2025. PPA’s idea is simple: prompt injection generally requires the attacker to guess and break the structure that separates trusted system instructions from untrusted user input. If you randomize that structure, the attacker can no longer reliably predict where the boundary is. PPA does this by drawing separator pairs from a fixed pool at assembly time, at near-zero runtime cost. The May 2026 paper identifies what that fixed pool leaves on the table, and closes it.
How it works
The flaw the new paper targets is blast radius. In the original PPA, separators are sampled from a static pool. Pools are finite, and separators can leak — an injected payload that successfully echoes or reflects the current delimiter discloses it. Once a separator is known, it can be reused against every future request that happens to draw the same pair. One leak compromises many requests.
The fix is to stop reusing separators at all. Instead of sampling from a pool, the defense derives a fresh delimiter for every request from a domain-separated SHA-256 digest keyed on three inputs:
separator = H( domain_tag || timestamp || session_id || nonce )
│ │ │ │
prevents cross-context per-request entropy unpredictable
digest collisions binding per call
Each assembled prompt gets a unique (BEGIN, END) canary pair. Because the pair is never reused, a leaked separator only exposes the single request it belonged to — the blast radius collapses from “many future requests” to “one”. The construction is deterministic from its inputs (so the harness can still recognize its own delimiters when parsing the model’s output) but unpredictable to an attacker who lacks the nonce.
The reported numbers, evaluated on Llama-3.3-70B-Instruct-Turbo with cross-model validation on DeepSeek-V4-Flash:
- Against the “M1” obfuscation payload (leetspeak plus urgency framing), the dynamic mode lowered the Attack Success Rate from 0.88 to 0.38 — a 2.3× reduction the authors report as statistically significant via non-overlapping 95% Wilson confidence intervals.
- Against a
format_breakout_saladpayload, static separator leakage (leak rate 0.467) dropped to 0.000 in dynamic mode, confirming the blast-radius argument empirically. - Overhead is 2.7 microseconds of prompt-assembly time per request, with no model fine-tuning, and the change is backward-compatible with the existing PPA SDK.
Note the honest ceiling: an ASR of 0.38 is a large improvement, not immunity. Structural randomization raises the cost of injection; it does not eliminate it.
Why it matters
Prompt injection remains the top entry on the OWASP Top 10 for LLM Applications (LLM01), and most production guardrails fall into two camps: heavy model-based classifiers that add latency and still miss novel phrasings, or brittle string filters. PPA sits in a third category — a cheap structural defense that doesn’t depend on recognizing the content of an attack, only on denying the attacker knowledge of the prompt’s layout.
The dynamic-separator result matters because it shows this category has a clean engineering knob. Moving from a static pool to per-request derivation is a small code change with a measurable, repeatable security gain and microsecond-scale cost. For teams already assembling prompts from templates, this is the kind of mitigation that ships in an afternoon rather than a quarter — and it composes with, rather than replaces, classifiers and capability sandboxing.
Defenses
If you build LLM applications and want to apply the lesson:
- Treat the prompt boundary as a secret, per request. Whatever scheme you use to separate system instructions from untrusted input, do not reuse a fixed set of delimiters. Derive them per request from unpredictable material (a nonce plus session and timestamp), so a single leak cannot be replayed.
- Use domain separation in the derivation. A domain tag in the hash input prevents a separator generated for one context from being valid in another — the same hygiene used in cryptographic key derivation.
- Keep delimiters out of model-visible reflection paths. Strip or neutralize any echo of the current separators in tool outputs and model responses before they re-enter a prompt, to shrink the leak surface in the first place.
- Layer, don’t substitute. Structural randomization is a force multiplier, not a guarantee — a 0.38 residual ASR is still real. Combine it with input/output filtering, least-privilege tool scopes, and the lethal-trifecta discipline of not granting untrusted-input handlers both private-data access and exfiltration paths.
- Measure with confidence intervals. The paper’s use of Wilson intervals over raw ASR percentages is a good habit. When you A/B a guardrail, report uncertainty so a noisy 5-point move isn’t mistaken for a real one.
Status
| Item | Reference | Date | Notes |
|---|---|---|---|
| Polymorphic Prompt Assembling (PPA) | arXiv:2506.05739 | 2025-06 | Original defense; static separator pool |
| Dynamic Separator Generation | arXiv:2605.30534 | 2026-05-28 | Per-request SHA-256 separators; fixes blast radius |
| Evaluation models | per paper | 2026-05 | Llama-3.3-70B-Instruct-Turbo; DeepSeek-V4-Flash (cross-model) |
| Reported effect | per paper | 2026-05 | M1 payload ASR 0.88 → 0.38; leak rate 0.467 → 0.000 |
| Overhead | per paper | 2026-05 | ~2.7 µs/request; no fine-tuning; backward-compatible SDK |
The takeaway is not “prompt injection is solved.” It is that a known structural defense had a reusable-secret bug, someone closed it with a per-request derivation, and the published numbers say the fix is cheap and measurable. That is exactly the kind of incremental hardening defenders should be reading the literature for.