DEFENSE LOW NEW

Dynamic separators: hardening Polymorphic Prompt Assembling against injection

A May 28, 2026 arXiv paper fixes a blast-radius flaw in Polymorphic Prompt Assembling by generating a unique SHA-256 separator per request, cutting one payload's attack success rate from 0.88 to 0.38.

2026-06-02 // 6 min affects: llama-3.3-70b-instruct-turbo, deepseek-v4-flash, llm-agents

What is this?

On May 28, 2026, Nima Dorzhiev and Peng Liu published Strengthening Polymorphic Prompt Assembling: Dynamic Separator Generation Against Emerging Prompt Injection Attacks (arXiv:2605.30534). It is a short, focused defensive paper — five pages — that patches a weakness in an earlier prompt-injection defense rather than announcing a new attack.

The earlier defense is Polymorphic Prompt Assembling (PPA), introduced in arXiv:2506.05739 in June 2025. PPA’s idea is simple: prompt injection generally requires the attacker to guess and break the structure that separates trusted system instructions from untrusted user input. If you randomize that structure, the attacker can no longer reliably predict where the boundary is. PPA does this by drawing separator pairs from a fixed pool at assembly time, at near-zero runtime cost. The May 2026 paper identifies what that fixed pool leaves on the table, and closes it.

How it works

The flaw the new paper targets is blast radius. In the original PPA, separators are sampled from a static pool. Pools are finite, and separators can leak — an injected payload that successfully echoes or reflects the current delimiter discloses it. Once a separator is known, it can be reused against every future request that happens to draw the same pair. One leak compromises many requests.

The fix is to stop reusing separators at all. Instead of sampling from a pool, the defense derives a fresh delimiter for every request from a domain-separated SHA-256 digest keyed on three inputs:

separator = H( domain_tag || timestamp || session_id || nonce )
                    │            │            │           │
        prevents cross-context   per-request entropy   unpredictable
        digest collisions        binding              per call

Each assembled prompt gets a unique (BEGIN, END) canary pair. Because the pair is never reused, a leaked separator only exposes the single request it belonged to — the blast radius collapses from “many future requests” to “one”. The construction is deterministic from its inputs (so the harness can still recognize its own delimiters when parsing the model’s output) but unpredictable to an attacker who lacks the nonce.

The reported numbers, evaluated on Llama-3.3-70B-Instruct-Turbo with cross-model validation on DeepSeek-V4-Flash:

Against the “M1” obfuscation payload (leetspeak plus urgency framing), the dynamic mode lowered the Attack Success Rate from 0.88 to 0.38 — a 2.3× reduction the authors report as statistically significant via non-overlapping 95% Wilson confidence intervals.
Against a format_breakout_salad payload, static separator leakage (leak rate 0.467) dropped to 0.000 in dynamic mode, confirming the blast-radius argument empirically.
Overhead is 2.7 microseconds of prompt-assembly time per request, with no model fine-tuning, and the change is backward-compatible with the existing PPA SDK.

Note the honest ceiling: an ASR of 0.38 is a large improvement, not immunity. Structural randomization raises the cost of injection; it does not eliminate it.

Why it matters

Prompt injection remains the top entry on the OWASP Top 10 for LLM Applications (LLM01), and most production guardrails fall into two camps: heavy model-based classifiers that add latency and still miss novel phrasings, or brittle string filters. PPA sits in a third category — a cheap structural defense that doesn’t depend on recognizing the content of an attack, only on denying the attacker knowledge of the prompt’s layout.

The dynamic-separator result matters because it shows this category has a clean engineering knob. Moving from a static pool to per-request derivation is a small code change with a measurable, repeatable security gain and microsecond-scale cost. For teams already assembling prompts from templates, this is the kind of mitigation that ships in an afternoon rather than a quarter — and it composes with, rather than replaces, classifiers and capability sandboxing.

Defenses

If you build LLM applications and want to apply the lesson:

Treat the prompt boundary as a secret, per request. Whatever scheme you use to separate system instructions from untrusted input, do not reuse a fixed set of delimiters. Derive them per request from unpredictable material (a nonce plus session and timestamp), so a single leak cannot be replayed.
Use domain separation in the derivation. A domain tag in the hash input prevents a separator generated for one context from being valid in another — the same hygiene used in cryptographic key derivation.
Keep delimiters out of model-visible reflection paths. Strip or neutralize any echo of the current separators in tool outputs and model responses before they re-enter a prompt, to shrink the leak surface in the first place.
Layer, don’t substitute. Structural randomization is a force multiplier, not a guarantee — a 0.38 residual ASR is still real. Combine it with input/output filtering, least-privilege tool scopes, and the lethal-trifecta discipline of not granting untrusted-input handlers both private-data access and exfiltration paths.
Measure with confidence intervals. The paper’s use of Wilson intervals over raw ASR percentages is a good habit. When you A/B a guardrail, report uncertainty so a noisy 5-point move isn’t mistaken for a real one.

Status

Item	Reference	Date	Notes
Polymorphic Prompt Assembling (PPA)	arXiv:2506.05739	2025-06	Original defense; static separator pool
Dynamic Separator Generation	arXiv:2605.30534	2026-05-28	Per-request SHA-256 separators; fixes blast radius
Evaluation models	per paper	2026-05	Llama-3.3-70B-Instruct-Turbo; DeepSeek-V4-Flash (cross-model)
Reported effect	per paper	2026-05	M1 payload ASR 0.88 → 0.38; leak rate 0.467 → 0.000
Overhead	per paper	2026-05	~2.7 µs/request; no fine-tuning; backward-compatible SDK

The takeaway is not “prompt injection is solved.” It is that a known structural defense had a reusable-secret bug, someone closed it with a per-request derivation, and the published numbers say the fix is cheap and measurable. That is exactly the kind of incremental hardening defenders should be reading the literature for.