ADVERSARIAL
10 hacks.
PRAC: hijacking a computer-use agent's choice through its attention
An April 2026 Tübingen paper shows one imperceptibly perturbed product image can concentrate a computer-use agent's visual attention and steer 82% of its selections — without ever touching the output.
When the AI reviewer can't read the figure: cross-modal attacks on peer review
A June 2026 arXiv paper (PaperGuard) shows AI peer reviewers are vulnerable not only through text but through figures — black-box prompt injection and white-box image perturbations both flip verdicts.
Rapid Poison: turning a jailbreak defense into an attack surface
A June 15, 2026 arXiv paper shows the proliferation step inside Rapid Response jailbreak defenses can be poisoned at a 1% rate — forcing up to 100% false positives or 96% false negatives in the guard classifier.
Black-Hole Attack: poisoning a vector database through embedding geometry
An April 7, 2026 paper shows a few vectors placed near the embedding centroid get pulled into up to 99.85% of top-10 results — a query-agnostic, model-agnostic poisoning of vector databases.
M3Att: query-agnostic knowledge poisoning of medical multimodal RAG
A May 2026 paper poisons medical image-text RAG without knowing user queries in advance. Imperceptible image perturbations hijack retrieval; ambiguity-guided text evades the model's self-correction — and pre-filter defenses barely dent it.
CRCP: RAG corpus poisoning that survives chunking and reranking
A June 9, 2026 arXiv paper shows many corpus-poisoning attacks quietly fail after reranking — and proposes CRCP, a chunk-aware variant built to survive realistic multi-stage RAG pipelines. The lesson is about how you evaluate, not just how you defend.
HPAA: typography humans read but moderation LLMs miss
A June 8, 2026 paper introduces Human-Perceptible Adversarial Attacks — harmful text that stays obvious to a human reader but slips past LLM content moderation through typographic manipulation.
SlotGCG: adversarial token position, not just content, drives jailbreaks
A June 2026 paper shows GCG-style jailbreaks get ~14% stronger when adversarial tokens are placed at attention-correlated slots inside the prompt, and retain 42% more attack success under input filtering.
SilentRetrieval: fluent RAG corpus poisoning that slips past perplexity filters
A May 27, 2026 arXiv preprint introduces a two-stage attack that hides goal-hijacking triggers inside fluent documents, reaching 57% LLM-attack success on Natural Questions and MS MARCO with one poisoned record per query.
Usability as a Weapon: how feature requests turn coding LLMs insecure
A May 11, 2026 arXiv paper shows that asking a coding LLM for a faster, simpler or feature-richer version of secure code reliably drops the security constraints. UPAttack reaches 98.1% on GPT-5.2-chat and Gemini-3.