MULTIMODAL

3 hacks.

Sirens' Whisper: inaudible near-ultrasonic jailbreaks of voice LLMs

A March 14, 2026 paper from Huazhong, Tsinghua and Microsoft hides jailbreak prompts in the 17–22 kHz band. Microphone nonlinearity demodulates them back into commands. The attack is inaudible to humans and reaches a non-refusal rate of up to 0.94 on commercial voice LLMs.

2026-06-18//7 min

MULTIMODAL MEDIUM

CrossMPI: image-only prompt injection steers what VLMs read and see

A May 15, 2026 Xidian University arXiv paper introduces CrossMPI: imperceptible image perturbations that change how vision-language models interpret both the image and the user's text prompt, with 66% average success across five LVLMs.

2026-05-28//6 min

MULTIMODAL CRITICAL

AudioHijack: imperceptible audio hijacks voice agents (IEEE S&P 2026)

An April 16, 2026 IEEE S&P paper introduces auditory prompt injection: adversarial reverb hidden in audio drives 13 large audio-language models and commercial voice agents (Mistral AI, Microsoft Azure) into unauthorized actions with 79–96% success.

2026-05-26//7 min