
SGLang's ZMQ broker: unauthenticated RCE via pickle deserialization

Three CVEs disclosed March 12, 2026 turn SGLang's pickle.loads() calls into unauthenticated remote code execution. The fix landed in v0.5.10 — but the real lesson is that pickle on a network socket is RCE by design.

2026-06-04 // 6 min
affects: sglang, multimodal-llm-serving, ai-inference-infrastructure

What is this?

On March 12, 2026, Orca Security researcher Igor Stepansky and the CERT Coordination Center co-published three vulnerabilities in SGLang, a widely used open-source framework for serving large language models and multimodal models (it fronts Qwen, DeepSeek, Mistral, Skywork and others behind an OpenAI-compatible API). Two of them — CVE-2026-3059 and CVE-2026-3060, both rated CVSS 9.8 in the GitHub Advisory Database — allow unauthenticated remote code execution against any SGLang deployment that exposes the affected feature to the network. A third, CVE-2026-3989 (CVSS 7.8), is a local/social-engineering variant in a crash-dump replay script. All three share one root cause: untrusted data fed to Python’s pickle. The fix shipped in version 0.5.10.

This is the same vulnerability class as the LightLLM pickle RCE and sits squarely in the 2026 wave of AI-infrastructure CVEs alongside the LiteLLM pre-auth SQLi and LMDeploy SSRF. We cover it because the pattern keeps recurring, and the mitigation is structural, not cosmetic.

How it works

Python’s pickle format does not just store data — it stores instructions for reconstructing objects, and those instructions execute during deserialization. So pickle.loads() on any attacker-controlled bytes is, by definition, arbitrary code execution. The classic primitive is an object whose __reduce__ method returns a call to os.system or subprocess; nothing exotic is required, which is why no payload is reproduced here.
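
The point that a pickle is a small program rather than a data record can be illustrated without a payload. The sketch below uses the standard library's pickletools.genops to list the opcodes in two harmless pickles without loading either one. The CALL_OPCODES set and the call_opcodes helper are names chosen for this illustration; they are not part of SGLang or of the pickle module.

```python
import datetime
import pickle
import pickletools

# Opcodes that import a name, construct an object, or invoke a callable
# while the stream is being loaded (illustrative selection).
CALL_OPCODES = {"GLOBAL", "STACK_GLOBAL", "REDUCE", "INST", "OBJ",
                "NEWOBJ", "NEWOBJ_EX", "BUILD"}


def call_opcodes(data: bytes) -> set[str]:
    """Return the import/call opcodes present in a pickle, without loading it."""
    return {op.name for op, _arg, _pos in pickletools.genops(data)} & CALL_OPCODES


# A dict of strings and ints is rebuilt from pure data opcodes.
plain = pickle.dumps({"prompt": "hello", "max_tokens": 16})
# A date is rebuilt by importing datetime.date and calling it.
dated = pickle.dumps(datetime.date(2026, 3, 12))

print(call_opcodes(plain))
print(call_opcodes(dated))
```

The first set comes back empty, while the second includes REDUCE, the "call this callable with these arguments" opcode. A hostile stream uses that same mechanism, only with a different callable. This is an illustration of the format, not a filter to rely on.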

SGLang exposed pickle.loads() on network-reachable paths with no authentication:

CVE            CVSS  Component                        Vector / precondition
-------------  ----  -------------------------------  ------------------------------------------
CVE-2026-3059  9.8   Multimodal generation module     Unauthenticated RCE via the ZMQ broker;
                                                      requires multimodal generation enabled
CVE-2026-3060  9.8   Encoder parallel disaggregation  Unauthenticated RCE via the disaggregation
                                                      module; requires disaggregation enabled
CVE-2026-3989  7.8   replay_request_dump.py           pickle.load() of an attacker-supplied .pkl
                                                      crash dump; needs local file/dir control

Per CERT/CC, for the two critical bugs “if an attacker knows the TCP port on which the ZMQ broker is listening and can send requests to the server, they can exploit the vulnerability by sending a malicious pickle file to the broker, which will then deserialize it.” No prompt, no model interaction, no credential — a single crafted message on the ZeroMQ socket is enough. The weakness is catalogued as CWE-502, Deserialization of Untrusted Data. CVE-2026-3989 is narrower: the replay_request_dump.py utility calls pickle.load() on a crash-dump file without validation, so an attacker who can write to the dump directory (or socially engineer an operator into replaying a malicious .pkl) gets code execution on the host running the script.
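
For comparison, this is roughly what "validation" could look like if a pickle file has to be read at all. It is a minimal sketch of the "restricting globals" pattern described in the Python pickle documentation, not the change SGLang shipped. DataOnlyUnpickler and load_dump are hypothetical names introduced here.

```python
import io
import pickle


class DataOnlyUnpickler(pickle.Unpickler):
    """Unpickler that refuses every global lookup (hypothetical helper)."""

    def find_class(self, module, name):
        # Called whenever the stream asks to import a class or function.
        # Refusing here means no callable ever reaches the unpickler.
        raise pickle.UnpicklingError(f"global {module}.{name} is forbidden")


def load_dump(data: bytes):
    """Load a pickle that may contain only builtin scalars and containers."""
    return DataOnlyUnpickler(io.BytesIO(data)).load()
```

Because every global lookup is refused, only plain values such as dicts, lists, strings and numbers load. A real crash dump that contains custom classes would be rejected too, so this is a stopgap at best, and the pickle documentation itself suggests considering alternatives when security is a concern.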

Why it matters

An inference server is not a low-value target. It typically runs on GPU hosts with broad internal network reach, holds model weights, and processes the prompts and documents your applications send it. Code execution in the SGLang process can lead to host compromise, lateral movement, data exfiltration, or denial of service. Because the two critical bugs are pre-authentication and triggered by a raw socket message, any deployment that put its ZMQ broker on a reachable interface was exploitable by anyone who could route a packet to it.

Two patterns deserve attention. First, the ZMQ/IPC plumbing of distributed inference is an unauthenticated trust boundary that teams routinely forget exists: it was designed for intra-cluster communication, not hostile networks. Second, pickle keeps reappearing in the AI stack because it is the path of least resistance for moving Python objects (tensors, requests, crash dumps) between processes. As of writing, no in-the-wild exploitation has been reported, but the EPSS score and the trivial exploit primitive make these prime candidates for opportunistic internet scanning.

Defenses

  • Upgrade to SGLang v0.5.10 or later, which CERT/CC lists as the fixed release. Note the lag: the GitHub Advisory Database still shows affected <= 0.5.9 with no patched version recorded, so verify against the upstream release rather than the advisory DB field.
  • Never expose the ZMQ broker to untrusted networks. Bind inference-internal sockets to 127.0.0.1 or a private interface, and put network segmentation and access controls between the broker port and anything routable. This single step neutralizes CVE-2026-3059 and CVE-2026-3060 regardless of version.
  • Treat pickle on a socket as RCE. Where you control serialization, replace pickle with a data-only format such as JSON or msgpack, so that a malicious payload cannot smuggle code. This is the structural fix CERT/CC recommends.
  • Lock down crash-dump handling for CVE-2026-3989: restrict write access to dump directories, and never run replay_request_dump.py on a .pkl you did not generate.
  • Monitor the inference host. Watch for unexpected inbound TCP to the ZMQ port, unexpected child processes spawned by the SGLang Python process, file creation in unusual locations, and outbound connections to unfamiliar destinations — the indicators CERT/CC and Orca call out.
  • Inventory your serving layer. SGLang, vLLM, LightLLM, LMDeploy and similar tools are often stood up by ML teams outside standard security review. Find the network-exposed ones before a scanner does.
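
The "data-only format" defense in the list above can be sketched as follows. This is a minimal example assuming a made-up request schema: the field names in REQUEST_FIELDS and the decode_request helper are hypothetical and do not reflect SGLang's actual wire format.

```python
import json

# Hypothetical request schema: field name -> required Python type.
REQUEST_FIELDS = {"request_id": str, "prompt": str, "max_tokens": int}


def decode_request(raw: bytes) -> dict:
    """Parse a JSON message and reject anything outside the expected schema."""
    try:
        msg = json.loads(raw)
    except (UnicodeDecodeError, json.JSONDecodeError) as exc:
        raise ValueError("message is not valid JSON") from exc
    # Require exactly the expected keys: nothing missing, nothing extra.
    if not isinstance(msg, dict) or set(msg) != set(REQUEST_FIELDS):
        raise ValueError("unexpected message shape")
    for field, expected in REQUEST_FIELDS.items():
        value = msg[field]
        # bool is a subclass of int in Python, so exclude it explicitly.
        if not isinstance(value, expected) or isinstance(value, bool):
            raise ValueError(f"field {field!r} has the wrong type")
    return msg
```

The design point is that json.loads can only ever produce dicts, lists, strings, numbers, booleans and None, so the worst a hostile message can do is fail validation. Bytes that are not JSON at all, including a pickle stream, are rejected at the first step.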

Status

Item                                             Date          Status
-----------------------------------------------  ------------  ------------------------------
CERT/CC vulnerability note (VU#665416)           Mar 12, 2026  Public
Orca Security disclosure blog                    Mar 12, 2026  Public
GitHub Advisory DB entries (CVE-2026-3059/3060)  Mar 12, 2026  Public, CVSS 9.8
CERT/CC note last revised                        Apr 7, 2026   Lists v0.5.10 as fixed
Coverage (The Hacker News)                       Mar 17, 2026  Public
In-the-wild exploitation                         -             None reported as of disclosure

The takeaway is blunt: pickle.loads() on a network-reachable socket is not a bug to be patched once; it is an anti-pattern to be removed. The fix is v0.5.10; the lesson is that any AI inference component listening on a port should be treated as an authentication and deserialization boundary from day one.

Sources