SUPPLY CHAIN // CRITICAL

ktransformers: unauthenticated RCE via pickle over ZeroMQ (CVE-2026-26210)

A critical RCE in the ktransformers inference engine exposes a ZMQ socket on all interfaces and pickle-loads whatever it receives. It is the latest case of the 'ShadowMQ' pattern copied across AI serving stacks.

2026-06-15 // 6 min read // affects: ktransformers, vllm, sglang, nvidia-tensorrt-llm, meta-llama-stack

What is this?

CVE-2026-26210 is a critical unauthenticated remote-code-execution flaw in ktransformers, a high-performance LLM inference engine maintained by KVCache.AI (about 16,500 GitHub stars, known for running DeepSeek-V3 on a single node). Valentin Lobstein (Chocapikk) disclosed it as part of an audit of pickle deserialization in ML inference frameworks; VulnCheck assigned the CVE, which was published on April 23, 2026. NVD rates it CVSS 3.1 9.8 (and CVSS 4.0 9.3) under CWE-502 (Deserialization of Untrusted Data), and it affects ktransformers through 0.5.3.

The root cause is not new — and that is the point. This is the same unsafe combination of ZeroMQ plus Python pickle that Oligo Security documented in November 2025 under the name ShadowMQ, a pattern that propagated across the AI ecosystem (Meta Llama Stack, vLLM, NVIDIA TensorRT-LLM, Modular Max, SGLang) largely through copy-pasted serving code.

How it works

ktransformers’ balance_serve backend (its legacy multi-concurrency mode, enabled with --backend_type balance_serve) starts a scheduler that binds a ZeroMQ ROUTER socket to all network interfaces, then proxies incoming messages to worker threads. Each worker deserializes the raw bytes it receives:

```python
# scheduler worker, simplified
self.frontend.bind(f"tcp://*:{sched_port}")   # listens on every interface
# ... ROUTER/DEALER proxy to worker threads ...
message = worker.recv()
data = pickle.loads(message)                   # CWE-502: untrusted input
```

pickle.loads() is not a safe parser: a pickle stream can instruct the unpickler to import any callable and invoke it with attacker-chosen arguments (the same machinery __reduce__ uses to rebuild objects), so deserializing attacker-controlled bytes amounts to running attacker-chosen code. Because the socket has neither authentication nor input validation, any host that can reach the port can run code with the privileges of the ktransformers process. We deliberately omit a working payload; the mechanism (a malicious pickle reducing to an OS command) is standard CWE-502 and is described only to explain the risk.
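Python's own pickle documentation points at where that happens: every class or function a pickle stream references is resolved through Unpickler.find_class(). The minimal sketch below (class and function names are ours, for illustration only) overrides that hook to refuse every global. Plain containers and scalars still round-trip because they use dedicated opcodes, while anything that names a class or function is rejected before it can be called. It is meant to show the mechanism, not to serve as the fix; the Defenses section recommends dropping pickle altogether.

```python
import io
import pickle


class NoGlobalsUnpickler(pickle.Unpickler):
    """Unpickler that refuses to resolve any module-level name.

    Every class or function a pickle stream references (including the
    callable a malicious __reduce__ would invoke) is looked up through
    find_class(); raising here stops it before it can be called.
    """

    def find_class(self, module, name):
        raise pickle.UnpicklingError(f"blocked global: {module}.{name}")


def restricted_loads(data: bytes):
    """Illustrative only: deserialize plain data (dict, list, str, int, ...)."""
    return NoGlobalsUnpickler(io.BytesIO(data)).load()


if __name__ == "__main__":
    import collections

    # Builtin containers and scalars are encoded with opcodes, not globals.
    print(restricted_loads(pickle.dumps({"tokens": [1, 2, 3], "stream": True})))

    # An object pickled by reference to its class is rejected.
    try:
        restricted_loads(pickle.dumps(collections.OrderedDict(a=1)))
    except pickle.UnpicklingError as exc:
        print("rejected:", exc)
```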

Two details make exploitation realistic in production. First, the official Docker deployment runs with --network=host, which removes container network isolation and exposes the ZMQ port on the host. Second, although the port is assigned dynamically, it is printed in the server logs and ZMQ sockets are trivially fingerprinted on the wire — Oligo reported thousands of ZMQ sockets exposed to the public internet. CISA’s SSVC entry flags the exploitation maturity as proof-of-concept and the flaw as automatable.
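The fingerprinting point follows from the ZMTP 3.x wire format: a ZeroMQ peer opens each connection with a 10-byte signature consisting of 0xFF, eight padding bytes, and 0x7F. The sketch below is a minimal check for auditing hosts you administer; the function names are ours. It assumes the listener volunteers its signature as soon as the TCP connection opens, which is our understanding of libzmq's behavior, and it never sends anything. Other implementations may wait for the client to speak first, so a negative result is not proof that no ZMQ socket is there.

```python
import socket

ZMTP_SIGNATURE_LEN = 10


def looks_like_zmtp(greeting: bytes) -> bool:
    """True if the bytes start with a ZMTP 3.x signature: 0xFF, 8 padding bytes, 0x7F."""
    return (
        len(greeting) >= ZMTP_SIGNATURE_LEN
        and greeting[0] == 0xFF
        and greeting[9] == 0x7F
    )


def probe(host: str, port: int, timeout: float = 3.0) -> bool:
    """Connect and read the first bytes the peer sends; no payload is sent."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as conn:
            data = b""
            while len(data) < ZMTP_SIGNATURE_LEN:
                chunk = conn.recv(ZMTP_SIGNATURE_LEN - len(data))
                if not chunk:  # peer closed the connection early
                    break
                data += chunk
    except OSError:  # refused, unreachable, or timed out
        return False
    return looks_like_zmtp(data)
```

Looping probe() over the port range of a machine you own flags listeners that greet like ZeroMQ; any hit reachable from outside the trusted boundary deserves a closer look.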

Why it matters

Inference servers sit deep inside AI infrastructure: they hold model weights, prompts, API keys and often run on GPU clusters with broad internal network reach. An RCE on one node can mean code execution across the cluster, lateral movement, secret and model exfiltration, or GPU cryptomining — the same impact class as the earlier ShadowMQ cases.

The broader lesson is about supply chain by imitation. The flaw spread not through a shared dependency but through shared code idioms: pyzmq's recv_pyobj() (which unpickles whatever arrives) and pickle-over-ZMQ patterns copied between fast-moving projects, sometimes with the header comment still reading “Adapted from vLLM.” ktransformers is simply the newest stack to inherit the pattern, which means the next one is probably already shipping.

Defenses

The fix is well understood and the upstream patch (PR #1944) follows it:

  • Upgrade past 0.5.3, or avoid the balance_serve backend entirely. The modern SGLang-integrated deployment path does not start the vulnerable ZMQ socket.
  • Never pickle.loads() untrusted data. Use a non-executable format (JSON or MessagePack) for RPC; inference requests are structured data that does not need pickle. This is exactly how Meta, vLLM and Modular fixed their ShadowMQ variants.
  • Bind to 127.0.0.1 by default, not tcp://*. Require explicit opt-in before any socket listens on a routable interface.
  • Authenticate the channel. ZeroMQ supports CurveZMQ (public-key authentication and encryption), and an application-level HMAC over each message is a simpler alternative; even a shared secret stops casual exploitation.
  • Do not run inference containers with --network=host. Keep container network isolation and expose only the intended HTTP API.
  • Hunt for exposure: scan your own ranges for ZMQ greeting bytes and confirm no scheduler/RPC port is reachable from outside the trusted boundary.
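Three of these defenses fit in a few lines of standard-library Python. The sketch below is not the upstream patch, and every name in it is hypothetical: bind_endpoint() refuses to produce a wildcard or non-loopback ZMQ endpoint unless the caller opts in, sign() prefixes each JSON message with an HMAC-SHA256 tag over a shared secret, and decode_request() checks that tag and then parses JSON instead of unpickling. The request fields it validates (request_id, prompt) are invented for illustration; ktransformers' real message schema is not described here.

```python
import hashlib
import hmac
import ipaddress
import json

TAG_LEN = hashlib.sha256().digest_size  # 32-byte HMAC-SHA256 tag


def bind_endpoint(port: int, host: str = "127.0.0.1", allow_external: bool = False) -> str:
    """Build a ZMQ TCP endpoint; refuse wildcard or routable hosts unless opted in."""
    if not allow_external:
        if host == "*":
            raise ValueError("binding tcp://* requires allow_external=True")
        # ip_address() also raises ValueError for hostnames; pass a literal IP.
        if not ipaddress.ip_address(host).is_loopback:
            raise ValueError(f"{host} is not loopback; pass allow_external=True")
    if ":" in host:  # IPv6 literals need brackets in ZMQ endpoints
        host = f"[{host}]"
    return f"tcp://{host}:{port}"


def sign(secret: bytes, request: dict) -> bytes:
    """Serialize as JSON and prepend an HMAC tag: tag || body."""
    body = json.dumps(request, separators=(",", ":")).encode("utf-8")
    tag = hmac.new(secret, body, hashlib.sha256).digest()
    return tag + body


def decode_request(secret: bytes, message: bytes) -> dict:
    """Authenticate, then parse as JSON; nothing in the message is executed."""
    tag, body = message[:TAG_LEN], message[TAG_LEN:]
    expected = hmac.new(secret, body, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):  # constant-time comparison
        raise PermissionError("bad or missing HMAC tag")
    data = json.loads(body)  # JSON decode errors are ValueError subclasses
    # Hypothetical schema: check types instead of trusting the sender.
    if not isinstance(data, dict):
        raise ValueError("request must be a JSON object")
    if not isinstance(data.get("request_id"), str) or not isinstance(data.get("prompt"), str):
        raise ValueError("request_id and prompt must be strings")
    return data
```

In the simplified worker shown earlier, decode_request(secret, message) would take the place of pickle.loads(message), and a pyzmq sender would call socket.send(sign(secret, request)). An HMAC tag authenticates the sender but neither encrypts the traffic nor, by itself, stops a captured message from being replayed; CurveZMQ authenticates and encrypts the whole connection.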

Status

| Item | Detail |
|---|---|
| CVE | CVE-2026-26210 (CWE-502) |
| Affected | ktransformers ≤ 0.5.3 (balance_serve backend) |
| Severity | CVSS 3.1 9.8 / CVSS 4.0 9.3 (Critical) |
| Disclosed | Code audit Feb 11, 2026; CVE published Apr 23, 2026 |
| Fix | Upstream PR #1944; prefer SGLang-integrated deployment |
| Credit | Valentin Lobstein (Chocapikk); CVE assigned by VulnCheck |
| Pattern | “ShadowMQ”: ZMQ + pickle reuse across AI serving stacks (Oligo, Nov 2025) |
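To triage many environments against the affected range in the table, a small helper can compare an installed version with 0.5.3. This is a sketch under stated assumptions: it assumes the package is installed under the distribution name ktransformers (check your environment), and it compares only the leading numeric release components, so suffixes such as "+cu128" or ".post1" are ignored and treated as the base release. It says nothing about whether the balance_serve backend is actually enabled.

```python
import re
from importlib import metadata

LAST_AFFECTED = (0, 5, 3)  # "ktransformers <= 0.5.3" per the advisory


def release_tuple(version: str) -> tuple:
    """Leading numeric release components, e.g. '0.5.3+cu128' -> (0, 5, 3)."""
    match = re.match(r"\d+(?:\.\d+)*", version.strip().lstrip("vV"))
    if match is None:
        raise ValueError(f"unparseable version: {version!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def is_affected(version: str) -> bool:
    """True if the release part of `version` is <= LAST_AFFECTED."""
    release = release_tuple(version)
    # Pad both tuples to equal length so (0, 5) compares like (0, 5, 0).
    width = max(len(release), len(LAST_AFFECTED))
    padded = release + (0,) * (width - len(release))
    last = LAST_AFFECTED + (0,) * (width - len(LAST_AFFECTED))
    return padded <= last


if __name__ == "__main__":
    try:
        installed = metadata.version("ktransformers")  # assumed distribution name
    except metadata.PackageNotFoundError:
        print("ktransformers not installed in this environment")
    else:
        status = "affected" if is_affected(installed) else "outside the affected range"
        print(installed, status)
```

Pre-releases are also reduced to their numeric part (for example "0.5.4rc1" counts as 0.5.4), so confirm that a given build actually contains the upstream fix rather than relying on the version string alone.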

Sources