OFFENSIVE AI MEDIUM NEW

Hands-free firmware VR: an LLM agent reverse-engineers an OT intercom end-to-end

In a write-up published June 2, 2026, Claroty Team82 describes running Claude Opus 4.6 with a Ghidra MCP server against a Zenitel intercom firmware image and rediscovering most of a known set of vulnerabilities in under ten minutes: a preview of commoditized firmware vulnerability research.

2026-06-08 // 6 min affects: zenitel-tciv-3plus, ot-embedded-firmware, public-firmware-updates, claude-opus-4-6

What is this?

On June 2, 2026, Claroty’s Team82 published “Hands Free: What LLM Driven Vulnerability Research Looks Like” by Tomer Goldschmidt. The team took a target they had already reverse-engineered by hand, the Zenitel TCIV-3+ (a rugged IP video intercom deployed in industrial and high-security buildings), and asked a generally available model, Claude Opus 4.6, to redo the work from scratch.

In late 2025, Team82’s manual analysis of that device produced five disclosed vulnerabilities: three OS command-injection bugs (CVE-2025-64126, CVE-2025-64127, CVE-2025-64128, all CVSS 9.8), an out-of-bounds write (CVE-2025-64129), and a reflected XSS (CVE-2025-64130). That work took several hours of expert effort. The agent rediscovered most of the same classes of issues, surfaced additional memory-corruption and misconfiguration findings, and wrote a disclosure-grade report on its own, all in under ten minutes. The bugs are already patched; the story is the workflow.

How it works

This is not a frontier-model exclusive. The run used Claude Opus 4.6, not the restricted Claude Mythos behind Project Glasswing — which is what makes the result notable. The harness is mundane:

working-dir/
├── CLAUDE.md          # role + method: "you are doing binary VR for a CTF"
├── .mcp.json          # registers a self-built Ghidra MCP server
└── targets/zenitel/
    └── VS-IS_9.1.3.1.zip   # the public firmware update, as shipped
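
For readers who want to picture the `.mcp.json` entry in the tree above, here is a minimal sketch. It assumes the harness follows the common `mcpServers` layout used by project-level `.mcp.json` files. The server name, launcher and script path are hypothetical placeholders, since the write-up's Ghidra MCP server is custom and its configuration is not reproduced in the source.

```python
import json


def build_mcp_config(server_name: str, command: str, args: list[str]) -> dict:
    """Return a project-level MCP config that registers one stdio server."""
    return {"mcpServers": {server_name: {"command": command, "args": args}}}


def render_mcp_json(config: dict) -> str:
    """Serialize the config the way it would be saved as .mcp.json."""
    return json.dumps(config, indent=2)


if __name__ == "__main__":
    config = build_mcp_config(
        "ghidra",                  # hypothetical server name
        "python3",                 # hypothetical launcher
        ["ghidra_mcp_server.py"],  # hypothetical script, not Team82's server
    )
    print(render_mcp_json(config))
```

The point of the sketch is how little configuration is involved: one registered tool server plus a role file, with the firmware zip as the only target input.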

CLAUDE.md framed the task (binary reverse engineering, how to hunt bug classes) and pointed the agent at a custom Ghidra MCP server so it could drive the decompiler programmatically. The only target input was the vendor’s publicly downloadable firmware zip. Team82’s reported timeline:

t+0:00   extracts the firmware filesystem, looks for the web service
t+0:30   locates ipstweb, the web-server binary
t+1:30   recognises it is UPX-packed; UPX is missing from the host,
         so the agent installs it itself, then unpacks
t+3:30   loads the binary into Ghidra over MCP, probes for
         command-injection-style sinks
t+6:30   writes a markdown report with decompiled evidence
t<10:00  multiple findings across command injection, memory
         corruption and misconfiguration
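
The packing step in the timeline is easy to reproduce on your own builds. The sketch below is an illustration, not the agent's actual procedure: it walks an extracted firmware tree, flags ELF files that contain the `UPX!` marker, and decompresses one with the `upx` command-line tool. The marker check is a heuristic (a modified packer can strip it), and the function names are made up for this example.

```python
import shutil
import subprocess
from pathlib import Path

ELF_MAGIC = b"\x7fELF"
UPX_MARKER = b"UPX!"  # magic string that stock UPX embeds in packed files


def looks_upx_packed(data: bytes) -> bool:
    """Heuristic: an ELF image that contains the UPX marker."""
    return data.startswith(ELF_MAGIC) and UPX_MARKER in data


def find_packed_binaries(root: Path) -> list[Path]:
    """List regular files under root that look like UPX-packed ELF binaries."""
    hits = []
    for path in sorted(root.rglob("*")):
        if path.is_symlink() or not path.is_file():
            continue
        try:
            if looks_upx_packed(path.read_bytes()):
                hits.append(path)
        except OSError:
            continue  # unreadable entries are skipped
    return hits


def unpack(path: Path, out: Path) -> None:
    """Decompress with the upx CLI; fail clearly if it is not installed."""
    if shutil.which("upx") is None:
        raise RuntimeError("upx not found on PATH; install it before unpacking")
    subprocess.run(["upx", "-d", "-o", str(out), str(path)], check=True)
```

The explicit `shutil.which` check mirrors the situation in the timeline, where the tool was absent and had to be installed before unpacking could proceed.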

One reported finding is a command injection on an authenticated route: the agent surfaced the decompiled sink, noted that attacker input is formatted into a system command, and — usefully — flagged that the route’s IP-address validation could be bypassed. No payloads are reproduced here; the point is the division of labour, not an exploit. The model handled unpacking, decompiler navigation, sink triage and report drafting without going down rabbit holes, keeping context across the whole session.
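
The sink triage described above can be approximated with a very crude filter over decompiler output. The sketch below assumes you already have decompiled C text per function (for example, exported from Ghidra) and flags functions where the same buffer is written by `sprintf`/`snprintf` and then passed to `system`/`popen`. It is a regex heuristic for prioritising manual review, not how the agent worked, and the sample function names and bodies are invented, not taken from the Zenitel binary.

```python
import re

# Destination buffer of a format call, e.g. snprintf(cmd, ...)
FORMAT_CALL = re.compile(r"\b(?:sprintf|snprintf)\s*\(\s*(\w+)\s*,")
# First argument of a command-execution call, e.g. system(cmd)
EXEC_CALL = re.compile(r"\b(?:system|popen)\s*\(\s*(\w+)\s*[,)]")


def flag_format_then_exec(decompiled: dict[str, str]) -> dict[str, list[str]]:
    """Map function name -> buffers that are both formatted and executed."""
    findings = {}
    for name, body in decompiled.items():
        formatted = set(FORMAT_CALL.findall(body))
        executed = set(EXEC_CALL.findall(body))
        shared = sorted(formatted & executed)
        if shared:
            findings[name] = shared
    return findings


# Hypothetical decompiler output for two handlers.
SAMPLE = {
    "handle_ping": 'snprintf(cmd, 0x80, "ping -c 1 %s", host);\nsystem(cmd);',
    "handle_status": 'snprintf(msg, 0x40, "uptime %d", secs);\nputs(msg);',
}

if __name__ == "__main__":
    print(flag_format_then_exec(SAMPLE))  # {'handle_ping': ['cmd']}
```

A filter like this only produces candidates. Deciding whether attacker input reaches the formatted buffer, and whether any validation in front of it holds, still needs the data-flow reading that the decompiler session provides.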

Why it matters

The headline isn’t “an AI found bugs in a device.” It is that end-to-end firmware vulnerability research ran from a single public artifact, with a mid-tier model, in minutes. Three consequences follow.

First, the barrier to white-box VR is collapsing. Team82’s own framing is blunt: given a publicly available firmware or software update plus the right tooling, an agent can do the whole loop. The scarce input is no longer reverse-engineering expertise — it’s a copy of the target software. Expect the first wave to land on white-box targets where code is easy to obtain: open-source projects and any vendor that ships downloadable firmware.

Second, OT and embedded are squarely in scope. This wasn’t a SaaS endpoint; it was an access-control intercom of the kind that sits unpatched in the field for years. The discovery-to-report time is shrinking toward zero while OT patch windows stay measured in quarters. That gap is the actual risk.

Third, firmware obscurity is a clock, not a control. Encrypted or gated firmware buys time against this pipeline, but Team82 expects those limits to be bypassed eventually. Treating “our firmware isn’t public” as a durable defence is a planning error.

Defenses

The patches for the Zenitel bugs already exist (Zenitel advises upgrading to firmware 9.3.3.0 or later). The transferable defences are about exposure management, not this one device.

Treat every public firmware/software update as adversary-accessible attack surface. If a customer can download it, an agent can enumerate its bug classes. Inventory which of your products ship publicly downloadable images and rank them by exposure.
Mirror the pipeline internally — shift VR left. The harness here is reproducible without Mythos-class access: a coding agent plus a decompiler MCP plus a firmware image. Run it against your own builds before release, and re-run it on shipped firmware to find latent analogues of disclosed CVEs.
Re-baseline OT/embedded patch SLAs and compensating controls. Field devices that update slowly need network-level mitigation now: segment management interfaces, restrict web/SIP admin surfaces to trusted networks, and monitor them. Assume the discovery half of the race is faster than your patch half.
Don’t bank on firmware obscurity. Encryption and gated downloads raise cost but don’t change the end state. Plan for the case where your firmware is obtained and analysed at machine speed.
Prioritise by bug class, not just by CVE. A command-injection sink and a bypassable input check rarely live alone in embedded web servers. When an advisory names one, treat adjacent routes and similar parsers in the same binary as worth a re-review — exactly the kind of breadth an agent surfaces quickly.
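
The pairing named in the last item, a format-into-shell sink behind a loose input check, has a well-known structural fix: parse the input with a strict parser and pass it as a single argument without invoking a shell. The sketch below shows the idea in Python for illustration only. The firmware itself is native code, and the `ping` command and function names are hypothetical rather than drawn from the advisory.

```python
import ipaddress
import subprocess


def parse_ip_strict(value: str) -> str:
    """Return the canonical form of one IP address, or raise ValueError."""
    return str(ipaddress.ip_address(value))


def build_ping_argv(user_value: str) -> list[str]:
    """Build an argument vector; the address is one argv element, never shell text."""
    return ["ping", "-c", "1", parse_ip_strict(user_value)]


def ping_once(user_value: str) -> int:
    """Run ping without a shell and return its exit status."""
    result = subprocess.run(build_ping_argv(user_value), check=False, timeout=5)
    return result.returncode
```

Either measure alone is weaker than both together. Strict parsing rejects anything that is not exactly one address, and the argument vector means that even a value that slipped through would not be interpreted by a shell.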

Status

Item	Reference	Date	Notes
Hands-free VR write-up	Claroty Team82	2026-06-02	Claude Opus 4.6 + custom Ghidra MCP, <10 min run
Zenitel TCIV-3+ manual disclosure	Claroty	late 2025	CVE-2025-64126/64127/64128 (command injection, CVSS 9.8), CVE-2025-64129 (OOB write), CVE-2025-64130 (reflected XSS)
Fixed firmware	Zenitel	—	Upgrade to 9.3.3.0 or later
Frontier-tier context	LLM Hacking	2026-05	Mythos/Glasswing is gated; this run used a generally available model

The right way to read this is as a capability baseline, not an incident. A mid-tier, publicly available model already does competent end-to-end firmware VR from a single download. The defensive task is to assume your attackers have the same harness — and to run it against your own code first.