LLMjacking evolves: stolen Ollama compute now drives autonomous attack agents
A June 17, 2026 Sysdig report documents a captured incident: an exposed, unauthenticated Ollama server used as the reasoning engine for a multi-stage offensive pipeline. The fix is operational, not model-side.
What is this?
On June 17, 2026, the Sysdig Threat Research Team (TRT) published an analysis of an incident it observed on June 12, 2026: a threat actor wired a misconfigured, internet-exposed Ollama model server into an automated offensive-security pipeline. The actor was not chatting with the model or reselling access — they used the stolen inference capacity as the decision-making “brain” for a multi-stage tool that fingerprints a target, matches it to known vulnerabilities, drafts proof-of-concept code, and attempts to break in.
This is the convergence of two trends Sysdig has tracked since it coined the term LLMjacking in May 2024. The first is compute theft: abusing someone else’s paid or self-hosted AI capacity so the victim foots the bill. The second is autonomous offensive tooling, long demonstrated only in research. Here they merged in a single captured campaign.
How it works
The enabling condition is mundane. Ollama listens on port 11434 with no authentication by default, so a server bound to a public interface answers anyone who finds it. A January 29, 2026 SentinelLABS/Censys study counted roughly 175,000 exposed Ollama hosts across 130 countries. Nearly half of them advertised tool-calling support, which lets the model invoke external functions and turns a text endpoint into one that can drive code execution.
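Whether a given install is exposed comes down to the address it binds. Ollama takes this from the OLLAMA_HOST environment variable and defaults to 127.0.0.1:11434. The minimal Python sketch below classifies an OLLAMA_HOST-style value. Its parser is a simplified assumption that covers common forms and is not Ollama's own code. `parse_bind` and `classify_bind` are hypothetical helpers.

```python
import ipaddress
from typing import Optional, Tuple
from urllib.parse import urlsplit

DEFAULT_HOST = "127.0.0.1"  # Ollama's documented default bind address
DEFAULT_PORT = 11434


def parse_bind(value: Optional[str]) -> Tuple[str, int]:
    """Split an OLLAMA_HOST-style value into (host, port).

    Simplified assumption: handles "0.0.0.0", "10.0.0.5:11434",
    "http://host:port" and "[::]:11434"; it is not Ollama's own parser.
    """
    if not value:
        return DEFAULT_HOST, DEFAULT_PORT  # unset: loopback only
    if "://" not in value:
        value = "http://" + value  # give urlsplit a scheme so it finds host/port
    parts = urlsplit(value)
    # An empty host such as ":11434" is treated conservatively as all interfaces.
    host = parts.hostname or "0.0.0.0"
    return host, parts.port or DEFAULT_PORT


def classify_bind(value: Optional[str]) -> str:
    """Label where a server with this bind setting would be reachable from."""
    host, _ = parse_bind(value)
    if host == "localhost":
        return "loopback"
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        return "unknown"  # a DNS name: resolve it and review by hand
    if addr.is_loopback:
        return "loopback"
    if addr.is_unspecified:
        return "all-interfaces"  # 0.0.0.0 or :: answers on every interface
    if addr.is_private:
        return "private"  # internal network only; still needs auth in front
    return "public"
```

On a host you manage, `classify_bind(os.environ.get("OLLAMA_HOST"))` should return "loopback" unless remote access is deliberate. Any "all-interfaces" or "public" result is a server that answers strangers unless something else blocks the port.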
Because the attacker’s tool sends its full instructions to the model on every request, Sysdig captured the entire framework. It drives the model through discrete, strictly structured stages: service-banner normalisation for CVE lookup, vulnerability matching, web reconnaissance, proof-of-concept synthesis, blind SQL-injection crafting, credential and secret extraction, and an autonomous orchestrator that loops until it reaches command execution. Notably, each stage instructs the model to treat content captured from the target as untrusted data, not instructions. This is a deliberate defence against prompt injection from the very pages the tool reads.
The framework’s most durable signature is its compromise oracle. It injects a command bracketed by two unique sentinel strings (VAPTb3gin … VAPTfin) and confirms remote code execution by finding those markers around the output of the id command. Wrapping command output in begin/end markers so a parser can extract it from noise is a recurring tell of AI-generated attack tooling. A human reading a terminal does not bracket output for a machine.
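Where request or response bodies are visible, for example in reverse-proxy logs or packet captures, the published markers support a cheap string-match detector. The minimal Python sketch below assumes you already have the logged text as a string. The marker strings are the ones Sysdig published. `scan` and the sample body are hypothetical illustrations.

```python
import re
from typing import Dict, List

# Sentinel strings Sysdig published as detection anchors for this framework.
SENTINELS = ("VAPTb3gin", "VAPTfin", "__VAPTCMD__")

# Whatever sits between the begin and end markers, across line breaks (non-greedy).
BRACKETED = re.compile(r"VAPTb3gin(.*?)VAPTfin", re.DOTALL)


def scan(text: str) -> Dict[str, List[str]]:
    """Report which sentinels appear in text and what the begin/end pair brackets."""
    return {
        "sentinels": [s for s in SENTINELS if s in text],
        "bracketed": BRACKETED.findall(text),
    }


if __name__ == "__main__":
    # Hypothetical logged response body, for illustration only.
    sample = "HTTP/1.1 200 OK\n\nVAPTb3gin\nuid=33(www-data) gid=33(www-data)\nVAPTfin"
    print(scan(sample))
```

A variant that renames its markers will slip past this exact-match check. Treat a hit as a lead for investigation rather than a complete detection.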
Two details underline that this is real software, not a demo. First, the tool requested at least seven models by name, including commercial ones (gpt-4o-mini, claude-3-5-sonnet, gemini-2.0-flash-exp) that it simply repointed at the free Ollama backend. Second, it was caught mid-development: stages were added and rewritten across an eight-hour session. Its targets were all private practice ranges (RFC 1918 addresses and HackTheBox lab space), not live victims.
Why it matters
Two years ago, researchers showed that GPT-4, given a CVE description, could autonomously exploit 87% of a benchmark of 15 real-world one-day vulnerabilities (Fang et al., April 2024). That warning is now operational, and the economics have collapsed: when the inference is stolen, the marginal cost of running an autonomous attacker trends toward zero for any actor willing to abuse someone else’s compute.
There is also a defensive blind spot. Detection that watches a model server’s own logs assumes the operator owns and monitors it. An exposed server discovered by an outsider is, by definition, one nobody is watching — its owner sees elevated compute and an open port, not a multi-stage attack pipeline running on their hardware.
Defenses
Treat a self-hosted inference endpoint exactly as you would an exposed database:
- Don’t expose port 11434. Bind Ollama (and similar servers like vLLM) to localhost or an internal interface. Any remote access should sit behind a firewall and an authenticating reverse proxy.
- Add authentication at the proxy or network layer. Ollama ships with none, so it must be enforced in front of every endpoint.
- Audit your own ranges. Scan for open port 11434 the way an attacker would, and inventory shadow model servers stood up outside the security perimeter.
- Monitor inference traffic for anomalous request volume and for offensive-tooling signatures, such as rigid structured-output contracts and marker-bracketed command patterns. Sysdig published the sentinel strings (`VAPTb3gin`, `VAPTfin`, `__VAPTCMD__`) and the confirmation probe as detection anchors.
- Watch for guardrail-stripped models. The campaign’s first probes requested an “abliterated” Llama build; the presence of uncensored model templates on an endpoint is itself a risk signal.
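An audit of your own ranges can start by asking each host for its model list without credentials. The minimal Python sketch below makes three assumptions: plain HTTP on the default port, that an unauthenticated Ollama server answers GET /api/tags (its model-listing endpoint) with a JSON object containing a `models` array, and IPv4-style ranges. `looks_unauthenticated`, `probe` and `audit` are hypothetical helpers. The loop is sequential, so large ranges will be slow. Point it only at networks you own.

```python
import http.client
import ipaddress
import json
import urllib.request
from typing import Iterator, List, Optional, Tuple

OLLAMA_PORT = 11434  # default Ollama API port


def looks_unauthenticated(status: int, body: bytes) -> bool:
    """True if a credential-less response looks like an Ollama model listing."""
    if status != 200:
        return False
    try:
        data = json.loads(body)
    except ValueError:  # not JSON (covers JSONDecodeError and bad UTF-8)
        return False
    return isinstance(data, dict) and isinstance(data.get("models"), list)


def probe(host: str, port: int = OLLAMA_PORT, timeout: float = 3.0) -> Optional[List[str]]:
    """Return model names if GET /api/tags succeeds without credentials, else None."""
    netloc = f"[{host}]" if ":" in host else host  # bracket IPv6 literals
    url = f"http://{netloc}:{port}/api/tags"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            status, body = resp.status, resp.read()
    except (OSError, http.client.HTTPException, ValueError):
        # OSError covers refused/timed-out connections and urllib's URLError/HTTPError,
        # so a 401 from an authenticating proxy also lands here and yields None.
        return None
    if not looks_unauthenticated(status, body):
        return None
    return [str(m.get("name")) for m in json.loads(body)["models"] if isinstance(m, dict)]


def audit(cidr: str, port: int = OLLAMA_PORT) -> Iterator[Tuple[str, List[str]]]:
    """Yield (ip, model names) for every host in an owned range that answers openly."""
    for ip in ipaddress.ip_network(cidr, strict=False).hosts():
        models = probe(str(ip), port)
        if models is not None:
            yield str(ip), models
```

For example, `for host, models in audit("10.0.0.0/24"): print(host, models)` lists every open server and what it would serve a stranger. Once an authenticating proxy is in front, the same probe should get a 401 and report nothing for that host.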
Status
| Item | Detail |
|---|---|
| Reporter | Sysdig TRT |
| Observed | June 12, 2026 (actor returned June 14) |
| Published | June 17, 2026 |
| Root cause | Unauthenticated, internet-exposed Ollama (port 11434) |
| CVE | None — configuration exposure, not a software flaw |
| Exposure scale | ~175,000 exposed Ollama hosts (SentinelLABS/Censys, Jan 2026) |
This article is educational and defensive. It summarises publicly disclosed research and does not reproduce the operational payloads or stage prompts from the captured framework.