OFFENSIVE AI MEDIUM NEW

AI threat actors mapped to MITRE ATT&CK: the ARiES score and what it breaks

Anthropic's June 3, 2026 report maps a year of AI-enabled cyberattacks to MITRE ATT&CK. The finding for defenders: sophistication, technique count and interface no longer predict an actor's risk — orchestration does.

2026-06-04 // 7 min // affects: claude, claude-code, frontier-llms

What is this?

On June 3, 2026, Anthropic’s Frontier Red Team (Kyla Guru, Alex Moix and Jacob Klein) published a report mapping a year of AI-enabled cyber misuse onto MITRE ATT&CK, with a longer technical write-up on the Red blog and some results contributed to Verizon’s 2026 Data Breach Investigations Report.

The dataset is 832 accounts banned for cyber-related Usage Policy violations between March 2025 and March 2026 — the subset with enough detail to map their tactics, techniques and procedures onto MITRE ATT&CK V18. In total the team recorded 13,873 observed actions across 482 unique techniques and all 14 ATT&CK tactics. This is a measurement of how threat actors misuse generally available models today, not a vulnerability disclosure — the value is in the trend lines it exposes for defenders.

How it works

The report introduces a scoring method, the AI Risk Enablement Score (ARiES), that rates each actor from 0 to 100. It is deliberately additive, Threat (0–35) + Vulnerability/interface (0–35) + Impact (0–30), rather than the classical multiplicative Threat × Vulnerability × Impact. The reason: a product collapses to zero whenever any one dimension is zero, whereas a sum still surfaces a case where one dimension is missing (for example, a working malware build with no identified victim yet), which is exactly the early signal a detection system wants.
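To make the additive-versus-multiplicative contrast concrete, here is a minimal Python sketch. Only the three sub-score ranges come from the text above; the example values, the class and function names, and the way the product is rescaled to 0–100 are illustrative assumptions, not the report's implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ActorScore:
    """Hypothetical container for the three ARiES sub-scores."""
    threat: float     # 0-35
    interface: float  # 0-35 (vulnerability/interface)
    impact: float     # 0-30

    def __post_init__(self):
        # Reject values outside the ranges stated in the article.
        limits = (("threat", self.threat, 35),
                  ("interface", self.interface, 35),
                  ("impact", self.impact, 30))
        for name, value, cap in limits:
            if not 0 <= value <= cap:
                raise ValueError(f"{name} must be in [0, {cap}], got {value}")


def additive(s: ActorScore) -> float:
    """Additive total on a 0-100 scale, as described in the article."""
    return s.threat + s.interface + s.impact


def multiplicative(s: ActorScore) -> float:
    """Classical T x V x I, rescaled to 0-100 (the rescaling is an assumption)."""
    return 100 * (s.threat / 35) * (s.interface / 35) * (s.impact / 30)


# Assumed example: a working malware build with no identified victim yet.
no_victim_yet = ActorScore(threat=30, interface=28, impact=0)

print(additive(no_victim_yet))        # 58
print(multiplicative(no_victim_yet))  # 0.0
```

With these assumed inputs the additive total still lands in the middle of the scale, while the product reports no risk at all until a victim appears.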

What threat actors actually ask models for is lopsided toward the preparatory stages:

Technique (MITRE ID)              Prevalence                Stage
--------------------------------  ------------------------  -------------------------
T1587.001 Malware Development     560 / 832 actors (67%)    Resource development
T1027 Obfuscated Files/Info       64.7% of actors           Defense evasion
T1005 Data from Local System      55.9% of actors           Collection
T1562 Impair Defenses             54.9% of actors           Defense evasion
T1087 Account Discovery           rising (+8.9%)            Post-compromise discovery
T1021 Remote Services (SSH/SMB)   ~0.7% of observations     Lateral movement
T1003 OS Credential Dumping       rare                      Credential access

Defense evasion is the single largest tactic, present for 84.4% of actors. Hands-on-keyboard, in-network work is rare — lateral movement appears for only 54 of 832 actors (6.5%). The median actor used 16 distinct techniques, and 80% went through Claude Code, making agentic tooling the default mode of access rather than a distinguishing one.
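The prevalence figures above are shares of actors, so they can be derived from a simple mapping of actor to observed technique IDs. The sketch below shows that calculation on a four-actor toy dataset; the actor names and their technique sets are invented for illustration, and the last two lines only re-check the arithmetic quoted in the article.

```python
from collections import Counter
from statistics import median

# Hypothetical toy data: actor ID -> set of ATT&CK technique IDs observed.
actors = {
    "a1": {"T1587.001", "T1027", "T1005"},
    "a2": {"T1587.001", "T1562"},
    "a3": {"T1027", "T1021"},
    "a4": {"T1587.001", "T1027", "T1562", "T1087"},
}


def technique_shares(actors: dict[str, set[str]]) -> dict[str, float]:
    """Fraction of actors that used each technique at least once."""
    counts = Counter(t for techniques in actors.values() for t in techniques)
    total = len(actors)
    return {technique: n / total for technique, n in counts.items()}


shares = technique_shares(actors)
median_breadth = median(len(techniques) for techniques in actors.values())

print(shares["T1587.001"])  # 0.75
print(median_breadth)       # 2.5

# Re-checking the percentages quoted in the article.
print(round(560 / 832 * 100))    # 67
print(round(54 / 832 * 100, 1))  # 6.5
```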

The headline finding is what doesn’t predict risk. After removing circularity (each factor is compared against the rest of the score, with its own contribution excluded), an actor’s assessed technical sophistication correlates with the remaining risk score at only r = 0.28 and breadth of technique coverage at r = 0.27, while the interface (chat, API, or agentic coding tool) shows no meaningful correlation at all. The durable differentiator is where in the kill chain AI is applied and, above that, the scaffolding an actor builds to chain stages together autonomously.
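The article does not spell out how the circularity was removed. One common approach, assumed here, is to correlate a component with the total minus that component, so the component cannot correlate with itself. The sketch below uses made-up numbers to show how much the naive correlation can be inflated.

```python
from math import sqrt


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of equal length >= 2")
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / sqrt(vx * vy)


def remainder_correlation(component: list[float], total: list[float]) -> float:
    """Correlate a component with the total score minus that component."""
    remainder = [t - c for c, t in zip(component, total)]
    return pearson(component, remainder)


# Hypothetical per-actor values: a sophistication sub-score and everything else.
sophistication = [2, 8, 4, 10, 6]
everything_else = [40, 45, 30, 35, 50]
total = [s + e for s, e in zip(sophistication, everything_else)]

naive = pearson(sophistication, total)                    # inflated: total contains the component
corrected = remainder_correlation(sophistication, total)  # component vs. the rest only

print(round(naive, 2), round(corrected, 2))  # 0.45 0.1
```

In this toy data the naive figure is more than four times the corrected one, which is why a part-versus-remainder comparison is the fairer test of whether a factor predicts the rest of the score.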

The clearest example is GTG-1002, the Chinese state-sponsored espionage operation Anthropic disrupted in November 2025. Its MITRE profile — 30 techniques across 13 tactics — looked like a medium-risk actor, yet it scored the maximum 100. The difference was orchestration: Claude Code on a Kali machine, open-source pentest tools wired in as MCP servers, the model executing reconnaissance, exploiting an SSRF, harvesting credentials and pivoting laterally with humans intervening only at a handful of decision points.

Why it matters

Three shifts matter for defenders.

First, risk triage based on actor sophistication is breaking. Lateral movement, privilege escalation and account discovery used to imply a capable, well-resourced operator. The report shows AI performing those steps on behalf of low-skill actors — and the share of medium-or-higher-risk actors rose from 33% to 56% in under a year (≈1.7×) without the actors themselves getting more skilled.

Second, the riskiest behavior is moving in-network. Account discovery (T1087) rose 8.9% and automated exfiltration (T1020) rose 6.2% half-over-half, while phishing (T1566) fell 8.6%. Actors who used AI for lateral movement averaged a risk score of 56.4 versus a mean of 46.8 — the single strongest predictor in the data.

Third, MITRE ATT&CK doesn’t yet have IDs for the behaviors that define the worst actors: autonomous kill-chain orchestration, real-time pivot decisions, and AI-directed execution with no human in the loop. Anthropic says it is in active discussions with MITRE about adding cross-cutting categories for these agentic patterns. (For background on why agents change the attack surface, see agents as operating systems and Project Glasswing.)
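The lateral-movement comparison in the second shift (56.4 against a mean of 46.8) is a group-mean gap. A minimal sketch of the same calculation for every technique follows; the scores and technique sets are invented, and "lift" is a name chosen here for the gap, not a term from the report.

```python
from statistics import mean

# Hypothetical toy data: (risk score, techniques observed) per actor.
actors = [
    (70, {"T1021", "T1087"}),
    (40, {"T1566"}),
    (60, {"T1021"}),
    (30, {"T1566", "T1027"}),
    (50, {"T1027", "T1087"}),
]


def score_lift(actors: list[tuple[float, set[str]]]) -> dict[str, float]:
    """Mean score of actors using a technique minus the overall mean score."""
    overall = mean(score for score, _ in actors)
    techniques = set().union(*(used for _, used in actors))
    return {
        tech: mean(score for score, used in actors if tech in used) - overall
        for tech in techniques
    }


lifts = score_lift(actors)
ranked = sorted(lifts.items(), key=lambda item: item[1], reverse=True)

print(ranked[0])  # ('T1021', 15)

# The gap quoted in the article, for reference.
print(round(56.4 - 46.8, 1))  # 9.6
```

Ranking techniques by this gap is one simple way to find which behaviors deserve the heaviest alert weighting; it says nothing about causation, and small groups will produce noisy gaps.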

Defenses

This report is a planning input, not a patch. The takeaways are about detection design and triage.

  1. Stop ranking actors by sophistication, technique count or interface. Each is a weak predictor (r ≈ 0.27–0.28, or none). Re-weight your threat-scoring toward which techniques an actor reaches for and how they chain them, not how many.
  2. Instrument for post-compromise AI use. The rising, high-risk signals are account discovery (T1087), automated exfiltration (T1020), remote services (T1021), OS credential dumping (T1003) and web shells (T1505.003). Lateral movement is the strongest single marker — alert hard on it.
  3. Detect orchestration, not just techniques. Build signals for multistep autonomous execution, AI-directed pivots, and tool-augmented operations via MCP servers — the patterns that gave GTG-1002 a max score despite an unremarkable technique count. Until ATT&CK adds IDs, track these as your own cross-cutting tags.
  4. Compress vulnerability-to-patch time. When low-skill actors can operate an expert-level harness, the window between a bug becoming discoverable and being exploited shrinks. Treat insecure code as an urgent liability, not a backlog item.
  5. Use AI symmetrically on defense. SOC automation, triage, log analysis and incident response are exactly where the same agentic capability helps blue teams. Anthropic routes dual-use defensive work through a Cyber Verification Program rather than blocking it outright.
  6. Share threat intelligence. TTPs, indicators and risk-scoring methods like ARiES are most useful when pooled across organizations; the report itself exists because Anthropic mapped and shared its ban data.
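Item 3 above suggests tracking orchestration as your own cross-cutting tags until ATT&CK has IDs for it. The sketch below is one way such a tagger could look. Everything in it is an assumption made for illustration: the `Session` fields, the `X-` tag names, the grouping of post-compromise tactics and both thresholds would all need to be adapted to your own telemetry.

```python
from dataclasses import dataclass, field

# Assumed grouping of ATT&CK tactics treated as post-compromise activity.
POST_COMPROMISE = {"discovery", "lateral-movement", "credential-access",
                   "collection", "exfiltration"}


@dataclass
class Session:
    """Hypothetical summary of one actor session."""
    tactics_in_order: list[str]  # ATT&CK tactics observed, in sequence
    human_turns: int             # inputs typed by a human
    tool_calls: int              # model-initiated tool invocations
    mcp_servers: set[str] = field(default_factory=set)


def orchestration_tags(s: Session, min_stages: int = 3,
                       max_human_ratio: float = 0.1) -> set[str]:
    """Return custom cross-cutting tags (not ATT&CK IDs) for a session."""
    tags: set[str] = set()
    stages = set(s.tactics_in_order)
    actions = s.human_turns + s.tool_calls
    # With no recorded actions, treat the session as fully human-driven.
    human_ratio = s.human_turns / actions if actions else 1.0
    mostly_autonomous = human_ratio <= max_human_ratio

    if len(stages) >= min_stages and mostly_autonomous:
        tags.add("X-AUTONOMOUS-CHAIN")   # multistep execution, few human inputs
    if s.mcp_servers:
        tags.add("X-MCP-TOOLING")        # tool-augmented operation via MCP
    if stages & POST_COMPROMISE and mostly_autonomous:
        tags.add("X-AI-DIRECTED-PIVOT")  # in-network work driven by the model
    return tags


chained = Session(
    tactics_in_order=["reconnaissance", "initial-access",
                      "credential-access", "lateral-movement"],
    human_turns=4, tool_calls=96, mcp_servers={"scanner"},
)
manual = Session(tactics_in_order=["resource-development"],
                 human_turns=20, tool_calls=5)

print(sorted(orchestration_tags(chained)))
print(sorted(orchestration_tags(manual)))
```

The design choice is to score how the session was run (stage count, human-to-tool ratio, attached tooling) separately from which techniques appeared, so a profile with an unremarkable technique count can still be flagged.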

Status

Item                          Reference          Date        Notes
----------------------------  -----------------  ----------  ----------------------------------------------------------
Year-in-review report         Anthropic News     2026-06-03  832 accounts, Mar 2025–Mar 2026
LLM ATT&CK Navigator + ARiES  red.anthropic.com  2026-06-03  13,873 actions, 482 techniques, all 14 tactics, ATT&CK V18
Verizon 2026 DBIR             Verizon            2026        11 months of the same data contributed
GTG-1002 espionage case       Anthropic News     2025-11-13  Max ARiES 100; agentic orchestration via Claude Code + MCP
MITRE ATT&CK evolution        Anthropic / MITRE  ongoing     Discussions to add agentic-orchestration categories

The right framing is not “AI writes malware now” — that has been true for a while. It is that the line between a low-risk and a high-risk attacker is no longer technical skill; it is orchestration, and the taxonomy defenders rely on doesn’t yet describe it.

Sources