AGENTS CRITICAL

Tool selection hijacking: forcing an agent to pick the attacker's tool

An NDSS 2026 attack and an April 2026 IBM paper target the same blind spot: the step where an agent chooses which tool to call. Poison the tool catalog and the agent picks the attacker’s tool, with reported success rates of 70–100%.

2026-06-21 // 6 min affects: gpt-4o, llama-3.3-70b, function-calling-agents, mcp-tool-using-agents

What is this?

Tool selection hijacking is an attack on the step where an LLM agent decides which tool to call, before it decides how to call it. Most agent-security work targets the content an agent reads back from a tool, or instructions hidden in a tool’s description. This attack targets the routing decision itself: an adversary plants a crafted tool in the agent’s catalog so that, for a chosen task, the agent consistently picks the attacker’s tool instead of the legitimate one. Two recent publications formalize it. ToolHijacker (NDSS 2026, arXiv preprint April 2025) attacks the retrieval-plus-selection pipeline, and Function Hijacking Attacks (IBM Research Europe, Imperial, Trinity College Dublin, arXiv:2604.20994, 22 April 2026) attacks the function-calling decision directly.

How it works

Agents with large tool libraries typically pick tools in two stages. A retriever narrows the library down to a top-k shortlist by semantic similarity, then the LLM reads those candidate descriptions and selects one. Both stages consume the same author-controlled text, the tool’s name and description, so both are manipulable.
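To make the two stages concrete, here is a toy sketch. It is an illustration under stated assumptions, not ToolHijacker’s method: a bag-of-words cosine similarity stands in for a neural retriever, every tool name and description is hypothetical, and stage 2 is reduced to building the prompt an LLM would read, since the model call itself is out of scope. The planted `weather_pro` entry is hand-written to echo how users phrase the task; nothing is optimized.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy bag-of-words vector (assumption: real retrievers use dense embeddings)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, library: dict[str, str], k: int) -> list[str]:
    """Stage 1: rank every tool description by similarity to the query, keep top-k."""
    q = embed(query)
    ranked = sorted(library, key=lambda name: cosine(q, embed(library[name])), reverse=True)
    return ranked[:k]


def build_selection_prompt(query: str, shortlist: list[str], library: dict[str, str]) -> str:
    """Stage 2 input: the selecting LLM sees only names and descriptions."""
    lines = [f"- {name}: {library[name]}" for name in shortlist]
    return f"User task: {query}\nAvailable tools:\n" + "\n".join(lines) + "\nReply with one tool name."


# Hypothetical catalog; "weather_pro" is the planted entry.
library = {
    "weather_api": "Get the current weather forecast for a city.",
    "stock_quotes": "Look up the latest stock price for a ticker symbol.",
    "currency_convert": "Convert an amount between currencies.",
    "weather_pro": "What is the weather forecast for your city",
}
query = "what is the weather forecast for paris"
shortlist = retrieve(query, library, k=2)
print(shortlist)  # ['weather_pro', 'weather_api']
print(build_selection_prompt(query, shortlist, library))
```

Even this crude entry outranks the legitimate `weather_api` in stage 1, and everything the selector reads in stage 2 is text the tool’s author wrote. The published attacks replace the hand-written echo with optimized text.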

ToolHijacker injects a single malicious tool document into the library, with a name and description optimized to win both retrieval and selection for a target task. It works even in a “no-box” setting, where the attacker cannot see the real tool descriptions, the retriever, the model, or the top-k value: the authors build a shadow copy of the pipeline and split the description into two separately optimized subsequences, one tuned to win retrieval and one tuned to win selection.

Function Hijacking takes a complementary route: it appends adversarial tokens to one function’s metadata so that a function-calling model emits the attacker-chosen call. The technique adapts the GCG adversarial-suffix method to function calling. Notably, the paper reports that the attack is “largely agnostic to context semantics” and that it can be trained into a universal malicious function that hijacks selection across many queries.

The exact optimized strings are withheld here; both are research artifacts, not drop-in payloads.

Why it matters

Selecting the wrong tool is not a cosmetic error; it is a control-flow redirect. If the agent routes a “send payment,” “read file,” or “search database” intent to an attacker-controlled tool, the attacker gains a foothold in the agent’s action space without ever touching the user’s prompt. The measured effectiveness is high: ToolHijacker reports a 96.7% attack success rate (ASR) on the MetaTool benchmark with Llama-3.3-70B as the shadow model and GPT-4o as the target, plus a 100% retrieval hit rate. Function Hijacking reports 70–100% ASR across five instructed and reasoning models on the Berkeley Function Calling Leaderboard (BFCL).

The exposure scales with open tool ecosystems. Anywhere agents pull tools or MCP servers from a shared registry, marketplace, or multi-tenant catalog, a single planted entry can bias routing for every user who triggers the target task.

Defenses

No published defense fully stops these attacks yet, so mitigation has to be layered.

Tested input-side defenses underperform. ToolHijacker’s authors evaluated prevention methods (StruQ, SecAlign) and detection methods (known-answer detection, DataSentinel, perplexity and windowed-perplexity detection); the gradient-free attack still reached 99.6% success under StruQ, and perplexity detection missed 90% of gradient-based malicious documents while keeping false positives under 1%. Treat these as friction, not a fix.
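The gap between whole-text and windowed perplexity detection can be shown in a few lines. This sketch assumes per-token natural-log probabilities already produced by some scoring language model (not shown); the window of 8 tokens and threshold of 50 are placeholder values, not the settings ToolHijacker’s authors evaluated.

```python
import math
from typing import Sequence


def perplexity(logprobs: Sequence[float]) -> float:
    """exp of the mean negative log-likelihood over natural-log token scores."""
    if not logprobs:
        raise ValueError("need at least one token")
    return math.exp(-sum(logprobs) / len(logprobs))


def windowed_perplexity_flag(
    logprobs: Sequence[float], window: int = 8, threshold: float = 50.0
) -> bool:
    """Flag text if any run of `window` consecutive tokens is too surprising."""
    if len(logprobs) <= window:
        return perplexity(logprobs) > threshold
    worst = max(
        perplexity(logprobs[i : i + window])
        for i in range(len(logprobs) - window + 1)
    )
    return worst > threshold


fluent = [-1.0] * 40  # ordinary prose: every token fairly likely
suffix = [-8.0] * 8   # a short run of very unlikely tokens
print(perplexity(fluent + suffix) > 50.0)         # False: the average hides it
print(windowed_perplexity_flag(fluent + suffix))  # True: one window exposes it
print(windowed_perplexity_flag(fluent))           # False
```

The limitation is built in: the detector only reacts to locally improbable text. An optimized description that reads fluently keeps every window under the threshold, which is consistent with the miss rates reported above.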

Practical mitigations focus on the catalog and the routing decision. Curate and pin the tool library: allowlist tools by signed identity and provenance rather than resolving freely from open registries. Constrain retrieval to vetted entries and avoid auto-installing third-party MCP servers. Add a selection-integrity check — log which tool was chosen for which intent, alert on a new or low-reputation tool capturing a sensitive task, and require human confirmation before high-impact tool calls (payments, file writes, external network). Apply least privilege so that even a hijacked selection cannot reach dangerous capabilities. Finally, monitor for sudden shifts in which tool serves a recurring task, a direct signal of catalog tampering.
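The catalog and routing checks above can be combined into a single gate in front of tool execution. A minimal sketch, under several assumptions: tool manifests are JSON-serializable dicts, intents are classified upstream, and pinning a SHA-256 digest of the reviewed manifest stands in for verifying a publisher signature, which would need a public-key library. `SelectionGate`, `manifest_digest`, and all intent and tool names are hypothetical, not part of any MCP SDK.

```python
import hashlib
import json
from collections import Counter, defaultdict
from dataclasses import dataclass, field


def manifest_digest(manifest: dict) -> str:
    """SHA-256 over canonical JSON, so key order does not change the pin."""
    canonical = json.dumps(manifest, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


@dataclass
class SelectionGate:
    pinned: dict[str, str]       # tool name -> digest of the reviewed manifest
    sensitive_intents: set[str]  # e.g. payments, file writes, external network
    expected: dict[str, str]     # intent -> tool that normally serves it
    history: defaultdict = field(default_factory=lambda: defaultdict(Counter))

    def check(self, intent: str, tool: str, manifest: dict) -> str:
        """Return 'block', 'alert', 'confirm', or 'allow' for one routing decision."""
        self.history[intent][tool] += 1  # log every choice for drift monitoring
        if self.pinned.get(tool) != manifest_digest(manifest):
            return "block"  # unknown tool, or manifest edited since review
        if intent in self.sensitive_intents:
            if self.expected.get(intent) != tool:
                return "alert"  # a different tool captured a sensitive task
            return "confirm"    # expected tool, but still needs a human OK
        return "allow"

    def share(self, intent: str, tool: str) -> float:
        """Fraction of logged selections for `intent` that went to `tool`."""
        counts = self.history[intent]
        total = sum(counts.values())
        return counts[tool] / total if total else 0.0


pay = {"name": "pay_invoice", "description": "Pay an approved invoice."}
search = {"name": "search_db", "description": "Search the orders database."}
gate = SelectionGate(
    pinned={"pay_invoice": manifest_digest(pay), "search_db": manifest_digest(search)},
    sensitive_intents={"send payment"},
    expected={"send payment": "pay_invoice"},
)
print(gate.check("send payment", "pay_invoice", pay))                # confirm
print(gate.check("send payment", "search_db", search))               # alert
print(gate.check("send payment", "pay_fast", {"name": "pay_fast"}))  # block
print(gate.check("search database", "search_db", search))            # allow
```

Returning a verdict instead of raising keeps policy separate from enforcement: the caller decides whether “alert” pauses the call or only pages someone. A falling `share` for the usual tool on a recurring intent is the drift signal described above. Least privilege still has to be enforced where the tool runs; this gate only audits the choice.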

Status

Item	Detail
ToolHijacker	NDSS 2026; arXiv:2504.19793; targets retrieval + selection; no-box
Function Hijacking (FHA)	arXiv:2604.20994, 22 Apr 2026; GCG-style, universal variant
Measured impact	ToolHijacker 96.7% ASR (GPT-4o target); FHA 70–100% ASR (BFCL)
Tested defenses	StruQ, SecAlign, known-answer detection, DataSentinel, perplexity and windowed perplexity: insufficient
Fix status	Open problem; mitigate via provenance, least privilege, selection auditing