SkillGuard: a permission framework that governs what an agent skill can do at runtime
A June 2026 paper sets out to close the gap between what a skill injects into an agent's context and what it makes the agent do, using manifests, deny-by-default access control, and runtime monitoring.
What is this?
SkillGuard: A Permission Framework for Agent Skills (arXiv:2606.03024, posted June 2026) is a defensive proposal for one of the fastest-growing surfaces in agentic AI: skills. A skill is a packaged bundle — instructions, tool definitions, sometimes code — that an agent loads to extend what it can do. The problem the paper attacks is that today’s skill ecosystems mostly rely on trust-based loading and static inspection: you read the file, decide it looks fine, and install it. That leaves a gap between what a skill can inject into the agent’s context and what it can cause the agent to do at runtime.
This is a defensive, systems-side contribution. There are no exploit payloads in it. The question it answers is how to constrain a skill once it is running, not how to abuse one.
How it works
SkillGuard reframes a skill as a permission-bearing executable artifact rather than a trusted text file, and applies a dual-plane governance model that regulates two distinct things at once:
- Context influence — what the skill is allowed to put into, or change about, the agent’s reasoning context.
- Action side effects — what the skill is allowed to make the agent actually do: which tools, files, network destinations, and protected objects it can touch.
Concretely, the framework combines several mechanisms drawn from classic access control and adapted to agents:
- Skill manifests — a declared statement of intent and required capabilities, so a skill’s permissions are explicit and auditable rather than implied.
- Deny-by-default enforcement — anything not declared is refused, the opposite of the “load and trust” status quo.
- Runtime access control — permissions are checked while the skill acts, not only when its files are inspected at install time.
- User-mediated authorization — high-impact capabilities require a human decision rather than being granted silently.
- Capability inference and behavior monitoring — the system infers what a skill actually needs and watches for divergence between declared intent and observed runtime behavior.
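As a rough illustration of how a manifest, deny-by-default enforcement, and user-mediated authorization could fit together, here is a minimal Python sketch. The `SkillManifest` fields, the `authorize` function, the `HIGH_IMPACT` set, and the capability names are all hypothetical, chosen for this example; the paper's actual manifest schema and enforcement interface are not reproduced here.

```python
from dataclasses import dataclass, field
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"
    DENY = "deny"
    ASK_USER = "ask_user"


# Hypothetical capability kinds treated as high-impact (assumption, not from the paper).
HIGH_IMPACT = {"file_delete", "network_send", "credential_read"}


@dataclass(frozen=True)
class SkillManifest:
    """Illustrative manifest: declared permissions on both planes."""
    name: str
    # Context plane: what the skill may add to the agent's context.
    context_permissions: frozenset = field(default_factory=frozenset)
    # Action plane: (capability kind, target) pairs the skill may use.
    action_permissions: frozenset = field(default_factory=frozenset)


def authorize(manifest: SkillManifest, kind: str, target: str) -> Decision:
    """Runtime check: refuse anything undeclared, escalate high-impact actions."""
    if (kind, target) not in manifest.action_permissions:
        return Decision.DENY  # deny by default
    if kind in HIGH_IMPACT:
        return Decision.ASK_USER  # declared, but still needs a human decision
    return Decision.ALLOW


manifest = SkillManifest(
    name="report-writer",
    context_permissions=frozenset({"add_instructions"}),
    action_permissions=frozenset({
        ("file_read", "./reports"),
        ("network_send", "api.example.com"),
    }),
)

print(authorize(manifest, "file_read", "./reports"))            # Decision.ALLOW
print(authorize(manifest, "network_send", "api.example.com"))   # Decision.ASK_USER
print(authorize(manifest, "file_delete", "./reports"))          # Decision.DENY
```

The check runs on every action rather than once at install time, which is the difference between runtime access control and static inspection. Declaring a capability does not bypass the human: a declared high-impact action still returns `ASK_USER` in this sketch.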
The reported numbers give a sense of coverage and cost. SkillGuard’s permission taxonomy covers 99.76% of observed protected objects, and automated manifest generation reaches 91.0% F1, meaning the framework can largely propose a skill’s permission manifest without a human writing it by hand. In adversarial evaluation, it reduces attack success from 32.37% to 23.02% for contextual injections and from 25.56% to 16.67% for more obvious injections, while preserving benign task utility. Those are partial reductions of roughly nine percentage points each, not elimination, so a meaningful share of attacks still succeeds with the framework in place.
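To put the attack-success figures in proportion, the short Python sketch below works out the absolute and relative drop for each setting. The input percentages are the ones reported above; the `reduction` helper is a hypothetical name introduced here, and the derived values are simple arithmetic rather than results stated in the paper.

```python
def reduction(before: float, after: float) -> tuple[float, float]:
    """Return (absolute drop in percentage points, relative drop in percent)."""
    absolute = before - after
    relative = 100.0 * absolute / before
    return round(absolute, 2), round(relative, 1)


contextual = reduction(32.37, 23.02)
obvious = reduction(25.56, 16.67)

print(contextual)  # (9.35, 28.9)
print(obvious)     # (8.89, 34.8)
```

In relative terms, about 29% of contextual injections and about 35% of obvious injections that previously succeeded are stopped, which leaves the majority of previously successful attacks still getting through.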
Why it matters
Skills inherit every weakness of prompt injection and tool misuse, and add a packaging-and-distribution problem on top. The broader literature has already mapped this surface: a survey of agent skills covers their architecture, acquisition and security risks (arXiv:2602.12430), and evaluation work such as SkillVetBench (arXiv:2606.15899) scores open-source skills for security risk before installation. The recurring theme is that a skill is untrusted third-party content that gets unusual privileges — it can rewrite the agent’s instructions and hand it new tools — yet it is usually governed by little more than a glance at the file.
SkillGuard matters because it moves enforcement to where the risk actually lives: runtime, with least privilege. Static inspection can catch known-bad files, but it cannot see what a skill does once the agent is reasoning and acting on live, possibly attacker-influenced, data. Tying a skill to a declared manifest and refusing anything outside it turns “I read the README” into an enforceable boundary. The partial nature of the reported reductions also carries a lesson: a permission layer shrinks the blast radius but does not make a malicious or hijacked skill safe.
Defenses
For teams shipping or installing agent skills, the practical takeaways generalize beyond this one framework:
- Treat skills as untrusted, privileged code. A skill that can edit context and add tools is a higher-privilege object than a normal document. Govern it accordingly, not on trust.
- Adopt deny-by-default for capabilities. Grant a skill only the tools, paths, and network destinations it declares it needs; refuse everything else. Don’t let install-time trust become runtime authority.
- Separate context influence from action side effects. Knowing a skill can shape reasoning is different from knowing it can send data out. Track and gate both planes.
- Require human authorization for high-impact actions. Irreversible or sensitive operations (deletes, transfers, external sends, credential access) should need an explicit human approval, not a silent grant.
- Monitor declared intent versus runtime behavior. A manifest is only useful if you detect divergence from it. Log and alert when a skill reaches for capabilities it never declared.
- Don’t treat a permission layer as a guarantee. SkillGuard reduces injection success but does not zero it out. Pair it with input/output filtering, sandboxing, and the usual lethal-trifecta hygiene (avoid combining private-data access, exposure to untrusted content, and external communication in the same loop).
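The monitoring takeaway can be sketched in a few lines of Python: compare what a skill declared against what it was observed doing, and flag the difference. The `Event` record, the `find_divergence` function, and the sample capability names and targets are hypothetical and only illustrate the idea; this is not SkillGuard's monitoring component.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """One observed runtime action by a skill (hypothetical log record)."""
    skill: str
    kind: str
    target: str


def find_divergence(declared: set[tuple[str, str]], events: list[Event]) -> list[Event]:
    """Return observed events whose (kind, target) pair was never declared."""
    return [e for e in events if (e.kind, e.target) not in declared]


declared = {("file_read", "./reports"), ("network_send", "api.example.com")}
observed = [
    Event("report-writer", "file_read", "./reports"),
    Event("report-writer", "network_send", "attacker.example.net"),
    Event("report-writer", "credential_read", "~/.ssh/id_rsa"),
]

for event in find_divergence(declared, observed):
    print(f"ALERT {event.skill}: undeclared {event.kind} -> {event.target}")
```

Matching on the full `(kind, target)` pair matters here: the skill did declare `network_send`, but only to one destination, so a send to any other host still counts as divergence.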
Status
| Item | Detail |
|---|---|
| Paper | “SkillGuard: A Permission Framework for Agent Skills” |
| arXiv ID | 2606.03024 |
| Posted | June 2026 |
| Type | Defensive permission framework — no exploit payloads |
| Model | Dual-plane governance: context influence + action side effects |
| Mechanisms | Manifests, deny-by-default, runtime access control, user-mediated authorization, capability inference, behavior monitoring |
| Reported results | Taxonomy covers 99.76% of protected objects; manifest generation 91.0% F1; injection success 32.37%→23.02% (contextual) and 25.56%→16.67% (obvious); utility preserved |