Treat Agent Skills Like Supply-Chain Dependencies

Outcome focus: Defined a hardened-by-default skill contract covering version pins, manifest provenance, prompt review, IO tests, least-privilege tools, runtime isolation, observability, rotation, and decommissioning.

The unsafe skill did not look unsafe.

It had a clean name, a clear description, and a useful prompt. It extracted structured fields from invoices and called one OCR endpoint. The team copied it into a repo because it saved time, then updated it twice from the upstream source because the examples looked better.

The third update changed the prompt and widened the tool surface. The skill could now call an arbitrary HTTP URL "for document enrichment." Nobody treated that as a production permission change. The diff looked like documentation. The tests still passed because the tests only covered the happy-path output shape. Two weeks later, the team could not answer the basic incident question: which version of the skill ran on which input, with which signer, under which tool allowlist?

That is the failure mode skills introduce.

Skills are not just prompts. They are prompt text, workflow rules, optional scripts, references, assets, tool permissions, runtime assumptions, and sometimes credentials. They behave like dependencies because they are dependencies. If a repo loads skills from another repo, a marketplace, a shared workspace, or a copied folder, the team has accepted a supply-chain surface whether it named one or not.

The answer is not to ban skills. A focused skill is one of the best ways to keep agents from stuffing every workflow into one giant prompt. The answer is to stop treating skills as informal Markdown and start treating them as code plus configuration: pinned, signed, reviewed, tested, least-privileged, observable, and removable.

This post is the operating contract I would put in a repo before the CI policy gets fancy. For the machine-enforced gate, see Agent Repo Trust Gates: Conftest Policies, SLSA Provenance, and SBOM in GitHub Actions. This layer is simpler: make the skill artifact explicit enough that a policy gate has something real to enforce.

The Risk Model#

A package can break you by changing executable code. A skill can break you by changing behavior.

That behavior change can be subtle. A prompt can ask the agent to include more source text than the business allows. A reference file can teach the wrong API version. A tool allowlist can expand from one internal endpoint to a broad domain pattern. A "temporary" shell helper can survive into production. A skill that started as a read-only summarizer can become a write-capable workflow because one manifest field changed and nobody reviewed it as a permission change.

The useful mental model is small:

A skill becomes production-safe only when source, review, policy, runtime, traceability, and revocation are part of the same contract.

The dangerous gap is between "the skill worked in a demo" and "the skill is authorized to run against production data." Demo success proves almost nothing about provenance, permission scope, or future drift.

The Tradeoff#

The hardened path costs time. Every skill needs a manifest. Every prompt change gets reviewed. Every tool call gets a declared allowlist. Every critical skill gets pinned to a commit instead of a branch. Somebody has to maintain the registry of trusted signers and revoked skills.

The alternative is faster until it is not. If skills update by branch name, "latest," or shared workspace sync, the repo can change behavior without a dependency bump. If the prompt lives only inside SKILL.md, reviewers may miss that the operational contract changed. If the skill has broad network or filesystem access, a prompt-injection path can become a data movement path.

The right trade is visible friction for hidden drift.

Teams already accept this with package managers. Nobody serious deploys a regulated production service with floating package versions and no lockfile because it feels faster. Skills deserve the same baseline. They are smaller than packages, but they sit closer to the model's behavior.

The Drop-In Scaffold#

This is the minimum directory structure I would add to a repo that runs agent skills against non-trivial code, data, or business workflows:

repo-skill-contract.txt

/agents
  /skills
    /<skill-name>
      SKILL.yaml
      PROMPT.md
      CONTRACT.md
      TESTS/
        unit_*.py
        golden/
          cases/*.json
          outputs/*.json
/policies
  SKILLS_POLICY.md
  RISK_MATRIX.md
/registry
  allowed_signers.json
  revoked.json

Each file has one job.

SKILL.yaml is the load-time contract. It carries the id, semver, source commit, signer, hash, scopes, tools, data access, and resource limits.

PROMPT.md is the reviewed prompt artifact. Keep it separate so a prompt diff is visible as a prompt diff, not buried in a manifest or mixed with prose.

CONTRACT.md names the IO schema, service-level expectations, allowed failure modes, max input size, max output size, and examples. If the skill emits JSON, include the JSON Schema here or link to it.

TESTS/ proves the skill still behaves under representative inputs. Unit tests cover tool IO contracts. Golden tests cover output behavior that should not drift without review.

SKILLS_POLICY.md states the rules a human can read. RISK_MATRIX.md turns data sensitivity and tool scope into a load decision. The registry files make trust explicit: which signers are allowed and which skills or signers are blocked even if a manifest otherwise looks valid.

The Manifest#

The supplied checklist is strongest when it becomes a manifest. This is the shape I would start with:

agents/skills/extract-invoice-lines/SKILL.yaml

id: "extract-invoice-lines"
version: "1.4.2"
source: "https://github.com/acme/skills/extract-invoice-lines@9f3e2c1"
signer: "acme-release-key"
sha256: "b5e8..."
 
prompt:
  path: "PROMPT.md"
  sha256: "91ac..."
 
contract:
  path: "CONTRACT.md"
  input_schema: "schemas/input.schema.json"
  output_schema: "schemas/output.schema.json"
 
tools:
  http:
    allow:
      - host: "api.ocr.local"
        paths: ["/v1/extract"]
        methods: ["POST"]
  filesystem:
    read: []
    write: []
  shell:
    allow: []
 
limits:
  max_recursion: 1
  max_tokens: 4096
  time_budget_ms: 8000
  max_tool_calls: 3
 
data_access:
  pii: "no"
  secrets_required: ["OCR_API_TOKEN"]
  retention: "none"
 
observability:
  emit_inputs_fingerprint: true
  emit_outputs_hash: true
  log_tool_envelopes: true
 
sunset:
  review_by: "2026-07-01"
  owner: "platform-ai"

This manifest is intentionally boring. A loader can validate it. A reviewer can scan it. A policy engine can reject it. An incident responder can reconstruct what ran.

The important fields are not the exact names. The important commitments are:

Control	Field	What it prevents
Version pin	`source` includes a commit	Branch drift and silent upstream changes
Integrity	`sha256` and prompt hash	Manifest or prompt substitution
Provenance	`signer`	Anonymous or untrusted skill updates
Least privilege	`tools` allowlists	Broad network, shell, or filesystem access
Runtime limits	`limits`	Recursive agent loops and unbounded cost
Data boundary	`data_access`	Accidental PII, secrets, or retention exposure
Traceability	`observability`	Missing incident evidence
Lifecycle	`sunset`	Abandoned skills staying trusted forever

If a skill cannot fill these fields, it should not run on sensitive data or write-capable workflows yet.

The Fifteen-Minute Checklist#

This is the starter checklist I would put in SKILLS_POLICY.md.

Area	Gate
Source and versioning	Pin every external skill as `source@commit`; block branch names and `latest`. Require semver and a changelog entry for updates. Vendor critical skills when uptime, compliance, or auditability matters.
Integrity and provenance	Verify hashes or signatures before load. Record source URL, commit, signer, hash, and prompt hash at build time.
Review and tests	PR-gate every prompt, manifest, and tool change. Treat prompt diffs as first-class review artifacts. Require unit tests for tool IO and golden tests for representative outputs.
Least privilege	One skill gets one narrow capability. Declare host, path, method, filesystem, shell, and credential scope explicitly. Default-deny everything else.
Runtime isolation	Run skills in a per-skill process, sandbox, or container. Do not share writable state across skills unless a contract names it.
Policy and guardrails	Deny self-modification, arbitrary URL loading, broad shell, and unrestricted filesystem writes by default. Cap recursion and multi-agent depth.
Observability	Emit `{skill, version, signer, inputs_fingerprint, outputs_hash, trace_id}` for every run. Log full tool request and response envelopes with sensitive fields redacted by policy.
Decommission and rotation	Store review dates, owners, key-rotation instructions, and revocation state. Enforce `revoked.json` at load time.

Fifteen minutes is enough to find the largest gap. It is not enough to prove the whole system safe. The goal of the first pass is to identify which skills are already acting like dependencies without dependency controls.

The Risk Matrix#

Do not make every skill fight the same process. A read-only formatting skill that touches public docs does not need the same ceremony as a write-capable invoice extraction skill with credentials.

Use a small matrix:

Data sensitivity	Tool scope	Default decision
Public	No tools	Allow after prompt review and version pin
Public	Read-only repo tools	Allow with manifest, tests, and trace events
Internal	Read-only internal APIs	Require signer, hash, scoped token, and golden tests
Confidential or PII	External egress	Block unless there is a documented exception and redaction path
Confidential or PII	Write-capable tools	Require owner approval, sandbox, audit logs, short-lived credentials, and rollback plan
Secrets	Any model-visible prompt path	Block. Secrets belong in runtime secret injection, not prompt text

This matrix is not a compliance framework. It is a routing table. It tells the reviewer whether the skill can be loaded, needs more controls, or should be rejected.

Prompt Review Is Not Optional#

The prompt is executable behavior by another route.

That makes prompt diffing a security and quality control, not editorial polish. A prompt change can alter what data the agent includes, which source is treated as authoritative, whether the agent asks before taking an action, or how it handles conflicting tool output.

The review should answer four questions:

Prompt question	Why it matters
What input data can this prompt ask the agent to expose?	Prevents accidental exfiltration through summaries or tool calls
What authority does the prompt claim?	Prevents a skill from overriding repo, product, or legal policy
What should the agent refuse?	Prevents "helpful" completion of unsafe requests
What output shape is contractually required?	Prevents downstream parsers from depending on vibes

Golden tests are the cheapest way to catch prompt drift. They do not need to prove the model deterministic. They need to pin the behaviors that matter: JSON shape, required fields, refusal behavior, citation presence, no raw secret echoing, no unexpected tool call.

agents/skills/extract-invoice-lines/TESTS/golden/cases/basic.json

{
  "document_text": "Invoice 1007. Vendor: Northwind. Total: $42.10.",
  "allowed_tools": ["http:api.ocr.local/v1/extract"]
}

agents/skills/extract-invoice-lines/TESTS/golden/outputs/basic.json

{
  "invoice_id": "1007",
  "vendor": "Northwind",
  "total": 42.1,
  "currency": "USD",
  "confidence": "low",
  "notes": ["Synthetic fixture. Verify against source image before payment."]
}

The expected output includes confidence and notes because extraction is not the same as payment authorization. That boundary belongs in the skill contract.

Tool Allowlists Should Be Specific#

"This skill needs HTTP" is not an allowlist.

HTTP to what host? Which path? Which method? With which credential? Can it follow redirects? Can it send document text outside the organization? Can it call arbitrary URLs supplied by the user? Can it retry? Can it upload files?

The default answer should be no.

For most skills, the allowlist should fit on one screen. If it does not, the skill may be too broad. Split it.

Bad shape	Better shape
`tools: ["http", "filesystem", "shell"]`	`http.POST api.ocr.local/v1/extract`
`filesystem: write`	`filesystem.write: ["tmp/extract-invoice-lines/"]`
`shell: true`	No shell. If unavoidable, one named wrapper script with fixed arguments
`secrets: ["*"]`	`secrets_required: ["OCR_API_TOKEN"]`
`egress: true`	`egress.allow: ["api.ocr.local"]`

The "god skill" is the pattern to reject. A skill that can search docs, read repo files, call internal APIs, write tickets, modify code, and run shell is not a skill. It is an unbounded agent profile. The review surface is too large for one artifact.

Runtime Isolation And Traceability#

The manifest is the contract. The runtime has to enforce it.

At load time:

Resolve the skill by pinned commit or vendored path.
Verify signer and hashes.
Check the skill and signer against revoked.json.
Materialize only the declared files into the sandbox.
Inject only the named credentials.
Apply network, filesystem, shell, token, time, and recursion limits.
Emit a start event with skill id, version, signer, source commit, and trace id.

At tool-call time:

Check the call against the allowlist.
Redact sensitive request fields before logging.
Log the request envelope and response envelope.
Record latency, status, retry count, and output hash.
Kill the run if the call violates policy.

At completion time:

Emit output hash, run status, token counts, tool count, and policy violations.
Store trace ids long enough for incident reconstruction.
Keep raw payload retention short and explicit.

The event does not need to store private content to be useful:

skill-run-event.json

{
  "trace_id": "task-018f",
  "skill": "extract-invoice-lines",
  "version": "1.4.2",
  "signer": "acme-release-key",
  "source_commit": "9f3e2c1",
  "inputs_fingerprint": "sha256:6ad1...",
  "outputs_hash": "sha256:f0b9...",
  "tool_calls": 1,
  "policy_violations": 0,
  "status": "ok"
}

For most teams, 30 to 90 days of structured events is enough to reconstruct what happened without creating a permanent content archive. If the workflow is regulated, retention rules should come from the data policy, not from whatever the agent framework defaults to.

Lightweight Team Rituals#

The controls need one human cadence or they will rot.

Add a weekly skill rollup:

weekly-skill-rollup.txt

Added:
- extract-invoice-lines 1.4.2, signer acme-release-key, read-only OCR endpoint
 
Updated:
- summarize-release-risk 0.9.0 -> 0.9.1, prompt wording only, golden tests passed
 
Removed:
- legacy-pr-triage 0.3.0, replaced by repo-local implementation
 
Risk notes:
- No new shell access.
- One new credential scope: OCR_API_TOKEN, expires in 30 days.
- No revoked signer hits.

Add a change window. Production skill updates should happen during business hours, not late Friday or in the middle of an incident unless there is a named break-glass ticket.

Add a break-glass rule. Emergency unsigned loads require a ticket, an owner, a pager, a time limit, and a follow-up removal or proper signed release. Without that, "emergency" becomes the normal path around controls.

Done Definition For A New Skill#

A skill is ready when these are true:

Requirement	Evidence
Semver, signed release, pinned commit	`SKILL.yaml` has `version`, `source@commit`, `signer`, and `sha256`
Prompt diffed and reviewed	`PROMPT.md` is separate and included in PR review
IO contract defined	`CONTRACT.md` links input and output JSON Schema
Unit and golden tests pass	`TESTS/` covers tool IO and representative output fixtures
Tool scope is least-privilege	Manifest allowlists host, path, method, filesystem, shell, and secrets
Runtime is sandboxed	Loader enforces filesystem, network, credential, time, token, and recursion limits
Observability emits	Events include skill, version, signer, input fingerprint, output hash, trace id
Registry state is clean	Signer is in `allowed_signers.json`; skill and signer are absent from `revoked.json`
Rotation exists	Manifest names owner and review date

Do not let "temporary" skills skip this. Temporary skills become permanent when they are useful. If the workflow is not important enough for this minimum contract, it probably should not have production credentials or write access.

What To Do First#

Start with inventory.

List every skill the repo can load. Mark source, version, owner, tool scope, data sensitivity, and whether it is vendored or remote. Any skill without a pinned commit gets fixed first. Any skill with shell, filesystem write, broad HTTP egress, or secrets gets reviewed next. Any skill without an owner gets sunset or assigned.

Then add the manifest schema and registry files. Do not start by writing a large security policy. Start by making the repo tell the truth about what it already runs.

The practical sequence:

Add SKILL.yaml, PROMPT.md, and CONTRACT.md for one high-risk skill.
Add allowed_signers.json and revoked.json.
Add unit and golden tests for the skill's highest-risk behavior.
Enforce default-deny tool allowlists in the loader or wrapper.
Add structured skill-run events.
Move the rules into CI once the file contract is stable.

Agent skill hardening is not a paperwork exercise. It is a way to make behavior changes reviewable. If the repo cannot answer which skill version ran, who signed it, what tools it could call, what data it saw, and how to revoke it, then the skill is already outside the team's operational control.

Pin it. Sign it. Review it. Test it. Sandbox it. Trace it. Revoke it when it no longer deserves trust.