Outcome focus: Defined a hardened-by-default skill contract covering version pins, manifest provenance, prompt review, IO tests, least-privilege tools, runtime isolation, observability, rotation, and decommissioning.
agent skillssupply chainagentssecuritydeveloper workflow
The unsafe skill did not look unsafe.
It had a clean name, a clear description, and a useful prompt. It extracted structured fields from invoices and called one OCR endpoint. The team copied it into a repo because it saved time, then updated it twice from the upstream source because the examples looked better.
The third update changed the prompt and widened the tool surface. The skill could now call an arbitrary HTTP URL "for document enrichment." Nobody treated that as a production permission change. The diff looked like documentation. The tests still passed because the tests only covered the happy-path output shape. Two weeks later, the team could not answer the basic incident question: which version of the skill ran on which input, with which signer, under which tool allowlist?
That is the failure mode skills introduce.
Skills are not just prompts. They are prompt text, workflow rules, optional scripts, references, assets, tool permissions, runtime assumptions, and sometimes credentials. They behave like dependencies because they are dependencies. If a repo loads skills from another repo, a marketplace, a shared workspace, or a copied folder, the team has accepted a supply-chain surface whether it named one or not.
The answer is not to ban skills. A focused skill is one of the best ways to keep agents from stuffing every workflow into one giant prompt. The answer is to stop treating skills as informal Markdown and start treating them as code plus configuration: pinned, signed, reviewed, tested, least-privileged, observable, and removable.
This post is the operating contract I would put in a repo before the CI policy gets fancy. For the machine-enforced gate, see Agent Repo Trust Gates: Conftest Policies, SLSA Provenance, and SBOM in GitHub Actions. This layer is simpler: make the skill artifact explicit enough that a policy gate has something real to enforce.
The Risk Model#
A package can break you by changing executable code. A skill can break you by changing behavior.
That behavior change can be subtle. A prompt can ask the agent to include more source text than the business allows. A reference file can teach the wrong API version. A tool allowlist can expand from one internal endpoint to a broad domain pattern. A "temporary" shell helper can survive into production. A skill that started as a read-only summarizer can become a write-capable workflow because one manifest field changed and nobody reviewed it as a permission change.
The useful mental model is small:
The dangerous gap is between "the skill worked in a demo" and "the skill is authorized to run against production data." Demo success proves almost nothing about provenance, permission scope, or future drift.
The Tradeoff#
The hardened path costs time. Every skill needs a manifest. Every prompt change gets reviewed. Every tool call gets a declared allowlist. Every critical skill gets pinned to a commit instead of a branch. Somebody has to maintain the registry of trusted signers and revoked skills.
The alternative is faster until it is not. If skills update by branch name, "latest," or shared workspace sync, the repo can change behavior without a dependency bump. If the prompt lives only inside SKILL.md, reviewers may miss that the operational contract changed. If the skill has broad network or filesystem access, a prompt-injection path can become a data movement path.
The right trade is visible friction for hidden drift.
Teams already accept this with package managers. Nobody serious deploys a regulated production service with floating package versions and no lockfile because it feels faster. Skills deserve the same baseline. They are smaller than packages, but they sit closer to the model's behavior.
The Drop-In Scaffold#
This is the minimum directory structure I would add to a repo that runs agent skills against non-trivial code, data, or business workflows:
/agents
/skills
/<skill-name>
SKILL.yaml
PROMPT.md
CONTRACT.md
TESTS/
unit_*.py
golden/
cases/*.json
outputs/*.json
/policies
SKILLS_POLICY.md
RISK_MATRIX.md
/registry
allowed_signers.json
revoked.jsonEach file has one job.
SKILL.yaml is the load-time contract. It carries the id, semver, source commit, signer, hash, scopes, tools, data access, and resource limits.
PROMPT.md is the reviewed prompt artifact. Keep it separate so a prompt diff is visible as a prompt diff, not buried in a manifest or mixed with prose.
CONTRACT.md names the IO schema, service-level expectations, allowed failure modes, max input size, max output size, and examples. If the skill emits JSON, include the JSON Schema here or link to it.
TESTS/ proves the skill still behaves under representative inputs. Unit tests cover tool IO contracts. Golden tests cover output behavior that should not drift without review.
SKILLS_POLICY.md states the rules a human can read. RISK_MATRIX.md turns data sensitivity and tool scope into a load decision. The registry files make trust explicit: which signers are allowed and which skills or signers are blocked even if a manifest otherwise looks valid.
The Manifest#
The supplied checklist is strongest when it becomes a manifest. This is the shape I would start with:
id: "extract-invoice-lines"
version: "1.4.2"
source: "https://github.com/acme/skills/extract-invoice-lines@9f3e2c1"
signer: "acme-release-key"
sha256: "b5e8..."
prompt:
path: "PROMPT.md"
sha256: "91ac..."
contract:
path: "CONTRACT.md"
input_schema: "schemas/input.schema.json"
output_schema: "schemas/output.schema.json"
tools:
http:
allow:
- host: "api.ocr.local"
paths: ["/v1/extract"]
methods: ["POST"]
filesystem:
read: []
write: []
shell:
allow: []
limits:
max_recursion: 1
max_tokens: 4096
time_budget_ms: 8000
max_tool_calls: 3
data_access:
pii: "no"
secrets_required: ["OCR_API_TOKEN"]
retention: "none"
observability:
emit_inputs_fingerprint: true
emit_outputs_hash: true
log_tool_envelopes: true
sunset:
review_by: "2026-07-01"
owner: "platform-ai"This manifest is intentionally boring. A loader can validate it. A reviewer can scan it. A policy engine can reject it. An incident responder can reconstruct what ran.
The important fields are not the exact names. The important commitments are:
| Control | Field | What it prevents |
|---|---|---|
| Version pin | source includes a commit | Branch drift and silent upstream changes |
| Integrity | sha256 and prompt hash | Manifest or prompt substitution |
| Provenance | signer | Anonymous or untrusted skill updates |
| Least privilege | tools allowlists | Broad network, shell, or filesystem access |
| Runtime limits | limits | Recursive agent loops and unbounded cost |
| Data boundary | data_access | Accidental PII, secrets, or retention exposure |
| Traceability | observability | Missing incident evidence |
| Lifecycle | sunset | Abandoned skills staying trusted forever |
If a skill cannot fill these fields, it should not run on sensitive data or write-capable workflows yet.
The Fifteen-Minute Checklist#
This is the starter checklist I would put in SKILLS_POLICY.md.
| Area | Gate |
|---|---|
| Source and versioning | Pin every external skill as source@commit; block branch names and latest. Require semver and a changelog entry for updates. Vendor critical skills when uptime, compliance, or auditability matters. |
| Integrity and provenance | Verify hashes or signatures before load. Record source URL, commit, signer, hash, and prompt hash at build time. |
| Review and tests | PR-gate every prompt, manifest, and tool change. Treat prompt diffs as first-class review artifacts. Require unit tests for tool IO and golden tests for representative outputs. |
| Least privilege | One skill gets one narrow capability. Declare host, path, method, filesystem, shell, and credential scope explicitly. Default-deny everything else. |
| Runtime isolation | Run skills in a per-skill process, sandbox, or container. Do not share writable state across skills unless a contract names it. |
| Policy and guardrails | Deny self-modification, arbitrary URL loading, broad shell, and unrestricted filesystem writes by default. Cap recursion and multi-agent depth. |
| Observability | Emit {skill, version, signer, inputs_fingerprint, outputs_hash, trace_id} for every run. Log full tool request and response envelopes with sensitive fields redacted by policy. |
| Decommission and rotation | Store review dates, owners, key-rotation instructions, and revocation state. Enforce revoked.json at load time. |
Fifteen minutes is enough to find the largest gap. It is not enough to prove the whole system safe. The goal of the first pass is to identify which skills are already acting like dependencies without dependency controls.
The Risk Matrix#
Do not make every skill fight the same process. A read-only formatting skill that touches public docs does not need the same ceremony as a write-capable invoice extraction skill with credentials.
Use a small matrix:
| Data sensitivity | Tool scope | Default decision |
|---|---|---|
| Public | No tools | Allow after prompt review and version pin |
| Public | Read-only repo tools | Allow with manifest, tests, and trace events |
| Internal | Read-only internal APIs | Require signer, hash, scoped token, and golden tests |
| Confidential or PII | External egress | Block unless there is a documented exception and redaction path |
| Confidential or PII | Write-capable tools | Require owner approval, sandbox, audit logs, short-lived credentials, and rollback plan |
| Secrets | Any model-visible prompt path | Block. Secrets belong in runtime secret injection, not prompt text |
This matrix is not a compliance framework. It is a routing table. It tells the reviewer whether the skill can be loaded, needs more controls, or should be rejected.
Prompt Review Is Not Optional#
The prompt is executable behavior by another route.
That makes prompt diffing a security and quality control, not editorial polish. A prompt change can alter what data the agent includes, which source is treated as authoritative, whether the agent asks before taking an action, or how it handles conflicting tool output.
The review should answer four questions:
| Prompt question | Why it matters |
|---|---|
| What input data can this prompt ask the agent to expose? | Prevents accidental exfiltration through summaries or tool calls |
| What authority does the prompt claim? | Prevents a skill from overriding repo, product, or legal policy |
| What should the agent refuse? | Prevents "helpful" completion of unsafe requests |
| What output shape is contractually required? | Prevents downstream parsers from depending on vibes |
Golden tests are the cheapest way to catch prompt drift. They do not need to prove the model deterministic. They need to pin the behaviors that matter: JSON shape, required fields, refusal behavior, citation presence, no raw secret echoing, no unexpected tool call.
{
"document_text": "Invoice 1007. Vendor: Northwind. Total: $42.10.",
"allowed_tools": ["http:api.ocr.local/v1/extract"]
}{
"invoice_id": "1007",
"vendor": "Northwind",
"total": 42.1,
"currency": "USD",
"confidence": "low",
"notes": ["Synthetic fixture. Verify against source image before payment."]
}The expected output includes confidence and notes because extraction is not the same as payment authorization. That boundary belongs in the skill contract.
Tool Allowlists Should Be Specific#
"This skill needs HTTP" is not an allowlist.
HTTP to what host? Which path? Which method? With which credential? Can it follow redirects? Can it send document text outside the organization? Can it call arbitrary URLs supplied by the user? Can it retry? Can it upload files?
The default answer should be no.
For most skills, the allowlist should fit on one screen. If it does not, the skill may be too broad. Split it.
| Bad shape | Better shape |
|---|---|
tools: ["http", "filesystem", "shell"] | http.POST api.ocr.local/v1/extract |
filesystem: write | filesystem.write: ["tmp/extract-invoice-lines/"] |
shell: true | No shell. If unavoidable, one named wrapper script with fixed arguments |
secrets: ["*"] | secrets_required: ["OCR_API_TOKEN"] |
egress: true | egress.allow: ["api.ocr.local"] |
The "god skill" is the pattern to reject. A skill that can search docs, read repo files, call internal APIs, write tickets, modify code, and run shell is not a skill. It is an unbounded agent profile. The review surface is too large for one artifact.
Runtime Isolation And Traceability#
The manifest is the contract. The runtime has to enforce it.
At load time:
- Resolve the skill by pinned commit or vendored path.
- Verify signer and hashes.
- Check the skill and signer against
revoked.json. - Materialize only the declared files into the sandbox.
- Inject only the named credentials.
- Apply network, filesystem, shell, token, time, and recursion limits.
- Emit a start event with skill id, version, signer, source commit, and trace id.
At tool-call time:
- Check the call against the allowlist.
- Redact sensitive request fields before logging.
- Log the request envelope and response envelope.
- Record latency, status, retry count, and output hash.
- Kill the run if the call violates policy.
At completion time:
- Emit output hash, run status, token counts, tool count, and policy violations.
- Store trace ids long enough for incident reconstruction.
- Keep raw payload retention short and explicit.
The event does not need to store private content to be useful:
{
"trace_id": "task-018f",
"skill": "extract-invoice-lines",
"version": "1.4.2",
"signer": "acme-release-key",
"source_commit": "9f3e2c1",
"inputs_fingerprint": "sha256:6ad1...",
"outputs_hash": "sha256:f0b9...",
"tool_calls": 1,
"policy_violations": 0,
"status": "ok"
}For most teams, 30 to 90 days of structured events is enough to reconstruct what happened without creating a permanent content archive. If the workflow is regulated, retention rules should come from the data policy, not from whatever the agent framework defaults to.
Lightweight Team Rituals#
The controls need one human cadence or they will rot.
Add a weekly skill rollup:
Added:
- extract-invoice-lines 1.4.2, signer acme-release-key, read-only OCR endpoint
Updated:
- summarize-release-risk 0.9.0 -> 0.9.1, prompt wording only, golden tests passed
Removed:
- legacy-pr-triage 0.3.0, replaced by repo-local implementation
Risk notes:
- No new shell access.
- One new credential scope: OCR_API_TOKEN, expires in 30 days.
- No revoked signer hits.Add a change window. Production skill updates should happen during business hours, not late Friday or in the middle of an incident unless there is a named break-glass ticket.
Add a break-glass rule. Emergency unsigned loads require a ticket, an owner, a pager, a time limit, and a follow-up removal or proper signed release. Without that, "emergency" becomes the normal path around controls.
Done Definition For A New Skill#
A skill is ready when these are true:
| Requirement | Evidence |
|---|---|
| Semver, signed release, pinned commit | SKILL.yaml has version, source@commit, signer, and sha256 |
| Prompt diffed and reviewed | PROMPT.md is separate and included in PR review |
| IO contract defined | CONTRACT.md links input and output JSON Schema |
| Unit and golden tests pass | TESTS/ covers tool IO and representative output fixtures |
| Tool scope is least-privilege | Manifest allowlists host, path, method, filesystem, shell, and secrets |
| Runtime is sandboxed | Loader enforces filesystem, network, credential, time, token, and recursion limits |
| Observability emits | Events include skill, version, signer, input fingerprint, output hash, trace id |
| Registry state is clean | Signer is in allowed_signers.json; skill and signer are absent from revoked.json |
| Rotation exists | Manifest names owner and review date |
Do not let "temporary" skills skip this. Temporary skills become permanent when they are useful. If the workflow is not important enough for this minimum contract, it probably should not have production credentials or write access.
What To Do First#
Start with inventory.
List every skill the repo can load. Mark source, version, owner, tool scope, data sensitivity, and whether it is vendored or remote. Any skill without a pinned commit gets fixed first. Any skill with shell, filesystem write, broad HTTP egress, or secrets gets reviewed next. Any skill without an owner gets sunset or assigned.
Then add the manifest schema and registry files. Do not start by writing a large security policy. Start by making the repo tell the truth about what it already runs.
The practical sequence:
- Add
SKILL.yaml,PROMPT.md, andCONTRACT.mdfor one high-risk skill. - Add
allowed_signers.jsonandrevoked.json. - Add unit and golden tests for the skill's highest-risk behavior.
- Enforce default-deny tool allowlists in the loader or wrapper.
- Add structured skill-run events.
- Move the rules into CI once the file contract is stable.
Agent skill hardening is not a paperwork exercise. It is a way to make behavior changes reviewable. If the repo cannot answer which skill version ran, who signed it, what tools it could call, what data it saw, and how to revoke it, then the skill is already outside the team's operational control.
Pin it. Sign it. Review it. Test it. Sandbox it. Trace it. Revoke it when it no longer deserves trust.