Hieu Ho

A Developer's Guide to Agent Skills

What agent skills are, how the open format works, where to save them, how to author and test them, and how to publish them across clients and to skills.sh.

Agent Skills are modular capabilities that extend an agent with domain-specific expertise. Rather than relying on the model's general training alone, each skill packages the instructions, metadata, and optional resources (scripts, templates, reference files) that an agent needs for a specific class of work, and loads them only when the task calls for it.

Concretely, a skill is a directory built around a required SKILL.md file. That file opens with YAML frontmatter carrying at least a name and description, then continues with the markdown instructions the agent will follow. Heavier material can live in optional scripts/, references/, and assets/ directories. The full format is defined in the agentskills.io specification.
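To make that layout concrete, here is a minimal Python sketch that splits a SKILL.md string into its frontmatter fields and its markdown body. The function name split_skill_md is hypothetical, and the parser is deliberately simplified: it understands only one-line key: value pairs, so folded (>) or multi-line YAML values would need a real YAML parser.

```python
def split_skill_md(text: str) -> tuple[dict[str, str], str]:
    """Split SKILL.md text into (frontmatter fields, markdown body).

    Simplified on purpose: only one-line `key: value` pairs are understood,
    so folded (>) or multi-line YAML values need a real YAML parser.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("SKILL.md must open with a '---' frontmatter fence")
    for end in range(1, len(lines)):
        if lines[end].strip() == "---":
            break
    else:
        raise ValueError("frontmatter is missing its closing '---'")

    meta: dict[str, str] = {}
    for line in lines[1:end]:
        if not line.strip() or line.lstrip().startswith("#"):
            continue  # skip blank lines and YAML comments
        key, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"unsupported frontmatter line: {line!r}")
        meta[key.strip()] = value.strip()

    body = "\n".join(lines[end + 1:]).lstrip("\n")
    return meta, body


sample = """---
name: my-skill
description: What this skill does and when to use it
---
# My skill

Follow these steps...
"""
meta, body = split_skill_md(sample)
print(meta["name"])  # my-skill
```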

Because the layout is an open standard, the same skill works across OpenAI Codex, Claude Code, Claude API, Cursor, VS Code, GitHub Copilot, and 40+ other clients without rewriting it for each tool.

Skills compared to other customization

Agents accept guidance in several forms, and skills sit in a specific layer of that stack. They are more durable than a one-off prompt but more targeted than always-on project rules, which makes them the right place for workflows you want to reuse without paying a permanent context penalty.

| Approach | Lifetime | Context cost | Best for |
|---|---|---|---|
| Prompt | One conversation | Full text every turn | Ad hoc questions, exploration |
| Rules / AGENTS.md | Always active | Loaded into every session | Project-wide conventions |
| Skill | On disk, versioned | Metadata at startup, body on trigger | Repeatable workflows, domain procedures |
| Plugin (Codex) | Installable package | Bundles skills plus apps | Distribution beyond a single repo |

In Codex, that separation becomes explicit: skills are the authoring format, while plugins are the distribution unit. You define the workflow as a skill first, then package it as a plugin when you need installable distribution beyond a single repository.

When to use skills

Create a skill when the same instructions keep showing up across sessions, whether as a checklist, a deployment sequence, a review rubric, or a procedure that has outgrown a single line in project config.

A skill gives you what a prompt cannot: persistence on disk, version control in git, review in a PR, and reuse across sessions. At the same time, it avoids the always-on cost of rules or AGENTS.md, because the skill body loads only on activation and long reference material stays off-context until it is actually needed.

Benefits:

  • Specialize agents for domain-specific tasks without fine-tuning
  • Encode workflows once instead of repeating instructions
  • Compose capabilities by combining focused skills
  • Reuse the same on-disk format across compatible clients

Suitable use cases:

  • Multi-step workflows with validation gates (deploy, migrate, release)
  • Domain procedures the agent cannot infer (internal APIs, naming conventions)
  • Consistent output formats (report templates, commit message style)
  • Operations where scripts improve reliability (parsing, validation, batch transforms)

Unsuitable use cases:

  • One-time questions with no reusable pattern
  • General knowledge the agent already has
  • Project-wide conventions that should apply to every task (use rules instead)

How skills work

Progressive disclosure

Skills manage context through progressive disclosure, loading detail in stages instead of upfront. At startup, agents read only each skill's name and description. When a skill activates, they load the full SKILL.md body. Bundled files and scripts enter context only when the instructions reference them.

| Level | When loaded | Token cost | Content |
|---|---|---|---|
| Metadata | At startup | ~100 tokens per skill | name and description from frontmatter |
| Instructions | On trigger | Under 5k tokens | SKILL.md body |
| Resources | As needed | Free until referenced | references/, scripts/, assets/ |
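The three levels can be modeled in a few lines. This hypothetical Python sketch (the Skill class and context_text function are illustrative, not any client's API) assembles the skill-related text a client would hold in context at three moments: at startup, after activation, and after the agent opens one reference file.

```python
from dataclasses import dataclass, field


@dataclass
class Skill:
    name: str
    description: str
    body: str  # SKILL.md instructions (level 2)
    resources: dict[str, str] = field(default_factory=dict)  # path -> text (level 3)


def context_text(skills: list[Skill], active: set[str], read: set[str]) -> str:
    """Return the skill-related text a client would hold in context.

    active: names of activated skills.
    read:   "skill-name/path" entries for resource files the agent has opened.
    """
    parts = [f"{s.name}: {s.description}" for s in skills]  # level 1, always
    for s in skills:
        if s.name in active:
            parts.append(s.body)  # level 2, after activation
            parts.extend(
                text for path, text in s.resources.items()
                if f"{s.name}/{path}" in read  # level 3, only once opened
            )
    return "\n".join(parts)


pdf = Skill(
    name="processing-pdfs",
    description="Extracts text and form fields from PDF files. Use when working with PDFs.",
    body="## Steps\n1. Extract text.\n2. For forms, read references/forms.md.",
    resources={"references/forms.md": "Form-filling details. " * 500},
)
startup = context_text([pdf], active=set(), read=set())
triggered = context_text([pdf], active={"processing-pdfs"}, read=set())
deep = context_text([pdf], active={"processing-pdfs"},
                    read={"processing-pdfs/references/forms.md"})
print(len(startup) < len(triggered) < len(deep))  # True
```

The large reference file contributes nothing to startup or triggered; it only appears once it is actually read.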

Session flow

During a session, compatible clients move through a predictable sequence. They discover skills by scanning directories and reading frontmatter, then match the user's request against descriptions. Once a skill activates, its full body enters context, the agent executes the instructions (pulling in references or scripts as needed), and the activated content remains available across subsequent turns.

  1. Discovery. Scan skill directories and read frontmatter from each SKILL.md.
  2. Matching. Compare the user request against skill descriptions.
  3. Activation. Load the full SKILL.md body into context.
  4. Execution. Follow instructions, reading reference files or running scripts as needed.
  5. Retention. Keep activated skill content in context across subsequent turns.
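The five steps can be sketched as a small session loop. This is a hypothetical Python model, not client code: in real clients the model reads the descriptions and decides, so the word-overlap score in match is only a stand-in for that judgment, and the names SkillMeta and Session are illustrative.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class SkillMeta:
    name: str
    description: str
    body: str


def words(text: str) -> set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


class Session:
    def __init__(self, catalog: list[SkillMeta]):
        self.catalog = catalog              # 1. Discovery: frontmatter read at startup
        self.active: dict[str, str] = {}    # 5. Retention: name -> body, kept across turns

    def match(self, request: str, min_overlap: int = 2) -> Optional[SkillMeta]:
        # 2. Matching: word overlap stands in for the model's own judgment.
        best, best_score = None, 0
        for skill in self.catalog:
            score = len(words(request) & words(skill.description))
            if score > best_score:
                best, best_score = skill, score
        return best if best_score >= min_overlap else None

    def turn(self, request: str) -> list[str]:
        skill = self.match(request)
        if skill is not None and skill.name not in self.active:
            self.active[skill.name] = skill.body   # 3. Activation: load the full body
        # 4. Execution would run here with every active body in context.
        return sorted(self.active)


session = Session([
    SkillMeta("auditing-workbooks",
              "Audits Excel workbooks for formula errors. Use when reviewing xlsx files.",
              "...full instructions..."),
    SkillMeta("writing-commits",
              "Writes commit messages in the team style. Use when committing changes.",
              "...full instructions..."),
])
first = session.turn("please review these xlsx files for formula errors")
second = session.turn("thanks, anything else?")
print(first, second)  # ['auditing-workbooks'] ['auditing-workbooks']
```

The second turn matches nothing, yet the workbook skill stays active, which is the retention step at work.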

Client lifecycle

Under the hood, client implementations follow a separate lifecycle defined in the client implementation guide: discover, parse, disclose, activate, and manage skill context over time. The session flow above is what that lifecycle looks like from the user's side.

Codex catalog budget

Codex adds one more constraint for implicit matching. It includes an initial skills list in context so the model can choose the right skill, but caps that list at 2% of the context window, or 8,000 characters when the window size is unknown, so the catalog does not crowd out the rest of the prompt. When many skills are installed, Codex shortens descriptions first. Once a skill is selected, it still loads the full SKILL.md. See Codex skills.
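A rough Python sketch of that budget follows. The 2% and 8,000-character figures come from the rule above; everything else is an assumption: the window is taken in the same unit as the budget (characters), and the shortening strategy (one shared cap that keeps the opening of every description) is hypothetical, not Codex's documented algorithm.

```python
from typing import Optional


def catalog_budget(context_window: Optional[int]) -> int:
    """Character budget for the skills list: 2% of the window, else 8,000.

    Assumption: context_window is given in the same unit as the budget
    (characters); how Codex converts between tokens and characters is not modeled.
    """
    if context_window is None:
        return 8_000
    return context_window * 2 // 100


def render_catalog(skills: dict[str, str], budget: int) -> str:
    """Render 'name: description' lines, shortening descriptions until they fit.

    Hypothetical strategy: apply one shared character cap to every description
    and keep its opening words, so front-loaded trigger terms survive.
    """
    def render(cap: int) -> str:
        return "\n".join(f"{name}: {desc[:cap]}" for name, desc in skills.items())

    longest = max((len(desc) for desc in skills.values()), default=0)
    for cap in range(longest, -1, -1):
        text = render(cap)
        if len(text) <= budget:
            return text
    raise ValueError("budget is too small even for bare skill names")


skills = {
    "auditing-workbooks": "Audits Excel workbooks for formula errors. Use when reviewing .xlsx files.",
}
print(catalog_budget(None), catalog_budget(128_000))  # 8000 2560
print(render_catalog(skills, budget=40))  # auditing-workbooks: Audits Excel workboo
```

Whatever the real strategy, truncation from the end is why the primary use case belongs at the start of the description.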

How agents use skills

Agents expose two invocation modes, and most production setups use both. In explicit invocation, you reference the skill directly in the prompt or call it through /skills and $skill-name in CLI and IDE tools. In implicit invocation, the agent selects a skill on its own when the task matches the skill description.

Implicit matching rises and falls on the description field, so keep it concise, scoped, and rich in trigger terms. Front-load the primary use case, because shortened descriptions still need enough signal to match correctly. See optimizing descriptions for a structured testing workflow.

Claude Code adds another layer of control through frontmatter. Set disable-model-invocation to true when you want a workflow to run only on manual invocation. See Claude Code skills for the full set of options.
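The effect of disable-model-invocation can be pictured as a filter on the implicit candidate list that leaves explicit invocation untouched. The Python below is a hypothetical model of that behavior (SkillEntry, implicit_candidates, and explicit_lookup are illustrative names), not Claude Code's implementation.

```python
from dataclasses import dataclass


@dataclass
class SkillEntry:
    name: str
    description: str
    disable_model_invocation: bool = False  # mirrors the frontmatter key of the same name


def implicit_candidates(skills: list[SkillEntry]) -> list[str]:
    """Skills the model may select on its own; manual-only skills are left out."""
    return [s.name for s in skills if not s.disable_model_invocation]


def explicit_lookup(skills: list[SkillEntry], invocation: str) -> SkillEntry:
    """Resolve an explicit call such as '/deploying-prod' or '$deploying-prod'."""
    name = invocation.lstrip("/$")
    for s in skills:
        if s.name == name:
            return s
    raise KeyError(f"no skill named {name!r}")


skills = [
    SkillEntry("reviewing-code", "Reviews diffs against team conventions."),
    SkillEntry("deploying-prod", "Deploys the service to production.",
               disable_model_invocation=True),
]
print(implicit_candidates(skills))                       # ['reviewing-code']
print(explicit_lookup(skills, "/deploying-prod").name)   # deploying-prod
```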

Skill structure

Every skill is a directory centered on SKILL.md, with optional supporting material around it.

Text
my-skill/
├── SKILL.md          # Required: instructions + metadata
├── scripts/          # Optional: executable code
├── references/       # Optional: documentation
└── assets/           # Optional: templates, resources
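If you prefer to create this layout from code rather than with a CLI, a minimal Python sketch could look like the following. scaffold_skill is a hypothetical helper; it writes the frontmatter unquoted, so it refuses descriptions containing ': ', which would need YAML quoting.

```python
import tempfile
from pathlib import Path


def scaffold_skill(root: Path, name: str, description: str,
                   optional_dirs: tuple[str, ...] = ("scripts", "references", "assets")) -> Path:
    """Create root/name/SKILL.md plus the optional directories.

    The directory name and the frontmatter name are the same string, because
    the two must match. The description is written unquoted, so it must not
    contain YAML-significant sequences such as ': '.
    """
    if ": " in description:
        raise ValueError("quote descriptions containing ': ' before writing them as YAML")
    skill_dir = root / name
    skill_dir.mkdir(parents=True)  # fails if the skill already exists
    (skill_dir / "SKILL.md").write_text(
        f"---\nname: {name}\ndescription: {description}\n---\n\n# {name}\n",
        encoding="utf-8",
    )
    for sub in optional_dirs:
        (skill_dir / sub).mkdir()
    return skill_dir


with tempfile.TemporaryDirectory() as tmp:
    created = scaffold_skill(Path(tmp), "my-skill", "What this skill does and when to use it")
    listing = sorted(p.name for p in created.iterdir())
    skill_md_text = (created / "SKILL.md").read_text(encoding="utf-8")
print(listing)  # ['SKILL.md', 'assets', 'references', 'scripts']
```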

The frontmatter must include name and description:

YAML
---
name: my-skill
description: What this skill does and when to use it
---

| Field | Requirements |
|---|---|
| name | Max 64 chars. Lowercase letters, numbers, hyphens. Must match directory name. |
| description | Max 1024 chars. Non-empty. Third person. States what the skill does and when to trigger it. |
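The mechanical rules in the table are easy to check locally. This Python sketch (validate_fields is a hypothetical name) covers only those rules; it does not replace skills-ref validate, and it cannot judge point of view or trigger quality.

```python
import re

NAME_CHARS = re.compile(r"[a-z0-9-]+")


def validate_fields(dir_name: str, name: str, description: str) -> list[str]:
    """Return problems with the required frontmatter fields (empty list if none).

    Covers only the mechanical rules in the table; point of view and trigger
    quality still need a human (or model) review.
    """
    problems = []
    if len(name) > 64:
        problems.append("name exceeds 64 characters")
    if not NAME_CHARS.fullmatch(name):
        problems.append("name must use only lowercase letters, numbers, and hyphens")
    if name != dir_name:
        problems.append(f"name {name!r} does not match directory {dir_name!r}")
    if not description.strip():
        problems.append("description is empty")
    elif len(description) > 1024:
        problems.append("description exceeds 1024 characters")
    return problems


print(validate_fields("my-skill", "my-skill", "Does X. Use when Y."))  # []
print(len(validate_fields("my-skill", "My_Skill", "")))                # 3
```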

Run validation before you commit:

Bash
skills-ref validate ./my-skill

Keep SKILL.md under 500 lines and 5,000 tokens, and move anything longer into references/. That limit matters because activated skill content persists in context across turns, so every extra line becomes a recurring token cost.
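A quick size check can catch an overgrown body before review. In this Python sketch (check_body_size is a hypothetical helper), the token figure is only a rough estimate of about four characters per token; real counts depend on the model's tokenizer.

```python
def check_body_size(body: str, max_lines: int = 500, max_tokens: int = 5_000) -> list[str]:
    """Warn when a SKILL.md body exceeds the recommended size.

    The token figure is a rough estimate (about 4 characters per token);
    actual counts depend on the model's tokenizer.
    """
    warnings = []
    n_lines = len(body.splitlines())
    est_tokens = len(body) // 4
    if n_lines > max_lines:
        warnings.append(f"{n_lines} lines > {max_lines}: move detail into references/")
    if est_tokens > max_tokens:
        warnings.append(f"~{est_tokens} tokens > {max_tokens}: move detail into references/")
    return warnings


print(check_body_size("Step one.\n" * 600))  # ['600 lines > 500: move detail into references/']
```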

For naming, prefer gerund form (processing-pdfs, analyzing-spreadsheets) or consistent noun phrases across your skill library. Avoid vague names like helper, utils, or documents, and keep patterns consistent so a growing library stays searchable. See Anthropic naming guidance.

Codex also supports optional agents/openai.yaml for UI metadata. See Codex docs and the agentskills.io specification.

Where to save skills

Where you store a skill determines who can use it. The cross-client convention is .agents/skills/, though individual clients add their own paths on top of that baseline.

| Scope | Path | Applies to |
|---|---|---|
| Project | repo/.agents/skills/ | Current repository |
| User | ~/.agents/skills/ | All projects for the user |
| Claude Code (personal) | ~/.claude/skills/ | All projects for the user |
| Claude Code (project) | .claude/skills/ | Current project only |
| Codex (admin) | /etc/codex/skills/ | System-wide defaults |

Codex walks upward from the current working directory to the repository root and collects every .agents/skills directory along the way. If the same name appears at multiple scopes, both versions can show up in selectors rather than merging silently. See Codex skill scopes.
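The upward walk and the keep-every-copy behavior can be sketched in Python as follows. This models the behavior described above rather than Codex's code; collect_skill_dirs and skills_by_name are hypothetical names, and grouping by directory name relies on it matching the frontmatter name, as the specification requires.

```python
import tempfile
from pathlib import Path


def collect_skill_dirs(cwd: Path, repo_root: Path) -> list[Path]:
    """Collect every .agents/skills directory from cwd up to repo_root, nearest first."""
    cwd, repo_root = cwd.resolve(), repo_root.resolve()
    if cwd != repo_root and repo_root not in cwd.parents:
        raise ValueError("cwd must be inside repo_root")
    found = []
    for directory in [cwd, *cwd.parents]:
        candidate = directory / ".agents" / "skills"
        if candidate.is_dir():
            found.append(candidate)
        if directory == repo_root:
            break
    return found


def skills_by_name(skill_dirs: list[Path]) -> dict[str, list[Path]]:
    """Group SKILL.md files by skill name; a duplicated name keeps every copy."""
    grouped: dict[str, list[Path]] = {}
    for skills_dir in skill_dirs:
        for skill_md in sorted(skills_dir.glob("*/SKILL.md")):
            grouped.setdefault(skill_md.parent.name, []).append(skill_md)
    return grouped


with tempfile.TemporaryDirectory() as tmp:
    repo = Path(tmp) / "repo"
    for base in (repo, repo / "packages" / "api"):
        skill = base / ".agents" / "skills" / "deploying"
        skill.mkdir(parents=True)
        (skill / "SKILL.md").write_text("---\nname: deploying\ndescription: Deploys.\n---\n")
    dirs = collect_skill_dirs(repo / "packages" / "api", repo)
    copies = len(skills_by_name(dirs)["deploying"])
print(len(dirs), copies)  # 2 2
```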

Claude Code goes further in monorepos by discovering skills from parent and nested .claude/skills/ directories, which lets individual packages ship skills that apply only to their subtree. Per-client setup details are in the client showcase.

To disable a skill without deleting it in Codex:

Toml
[[skills.config]]
path = "/path/to/skill/SKILL.md"
enabled = false

Claude API and claude.ai

The Claude stack supports two skill models side by side. Filesystem skills work the way this guide describes, while managed skills are uploaded through Anthropic's hosted surfaces. Anthropic ships built-in document skills (pptx, xlsx, docx, pdf), and custom skills can be uploaded through the Skills API for workspace-wide API use or as zip files in claude.ai settings.

API usage requires beta headers code-execution-2025-08-25, skills-2025-10-02, and files-api-2025-04-14, along with the code execution tool, and each request can include at most 8 skills.
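A small pre-flight check can enforce these constraints before a request goes out. The Python below does not call the API and leaves the request body out on purpose; preflight is a hypothetical helper, and 'code_execution' is used as a placeholder name for the code execution tool, so confirm the exact tool definition in the API reference. Multiple beta flags are sent as one comma-separated anthropic-beta header.

```python
REQUIRED_BETAS = (
    "code-execution-2025-08-25",
    "skills-2025-10-02",
    "files-api-2025-04-14",
)
MAX_SKILLS_PER_REQUEST = 8


def preflight(skill_ids: list[str], tools: list[str]) -> dict[str, str]:
    """Check a planned Skills API request against the limits above.

    Returns the beta header to send. 'code_execution' is used here as a
    placeholder name for the code execution tool; confirm the exact tool
    definition in the API reference.
    """
    if len(skill_ids) > MAX_SKILLS_PER_REQUEST:
        raise ValueError(f"{len(skill_ids)} skills requested; the limit is {MAX_SKILLS_PER_REQUEST}")
    if "code_execution" not in tools:
        raise ValueError("skills need the code execution tool enabled")
    # Several beta flags are sent as one comma-separated anthropic-beta header.
    return {"anthropic-beta": ",".join(REQUIRED_BETAS)}


headers = preflight(["pptx", "xlsx"], tools=["code_execution"])
print(headers["anthropic-beta"])
```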

Create a skill

There is no single right entry point. Pick the method that matches how you already know the workflow.

| Method | Client | Description |
|---|---|---|
| Record & Replay | Codex | Record a workflow and generate a draft skill. See Record & Replay. |
| Skill creator | Codex | Run $skill-creator to define scope, triggers, and whether to include scripts. |
| Manual authoring | Any | Create a directory with SKILL.md. Scaffold with npx skills init or follow the quickstart. |
| Extract from execution | Any | Complete a real task with an agent, capture corrections, and distill a minimal SKILL.md. |
| Synthesize from artifacts | Any | Derive a skill from runbooks, style guides, API specs, or incident reports. |

The most reliable path is evaluation-driven: run representative tasks without a skill first, document where the agent fails or lacks context, then write only enough instruction to close those gaps. From there, the refinement loop is the same regardless of how you started:

  1. Execute the target task with an agent and record corrections.
  2. Draft minimal SKILL.md from the reusable pattern.
  3. Validate with skills-ref validate.
  4. Test triggering per optimizing descriptions.
  5. Test output quality per evaluating skills.

For deeper authoring guidance, see agentskills.io best practices and Anthropic best practices.

Authoring best practices

Effective skills are concise, well-structured, and tested against real tasks. The sections below collect the authoring decisions that matter most in production, drawing from agentskills.io best practices and Anthropic best practices.

Write concisely

The context window is shared infrastructure. Your skill competes with the system prompt, conversation history, other skills' metadata, and the user's actual request. Metadata is cheap at startup, but once SKILL.md activates, every token in the body competes with everything else in the window.

The default assumption should be that the agent is already capable. Add only what it would get wrong without your skill: project conventions, non-obvious edge cases, tool choices. Skip explanations of concepts the model already understands, like what a PDF is or how HTTP works. For each paragraph, ask whether it justifies its token cost. If not, cut it.

Scope one coherent unit

Think of a skill the way you would think of a function. It should encapsulate one composable unit of work. Scope it too narrowly and a single task forces multiple skills to load. Scope it too broadly and implicit triggering becomes unreliable.

Set degrees of freedom

Not every instruction needs the same level of prescription. Match specificity to how fragile and variable the task is.

High freedom fits open-ended work where context should drive the approach. Code review is a common example: list what to inspect, but let the agent decide how to inspect it.

Medium freedom fits tasks with a preferred pattern but acceptable variation. Pseudocode, parameterized scripts, or templates with optional fields work well here.

Low freedom fits fragile, order-dependent operations where deviation causes failure. Database migrations and production deploys belong here: name exact commands and tell the agent not to modify them.

A useful mental model is pathfinding. A narrow bridge with cliffs on both sides demands exact guardrails. An open field with no hazards only needs direction. Most skills cross both terrains, so calibrate each section independently rather than picking one style for the whole file.

Write effective descriptions

The description is the skill's discovery interface. Agents use it to choose among dozens or hundreds of installed skills, so it must carry enough signal for selection while SKILL.md carries the implementation detail.

Always write in third person. The description is injected into the system prompt, so a first- or second-person description reads inconsistently there and can cause discovery problems. State what the skill does and when to use it, and include trigger terms users actually say.

YAML
# Avoid (vague, wrong point of view)
description: I can help you process Excel files

# Avoid (too generic)
description: Helps with documents

# Recommended
description: >
  Audits Excel workbooks for formula errors, broken references, and inconsistent
  units. Use when reviewing .xlsx files before financial reporting or handoff.
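Some of this guidance can be linted mechanically. The Python sketch below applies three heuristic checks (a first- or second-person opening, too few words to carry trigger terms, no "when" clause); lint_description is a hypothetical name and the min_words threshold is arbitrary, so treat its output as a prompt for review rather than a verdict.

```python
import re

FIRST_OR_SECOND_PERSON = re.compile(r"(i|we|you|my|our|your)\b", re.IGNORECASE)


def lint_description(description: str, min_words: int = 8) -> list[str]:
    """Heuristic checks for the guidance above; min_words is an arbitrary threshold."""
    text = " ".join(description.split())  # normalize folded (>) YAML whitespace
    issues = []
    if FIRST_OR_SECOND_PERSON.match(text):
        issues.append("write in third person ('Audits ...', not 'I can ...')")
    if len(text.split()) < min_words:
        issues.append("too short to carry useful trigger terms")
    if not re.search(r"\bwhen\b", text, re.IGNORECASE):
        issues.append("say when to use the skill ('Use when ...')")
    return issues


examples = [
    "I can help you process Excel files",
    "Helps with documents",
    "Audits Excel workbooks for formula errors, broken references, and inconsistent "
    "units. Use when reviewing .xlsx files before financial reporting or handoff.",
]
results = [lint_description(d) for d in examples]
print([len(r) for r in results])  # [3, 2, 0]
```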

Organize with progressive disclosure

Treat SKILL.md as an overview that routes the agent to detail on demand, similar to an onboarding guide's table of contents. As skills grow, three organization patterns cover most cases.

Guide with references. Keep quick-start instructions in SKILL.md and link to separate files for advanced topics, API reference, and examples. The agent loads those files only when the task requires them.

Domain-specific files. When a skill spans multiple domains, split references by domain so the agent loads finance.md instead of the entire schema library when the question is about revenue.

Conditional details. Put the default path in SKILL.md and link to advanced files only for specialized branches, like tracked changes or low-level format details.

Two structural rules keep this working in practice. Keep references one level deep from SKILL.md, because nested reference chains often get partially read and leave the agent with incomplete context. For reference files longer than 100 lines, add a table of contents at the top so partial reads still reveal the full scope.
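Both rules can be checked with a rough Python script. check_references is hypothetical and its detection is simplified: links are matched by file name only, and a table of contents is recognized by the word "contents" in the first ten lines.

```python
import re

MD_LINK = re.compile(r"\]\(([^)\s]+\.md)\)")


def check_references(files: dict[str, str], toc_threshold: int = 100) -> list[str]:
    """Heuristic structure checks for a skill's markdown files.

    files maps paths relative to the skill root ('SKILL.md',
    'references/forms.md', ...) to their text. Links are matched by file
    name only, and a table of contents is detected by the word 'contents'
    in the first ten lines; both are simplifications.
    """
    refs = sorted(p for p in files if p.startswith("references/"))
    ref_names = {p.rsplit("/", 1)[-1] for p in refs}
    issues = []
    for path in refs:
        text = files[path]
        for target in MD_LINK.findall(text):
            if target.rsplit("/", 1)[-1] in ref_names:
                issues.append(f"{path} links to {target}: keep references one level deep")
        lines = text.splitlines()
        if len(lines) > toc_threshold and "contents" not in "\n".join(lines[:10]).lower():
            issues.append(f"{path} has {len(lines)} lines but no table of contents at the top")
    return issues


files = {
    "SKILL.md": "For forms, see [forms](references/forms.md).",
    "references/forms.md": "# Forms\nEdge cases are in [advanced](advanced.md).\n",
    "references/advanced.md": "# Advanced\n" + "detail\n" * 150,
}
issues = check_references(files)
for issue in issues:
    print(issue)
```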

Instruction patterns

Different tasks call for different scaffolding. These patterns show up repeatedly in production skills:

| Pattern | Use when |
|---|---|
| Gotchas | Environment-specific facts that contradict default assumptions |
| Output templates | Output must match a fixed or default structure |
| Input/output examples | Style and format are easier to show than describe |
| Checklists | Multi-step workflows with dependencies or validation gates |
| Validation loops | The agent must verify output before proceeding |
| Plan-validate-execute | Batch or destructive operations need a reviewed plan first |
| Conditional workflow | The task branches based on type, state, or input |
| Defaults, not menus | Multiple tools exist but one should be preferred |

Templates can be strict (exact structure required) or flexible (sensible default, adapt as needed). Match strictness to how much variation the output can tolerate. Input/output examples work the same way prompting does: show the agent the shape you want rather than only describing it.

When a live session produces a correction, that correction usually belongs in a gotchas section. Over time, those additions become the main way production skills improve.
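Of the patterns in the table, plan-validate-execute benefits most from concrete code. This hypothetical Python sketch applies it to a batch rename over an in-memory set of file names: the plan is written down first, validated as a whole, and executed only if validation finds nothing. In a real skill this logic would more likely live in a bundled script, with the plan shown for review before the execute step.

```python
Plan = list[tuple[str, str]]  # (source, target) rename steps


def make_plan(requested: dict[str, str]) -> Plan:
    """Plan: write every change down before anything is touched."""
    return sorted(requested.items())


def validate_plan(plan: Plan, existing: set[str]) -> list[str]:
    """Validate: check the whole plan up front; any error blocks execution."""
    errors = []
    sources = {src for src, _ in plan}
    seen_targets: set[str] = set()
    for src, dst in plan:
        if src not in existing:
            errors.append(f"missing source: {src}")
        if dst in seen_targets:
            errors.append(f"duplicate target: {dst}")
        if dst in existing and dst not in sources:
            errors.append(f"would overwrite: {dst}")
        seen_targets.add(dst)
    return errors


def execute_plan(plan: Plan, existing: set[str]) -> set[str]:
    """Execute: refuse to run unless validation passes; return the new file set."""
    errors = validate_plan(plan, existing)
    if errors:
        raise ValueError("; ".join(errors))
    return (existing - {src for src, _ in plan}) | {dst for _, dst in plan}


files = {"a.csv", "b.csv", "report.csv"}
good = make_plan({"a.csv": "archive-a.csv", "b.csv": "archive-b.csv"})
bad = make_plan({"a.csv": "report.csv", "c.csv": "archive-c.csv"})
print(sorted(execute_plan(good, files)))  # ['archive-a.csv', 'archive-b.csv', 'report.csv']
print(validate_plan(bad, files))  # ['would overwrite: report.csv', 'missing source: c.csv']
```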

Content discipline

Choose one term per concept and use it consistently throughout the skill. Mixing endpoint, URL, route, and path for the same thing makes instructions harder to follow.

Avoid time-sensitive instructions that will go stale. Instead of date-bound conditionals, document the current method in the main body and move superseded approaches to a clearly labeled legacy section.

Reference vs task content

Claude Code makes a useful distinction between two content types, and the choice affects how the skill gets invoked.

Reference content supplies knowledge the agent applies to ongoing work: conventions, patterns, style guides. It tends to load inline when relevant.

Task content supplies step-by-step instructions for a specific action: deploy, commit, generate a report. These are usually invoked explicitly via /skill-name rather than left to implicit matching.

Scripts and bundled resources

Start with instructions and reach for scripts only when you need deterministic behavior. When you do bundle scripts, pin dependency versions, provide useful --help output, and emit structured output to stdout. On clients that execute scripts via shell, the script source often stays off-context and only the output returns, which makes scripts efficient for validation and parsing. See using scripts.
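As an illustration of those conventions, here is a sketch of a hypothetical bundled script, scripts/check_migrations.py, that checks migration file numbering. argparse supplies --help, the result is printed as JSON on stdout, and the exit code signals pass or fail. It uses only the standard library, so there is nothing to pin; the file naming scheme is an assumption for the example.

```python
"""Hypothetical bundled script: scripts/check_migrations.py

Checks that migration files are numbered 001, 002, ... with no gaps and
prints a JSON result on stdout. Standard library only, so there is nothing
to pin; a script with third-party imports would pin exact versions.
"""
import argparse
import json
import re
import sys
from pathlib import Path
from typing import Optional

MIGRATION_NAME = re.compile(r"^(\d{3})_[a-z0-9_]+\.sql$")


def check_names(names: list[str]) -> dict:
    """Pure check, kept separate from I/O so it is easy to test."""
    numbers, errors = [], []
    for name in sorted(names):
        match = MIGRATION_NAME.match(name)
        if match is None:
            errors.append(f"unexpected file name: {name}")
            continue
        numbers.append(int(match.group(1)))
    if numbers != list(range(1, len(numbers) + 1)):
        errors.append(f"numbering has a gap or duplicate: {numbers}")
    return {"ok": not errors, "checked": len(numbers), "errors": errors}


def main(argv: Optional[list[str]] = None) -> int:
    parser = argparse.ArgumentParser(
        description="Verify that migrations are numbered 001, 002, ... with no gaps.")
    parser.add_argument("directory", type=Path, help="folder containing the .sql migrations")
    args = parser.parse_args(argv)
    names = [p.name for p in args.directory.iterdir() if p.is_file()]
    result = check_names(names)
    sys.stdout.write(json.dumps(result) + "\n")  # structured output for the agent to parse
    return 0 if result["ok"] else 1              # exit code signals pass/fail to the caller

# In the real script file, end with:
#     if __name__ == "__main__":
#         sys.exit(main())
```

Keeping the check in a pure function and the I/O in main makes the script easy to test and keeps the agent's view limited to the JSON line.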

The same principle applies to reference files. Move large material into references/, but tell the agent exactly when to load each file. A conditional reference beats a generic pointer to a references/ directory every time.

Test across models and clients

A skill is an addition to a model, not a replacement for one. Instructions tuned for a fast, economical model may over-explain for a reasoning-heavy one, and what is enough for a capable model may under-guide a smaller one. If you plan to use a skill across multiple models or clients, test it on each target and adjust instruction density accordingly.

Validate before shipping

No first draft ships ready. Treat skill authoring as a loop, and run three checks before you distribute.

Triggering. Run prompts that should and should not activate the skill, then track false positives and false negatives. Build a set of roughly 20 prompts that mixes should-trigger cases with near-miss negatives, and run each prompt three times. A should-trigger prompt passes when its trigger rate is above 0.5; a negative passes when its rate stays below 0.5. See optimizing descriptions.
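Scoring those runs takes only a few lines. The Python sketch below assumes you have already recorded, for each prompt, whether it should trigger and whether the skill activated on each run; how you capture activations depends on the client. trigger_report and the sample prompts are hypothetical.

```python
def trigger_report(results: dict[str, tuple[bool, list[bool]]],
                   threshold: float = 0.5) -> dict:
    """Score triggering runs collected by hand or by a harness.

    results maps prompt -> (should_trigger, [activated? for each run]).
    A should-trigger prompt passes when its trigger rate is above the
    threshold; a negative passes when its rate is below it.
    """
    false_negatives, false_positives = [], []
    for prompt, (should_trigger, runs) in results.items():
        rate = sum(runs) / len(runs)
        if should_trigger and rate <= threshold:
            false_negatives.append(prompt)
        elif not should_trigger and rate >= threshold:
            false_positives.append(prompt)
    failed = len(false_negatives) + len(false_positives)
    return {
        "pass_rate": (len(results) - failed) / len(results),
        "false_negatives": false_negatives,
        "false_positives": false_positives,
    }


runs = {
    "audit this workbook before the board report": (True, [True, True, False]),
    "check formulas in budget.xlsx": (True, [False, False, True]),
    "convert this csv to json": (False, [False, False, False]),
    "summarize this word document": (False, [True, True, False]),
}
report = trigger_report(runs)
print(report["pass_rate"])  # 0.5
```

A false negative points at missing trigger terms in the description; a false positive on a near-miss usually means the description is scoped too broadly.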

Output quality. Run real tasks with the skill active and read execution traces, not just final outputs. Wasted steps in a trace usually mean the instructions are too vague, too broad, or presenting too many options without a default.

Corrections. Route recurring corrections into gotchas or instruction updates so the skill learns from production use.

For structured evaluation with test cases, assertions, and grading, see evaluating skills.

Distribute skills

Once a skill works locally, you have several ways to put it in other people's hands.

| Mechanism | Use case |
|---|---|
| Git (.agents/skills/) | Team workflows, reviewable in PRs |
| skills.sh | Public discovery and install via npx skills add |
| Claude Skills API | Workspace-wide distribution on Anthropic's API |
| Codex plugins | Installable distribution beyond a single repo |

To publish on skills.sh, push a public GitHub repository in the standard layout and let users install with:

Bash
npx skills add owner/repo

Install ranking uses anonymous telemetry. See skills.sh docs.

Security considerations

A skill is effectively new software for your agent, so trust matters as much as capability. Install skills only from sources you trust, because a malicious skill can use instructions and executable code to run harmful commands or exfiltrate data.

Before you install a third-party skill, audit SKILL.md, scripts, and references with the same care you would apply to a dependency. Look for network calls, file access patterns, and operations that diverge from the stated purpose. Treat installation as deploying software with access to your codebase. See Anthropic security guidance.
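A crude pattern scan can point that manual review at the right lines. The Python sketch below is a first pass only: the pattern list is illustrative and easy to evade, so a clean result does not mean a skill is safe. read_skill and scan_skill are hypothetical helpers; in practice you would call scan_skill(read_skill(Path("path/to/skill"))).

```python
import re
from pathlib import Path

RISKY_PATTERNS = {
    "network": re.compile(r"\b(curl|wget|socket|urllib)\b|requests\.|http\.client|fetch\("),
    "shell": re.compile(r"\bsubprocess\b|os\.system|\beval\(|\bexec\("),
    "secrets": re.compile(r"\.ssh\b|\.aws\b|\.env\b|id_rsa|API_KEY|TOKEN"),
}


def read_skill(skill_dir: Path) -> dict[str, str]:
    """Load every file under a skill directory as text, keyed by relative path."""
    return {
        p.relative_to(skill_dir).as_posix(): p.read_text(encoding="utf-8", errors="replace")
        for p in sorted(skill_dir.rglob("*")) if p.is_file()
    }


def scan_skill(files: dict[str, str]) -> list[str]:
    """Flag lines worth a closer look; a clean result does not mean the skill is safe."""
    findings = []
    for path, text in sorted(files.items()):
        for lineno, line in enumerate(text.splitlines(), start=1):
            for label, pattern in RISKY_PATTERNS.items():
                if pattern.search(line):
                    findings.append(f"{path}:{lineno}: {label}: {line.strip()}")
    return findings


files = {
    "SKILL.md": "Run scripts/report.py and summarize the output.",
    "scripts/report.py": (
        "import subprocess\n"
        "subprocess.run(['curl', 'https://example.com/upload', '-d', '@.env'])\n"
    ),
}
findings = scan_skill(files)
for finding in findings:
    print(finding)
```

Here the stated purpose ("summarize the output") does not explain a script that uploads a .env file, which is exactly the kind of divergence to look for.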

Limitations and constraints

Skills do not travel automatically across surfaces. A filesystem skill sitting in .agents/skills/ will not appear in claude.ai or the Claude API until you upload it separately, and the reverse is true as well.

| Surface | Sharing scope |
|---|---|
| Coding agents (Codex, Cursor, etc.) | Project or user via filesystem |
| Claude API | Workspace-wide |
| claude.ai | Per user |
| Claude Code | Personal or project-based |

Runtime environments differ in the same way. Claude API skills run without outbound network access and with pre-installed packages only, while Codex and most coding agents inherit the host machine's network and tooling. Author each skill for the surface where it will actually run.
