Perplexity has published its internal guide for designing, refining, and maintaining Agent Skills—the modular know-how units powering products like Perplexity Computer. The company is releasing the documentation publicly so 'our discoveries and learnings can benefit the broader community,' according to the research team behind it. This isn't fluffy thought leadership; it's hard-won operational knowledge from engineers who have shipped production agent systems.

What Makes Skill Writing Completely Different From Code

The guide opens with a blunt reality check: most software engineering intuitions become antipatterns when building Skills. The researchers illustrate this by contrasting PEP 20 (the Zen of Python) with what they call the 'Zen of Skills.' Traditional axioms like 'simple is better than complex' and 'explicit is better than implicit' actively mislead Skill authors. In their framework, complexity is a feature—Skills are folders with hierarchy, not single files. Activation relies on implicit pattern matching rather than explicit declarations. Dense information beats sparse because context costs tokens. The most counterintuitive inversion: 'If it's easy to explain, it may be a good idea' becomes 'If it's easy to explain, the model already knows it. Delete it.' In other words, anything the model can already articulate in plain language should be cut from the Skill entirely, which upends how most developers approach documentation.

Skills Are Four Things at Once

According to Perplexity's framework, a Skill is simultaneously a directory structure, a format with specific frontmatter requirements, an invocable component with runtime costs, and a progressive disclosure system. The SKILL.md file sits at the center, surrounded by scripts/, references/, assets/, and config.json spokes. This hub-and-spoke pattern keeps individual Skills focused while allowing complex domain knowledge to be organized hierarchically. The hierarchy point proved critical during tax season when their team was organizing U.S. income tax capabilities. Early tests showed that presenting a model with all 1,945 sections of the Internal Revenue Code in a single folder performed worse than loading no Skill at all. By subdividing into logical topical areas with multiple nesting levels, they achieved dramatically better results—though this required building quick reference guides and custom search utilities to help the agent navigate without excessive indirection.
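The guide doesn't ship tooling alongside this anatomy, but the structure is concrete enough to sketch. Below is a minimal Python illustration of the hub-and-spoke layout plus an index-tier pass that reads only frontmatter; the skills/ directory, the example folder name, and the parsing helper are assumptions for illustration, not Perplexity's implementation:

```python
# Hub-and-spoke layout described above (illustrative folder name):
#
#   skills/income-tax/
#   ├── SKILL.md        # hub: frontmatter + body
#   ├── config.json
#   ├── scripts/        # spokes, loaded only at runtime
#   ├── references/
#   └── assets/
#
# Index-tier pass: read only each Skill's name and description so the
# always-paid cost stays near the ~100-token metadata budget.
from pathlib import Path

def read_frontmatter(skill_dir: Path) -> dict:
    """Parse the frontmatter block at the top of SKILL.md without
    pulling the body (the ~5K-token load tier) into context."""
    text = (skill_dir / "SKILL.md").read_text(encoding="utf-8")
    if not text.startswith("---"):
        return {}
    header = text.split("---", 2)[1]  # between the first two fences
    meta = {}
    for line in header.strip().splitlines():
        if ":" not in line:
            continue
        key, _, value = line.partition(":")
        meta[key.strip()] = value.strip()
    return meta

for skill_dir in sorted(Path("skills").iterdir()):
    if skill_dir.is_dir() and (skill_dir / "SKILL.md").exists():
        meta = read_frontmatter(skill_dir)
        print(f"{meta.get('name')}: {meta.get('description')}")
```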

The Three-Tier Context Cost Model

Perplexity's Agents team thinks about Skill invocation in three distinct budget tiers. First, there's the index cost: roughly 100 tokens per non-hidden Skill for name and description metadata that gets injected into every session for every user—always paid. Second is the load cost: up to 5,000 tokens for the full SKILL.md body, which stays in context until the next compaction boundary. Third is runtime cost: scripts, references, subskills, and conditional content loaded only when actually needed. The team emphasizes that Skills with unnecessary verbosity don't just waste their own context—they degrade every other Skill in the conversation and the agent's overall capabilities. They quote Pascal (translated from the French): 'I have only made this letter longer because I have not had the time to make it shorter.' Every sentence must earn its place.
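To make "always paid" concrete, here's a back-of-the-envelope sketch in Python using the tier figures above; the function name and the example session are assumptions for illustration:

```python
# Tier figures from the guide; everything else is assumed for the example.
INDEX_COST = 100    # tokens per non-hidden Skill, injected into every session
LOAD_COST = 5_000   # upper bound for a loaded SKILL.md body

def skill_context_cost(installed: int, loaded: int, runtime_tokens: int) -> int:
    """Tokens a session spends on Skills before any of them do real work."""
    return installed * INDEX_COST + loaded * LOAD_COST + runtime_tokens

# Forty installed Skills cost 4,000 tokens of every session's context
# even when none is invoked; loading one at the cap plus 2,000 tokens
# of runtime references brings the total to 11,000.
print(skill_context_cost(installed=40, loaded=0, runtime_tokens=0))      # 4000
print(skill_context_cost(installed=40, loaded=1, runtime_tokens=2_000))  # 11000
```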

The Step-by-Step Build Process

The guide outlines a concrete workflow: start by writing evals before touching Skill code. Source test cases from real user queries, known failure patterns, and 'neighbor confusion' cases where similar domains might route incorrectly. Negative examples—queries for which the agent should NOT load the Skill, or behaviors it should avoid—are described as 'extremely powerful' and often more valuable than positive ones.

Next comes the description—the hardest line to write, because it's not documentation but a routing trigger. Perplexity's team stresses that descriptions capture when to invoke the Skill, not what it does. For a Skill that monitors pull requests, don't document functionality; capture how engineers express frustration: 'babysit,' 'watch CI,' 'make sure this lands.' Target 50 words or fewer.

The body-writing phase requires abandoning documentation instincts entirely. Don't list commands—the model already knows git operations. Instead, describe desired outcomes with flexibility for different approaches: 'Cherry-pick the commit onto a clean branch' beats step-by-step cherry-pick instructions that break as soon as anything deviates from the plan.
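The article doesn't specify an eval format, but the workflow is concrete enough to sketch. Here is a hypothetical routing eval set for the pull-request-monitoring example, including the negative 'neighbor confusion' cases the guide values so highly; the structure and names are assumptions, not Perplexity's format:

```python
from dataclasses import dataclass

@dataclass
class RoutingEval:
    query: str          # a real or realistic user utterance
    should_load: bool   # False marks a negative example
    note: str = ""

# Written BEFORE the Skill itself, per the workflow above.
PR_MONITOR_EVALS = [
    # Positive: frustrated-user phrasings the description must catch.
    RoutingEval("babysit this PR until CI is green", should_load=True),
    RoutingEval("watch CI and make sure this lands", should_load=True),
    # Negative: neighbor confusion with adjacent domains.
    RoutingEval("review this PR for style issues", should_load=False,
                note="code review territory, not monitoring"),
    RoutingEval("set up CI for this repo", should_load=False,
                note="CI configuration, not watching a PR land"),
]
```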

Skills Are Append-Mostly

Maintenance follows what Perplexity calls the 'gotchas flywheel.' When an agent fails at something, add a gotcha. When the Skill loads off-target, tighten the description and add negative evals. When it doesn't load when it should, add keywords and positive examples. The guide explicitly warns against changing descriptions after shipping without supporting eval changes—the description is the routing logic, and regressions ripple across every other Skill.
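That warning implies a guardrail: never change a shipped description without re-running the routing evals. A sketch of such a check follows, where a naive keyword match stands in for the model's implicit routing purely for illustration; every name here is hypothetical:

```python
def routing_regressions(description: str, evals: list[tuple[str, bool]]) -> list[str]:
    """Return queries whose load decision no longer matches expectations.
    Real routing is the model's pattern matching over the description;
    the substring check below is only a stand-in."""
    words = description.lower().split()
    failures = []
    for query, should_load in evals:
        loaded = any(word in query.lower() for word in words)
        if loaded != should_load:
            failures.append(query)
    return failures

evals = [
    ("babysit this PR until CI passes", True),
    ("review this PR for style issues", False),
]
old = "babysit watch ci land pull request"
new = "monitor continuous integration"      # a careless 'tightening'
assert not routing_regressions(old, evals)  # shipped description passes
print(routing_regressions(new, evals))      # the rewrite silently breaks routing
```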

Key Takeaways

  • Writing good Skills inverts most software engineering principles—verbosity costs tokens for everyone, always
  • Three-tier context model: index (~100 tokens/skill), load (~5K tokens), runtime (unbounded but conditional)
  • Descriptions are routing triggers, not documentation—capture intent from frustrated user language
  • Start with evals before writing code; negative examples often outweigh positive ones in value
  • Skills grow append-mostly—the gotchas section accrues the most long-term value

The Bottom Line

This guide is required reading if you're building anything that relies on AI agents making decisions about when to use specialized knowledge. The research team isn't theorizing—they're describing what actually shipped in production systems used by real users. The counterintuitive principles aren't academic; they're the result of shipping code that had to work.