When you're building an AI media assistant, the obvious move is one big system prompt: 'You are an expert producer skilled in journalism, video, podcasting, PR.' Load it with everything and let the model figure out what the user needs. Developer urgrue went the opposite direction — a library of 394 separate skills, each narrowly scoped to do exactly one thing. A skill for ledes. A skill for show notes. A skill that strips AI tells out of a draft. No multitasking allowed.
The Averaging Problem
A system prompt trying to be good at everything ends up good at nothing in particular. When one set of instructions covers forty different formats and jobs at once, the guidance for any single task is only a small fraction of what the model has to weigh. You get competent-but-generic output because 'writing ledes' has to share weight with 'podcast structure' and 'FOIA letters.' The result is averaged into mush, which is exactly what you don't want when your audience can spot generic from a mile away. Narrow skills solve this by giving one format the full spotlight. A single lede-writing skill carries everything it needs: what a lede actually is, the three styles worth trying, the trap of burying the news, conventions for different outlet types. Nothing competes for that attention. The output is sharper because the instruction is sharper.
Testability Changes Everything
Here's where narrow wins decisively over the 'one big prompt' approach — each skill is testable. Because a lede-writing skill has one job, you can write assertions: did it produce a one-sentence opening? Did it refuse to invent a quote with no source? You can run that across dozens of inputs and get a real pass rate. You cannot meaningfully eval 'be a great media assistant.' There's no assertion for it. By splitting the work into narrow skills, every single one becomes measurable — and according to the GitHub repo, every skill in this library is scored against a seven-dimension rubric (with a hard floor on whether the output reads human) before shipping as stable.
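To make the idea of assertions and a pass rate concrete, here is a minimal sketch in Python. It is not the library's actual harness: `write_lede` is a hypothetical stand-in for a call to a lede-writing skill, and the two checks and the sample inputs are invented for illustration.

```python
import re
from typing import Callable, List

# Hypothetical stand-in for invoking a lede-writing skill. A real harness
# would call the model here; this stub just returns the first sentence.
def write_lede(notes: str) -> str:
    return re.split(r"(?<=[.!?])\s+", notes.strip())[0]

def is_one_sentence(lede: str, notes: str) -> bool:
    # Exactly one sentence-ending mark followed by whitespace or end of text.
    return len(re.findall(r"[.!?](?:\s|$)", lede.strip())) == 1

def no_invented_quote(lede: str, notes: str) -> bool:
    # Every double-quoted span in the lede must appear verbatim in the notes.
    return all(quote in notes for quote in re.findall(r'"([^"]+)"', lede))

Check = Callable[[str, str], bool]
CHECKS: List[Check] = [is_one_sentence, no_invented_quote]

def pass_rate(skill: Callable[[str], str], inputs: List[str]) -> float:
    # Fraction of inputs whose output satisfies every check.
    passed = 0
    for notes in inputs:
        lede = skill(notes)
        if all(check(lede, notes) for check in CHECKS):
            passed += 1
    return passed / len(inputs)

# Illustrative inputs, not real test data from the library.
INPUTS = [
    "The council voted 5-2 to close the library. Residents protested outside.",
    'A spokesperson said "no comment" when asked. The inquiry continues.',
]

if __name__ == "__main__":
    print(f"pass rate: {pass_rate(write_lede, INPUTS):.0%}")
```

The point of the sketch is the shape: one job, a handful of yes/no checks, and a number you can track across versions. A prompt that tries to do everything has no equivalent set of checks.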
Composition Beats Complexity
The objection writes itself: 394 skills sounds like more work than one prompt. In practice you never load all 394; you reach for the one the task needs. Chaining is where narrow scope pays off: story-angle-finder → reportage-structure → lede-writer → fact-check → libel-check becomes a real pipeline, each step's output feeding the next, each step independently verified. The trade-off is real: discovery matters more with narrow skills, because you have to find the right one for the job. That's why the library includes role-based guides and a one-command plugin install that lets the model auto-select the appropriate skill. The bet is that 'sharp and findable' beats 'broad and average.'
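The chaining idea can be sketched as a list of steps, each paired with its own check. This is an illustrative sketch only: the `Step` type, `run_pipeline`, and the toy lambdas are hypothetical, and the lambdas merely stand in for real skill calls that share names with three of the skills mentioned above.

```python
from typing import Callable, List, NamedTuple

class Step(NamedTuple):
    name: str
    run: Callable[[str], str]      # hypothetical stand-in for invoking one skill
    verify: Callable[[str], bool]  # check applied to that step's own output

def run_pipeline(steps: List[Step], text: str) -> str:
    # Feed each step's output into the next, stopping at the first failed check.
    for step in steps:
        text = step.run(text)
        if not step.verify(text):
            raise ValueError(f"step {step.name!r} failed verification")
    return text

# Toy stand-ins: real skills would call a model instead of string methods.
PIPELINE = [
    Step("story-angle-finder", lambda t: t.strip(), lambda out: bool(out)),
    Step(
        "lede-writer",
        lambda t: t.split(". ")[0].rstrip(".") + ".",
        lambda out: out.count(".") == 1,
    ),
    Step("fact-check", lambda t: t, lambda out: "allegedly" not in out.lower()),
]

if __name__ == "__main__":
    notes = "  The council voted to close the library. Residents protested.  "
    print(run_pipeline(PIPELINE, notes))
```

Because every step carries its own check, a failure names the step that caused it instead of surfacing as a vaguely bad final draft.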
Key Takeaways
- One big prompt dilutes instruction quality across all tasks, producing generic output
- Narrow skills are individually testable with real assertions — you can measure pass rates
- Skills compose into pipelines: each step's verified output feeds the next step
- Discovery is the trade-off; role-based guides and auto-selection help users find what they need
The Bottom Line
If you're still jamming everything into one system prompt, you're shipping mediocrity by design. Narrow scope isn't a limitation — it's testability, composability, and sharper output all at once. For builders who care whether the AI's output is actually shippable, this approach is worth studying.