I Tested Hermes Agent's Self-Improvement Claim. It Wrote Its Own Skill Four Minutes Before I Asked.

Most of us have seen the marketing copy around agentic frameworks: "self-improving," "learns from experience," "gets better over time." It's the kind of language that either signals something genuinely new or sounds great in a README and does nothing. I went in skeptical, installed Nous Research's Hermes Agent without ever having run an agentic framework before, and gave it work designed to catch it lying about its headline feature.

What "Self-Improving" Actually Means

Before diving into the experiments, let's be clear about what we're dealing with. A frozen language model cannot rewrite its own weights: GPT-5.5 is GPT-5.5 tomorrow, last week, and right now. So when Hermes claims to self-improve, that promise has to mean something concrete rather than magical. The architecture breaks down into three layers: semantic memory (facts about you stored in plain ~/.hermes/memories/USER.md), procedural memory (reusable SKILL.md playbooks the agent writes for itself under ~/.hermes/skills/), and a Curator process that prunes, archives, and de-duplicates those skills so they don't rot into clutter. The model is frozen; the notes are alive.

My first test was deliberately trivial: a "start my day" shell script printing the date, listing recent Downloads, and adding a random Stoic quote. I slipped in throwaway personal details: "I'm Zoll, I prefer concise output and zsh, and I like Marcus Aurelius." The agent reached for its memory tool before writing a single line of code. No instruction to remember anything, just a unilateral decision that who I am was worth recording.

The USER.md file confirmed it: "User is Zoll; prefers concise output and zsh; likes Marcus Aurelius." But no skill appeared in ~/.hermes/skills/. My hypothesis at this point: maybe self-improvement is just a nice phrase for profile-keeping.
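Checking that hypothesis only takes a look at what is on disk. The sketch below is one way to do it, assuming nothing beyond the two paths named above (memories/USER.md and a skills/ tree containing SKILL.md files); the function name snapshot_hermes_state and the returned dictionary shape are made up for illustration and are not part of Hermes.

```python
from pathlib import Path


def snapshot_hermes_state(home: Path) -> dict:
    """Report the user memory text and any SKILL.md files under a Hermes home.

    Hypothetical helper: only the two paths mentioned in the article are
    assumed; everything else about the layout is left alone.
    """
    user_md = home / "memories" / "USER.md"
    skills_dir = home / "skills"

    # Read the semantic memory file if it exists, otherwise report None.
    user_memory = None
    if user_md.is_file():
        user_memory = user_md.read_text(encoding="utf-8").strip()

    # Collect every SKILL.md anywhere below skills/, as relative POSIX paths.
    skills = []
    if skills_dir.is_dir():
        skills = sorted(
            p.relative_to(skills_dir).as_posix()
            for p in skills_dir.rglob("SKILL.md")
        )

    return {"user_memory": user_memory, "skills": skills}


if __name__ == "__main__":
    # Safe to run even when ~/.hermes does not exist: both fields come back empty.
    print(snapshot_hermes_state(Path.home() / ".hermes"))
```

After the first test, a snapshot like this would show the user memory populated and an empty skills list, which is exactly the state that made profile-keeping look like the whole story.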

The Four Minutes That Changed Everything

For the second test, I gave Hermes something genuinely reusable: a "port-killer.sh" script that would find and kill whatever was listening on a given port, with macOS-specific handling. Crucially, I said: "This is a workflow I'll reuse all the time, so make it solid."

What followed was impressive. The agent wrote the script, spun up a real Python HTTP server to test against, verified its own work, and even caught a zombie-process edge case around set -e that caused a genuine bug; it reasoned about the issue and patched itself.

Then I did what I'd been building toward: "Save what you just figured out as a reusable skill so you can do this faster next time." The agent began creating macos-port-killer, stopped abruptly, citing a "name collision," folded content into an existing file instead, and deleted the duplicate. Something was off. So I cracked open the logs.

    $ grep -E "skill_manage|conversation turn" ~/.hermes/logs/agent.log
    23:45:40 conversation turn: msg='...write me a zsh script port-killer...'
    23:49:01 tool skill_manage completed   ← it wrote a skill HERE
    23:53:41 conversation turn: msg='Save what you just figured out as a reusable skill...'

The agent had written itself a skill at 23:49:01, over four minutes before I gave the instruction that supposedly triggered saving. By the time I asked, it had already done the work, and it recognized its earlier output when I tried to force a duplicate.
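The "over four minutes" figure is easy to verify from the excerpt itself. The sketch below parses the three log lines quoted above (with my inline annotation removed) and subtracts the timestamps. It assumes the HH:MM:SS prefix shown in the excerpt and that all three events fall on the same day; the helper names are illustrative, and the real agent.log may carry more fields than the excerpt shows.

```python
import re
from datetime import datetime, timedelta

# The three lines from the grep output above, annotation stripped.
LOG_EXCERPT = """\
23:45:40 conversation turn: msg='...write me a zsh script port-killer...'
23:49:01 tool skill_manage completed
23:53:41 conversation turn: msg='Save what you just figured out as a reusable skill...'
"""

LINE_RE = re.compile(r"^(\d{2}:\d{2}:\d{2})\s+(.*)$")


def parse_events(text: str) -> list[tuple[datetime, str]]:
    """Turn 'HH:MM:SS message' lines into (timestamp, message) pairs."""
    events = []
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            stamp = datetime.strptime(match.group(1), "%H:%M:%S")
            events.append((stamp, match.group(2)))
    return events


def skill_lead_time(events, request_marker: str = "Save what you just figured out") -> timedelta:
    """How long before the explicit save request the skill was written."""
    skill_at = next(ts for ts, msg in events if "skill_manage" in msg)
    asked_at = next(ts for ts, msg in events if request_marker in msg)
    return asked_at - skill_at


lead = skill_lead_time(parse_events(LOG_EXCERPT))
print(lead)  # 0:04:40
```

Four minutes and forty seconds separate the skill_manage call from the request that was supposed to cause it.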

The Artifact That Reframes Everything

The skill Hermes wrote for itself wasn't "how to kill processes on port X." It was ~/.hermes/skills/software-development/shell-scripting/SKILL.md—generalized shell scripting competence, personalized. The file opens with user-specific defaults: "For scripts created for Zoll, prefer zsh when no shell is specified. Every created shell script should be immediately runnable: start with #!/usr/bin/env zsh, include set -euo pipefail, run chmod +x." It absorbed preferences I mentioned once in passing and hardened them into reusable procedure. That's the reframe this investigation handed me. Self-improvement in Hermes isn't the agent getting smarter—it's the agent getting better at serving one specific person. The intelligence is fixed; what compounds is personalized context about who you are, how you like things done, and procedures worked out on your behalf. Improvement and relationship are the same motion.
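To make the quoted defaults concrete, here is a minimal sketch of what following them looks like in practice: a zsh shebang, set -euo pipefail, and an executable bit on every new script. The helper write_zsh_script is hypothetical and is not how Hermes applies the skill; the skill is prose instructions the agent reads, and this only illustrates the procedure those instructions describe.

```python
import stat
from pathlib import Path

# The two header lines the skill file asks for, quoted from the article.
HEADER = "#!/usr/bin/env zsh\nset -euo pipefail\n"


def write_zsh_script(path: Path, body: str) -> Path:
    """Write a script with the skill's default header and mark it executable.

    Hypothetical helper for illustration only.
    """
    path.write_text(HEADER + "\n" + body.rstrip("\n") + "\n", encoding="utf-8")

    # Rough equivalent of `chmod +x`: add execute bits for user, group, other.
    # (Unlike the shell command, this does not consult the umask.)
    mode = path.stat().st_mode
    path.chmod(mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
    return path
```

Calling write_zsh_script(Path("port-killer.sh"), "echo hello") would leave a file that can be run directly as ./port-killer.sh, which is the "immediately runnable" property the skill insists on.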

Honest Limits

The skill loop is gated, not guaranteed—the trivial daily-brief task didn't trigger it. MEMORY.md stayed empty throughout; only USER.md ever got written. And this behavior is model-dependent—GPT-5.5 as the backend cleared some threshold that weaker models might miss entirely.

Key Takeaways

"Self-improvement" in Hermes means accumulating personalized context, not rewriting weights
The agent writes skills autonomously when work crosses a complexity/reusability threshold
It generalizes specific tasks into broader competencies addressed to you by name
The Curator enforces discipline: Hermes refused to create a duplicate of a skill it had already written

The Bottom Line

We spend enormous energy arguing over which model is smartest. Hermes is a bet that, past a certain point, intelligence isn't the scarce resource—continuity is. An agent that remembers you, accumulates how you work, and writes its own playbooks tuned to your preferences doesn't need to be a genius. It needs to stick around and pay attention. That's what makes this worth watching.
