The folks behind trace-commons want to flip the script on how coding agent training data flows through the AI industry. Their new project, simply called "trace commons," lets developers donate anonymized sessions from Claude Code, Codex, and 50+ other agents to a publicly available CC-BY-4.0 dataset hosted on Hugging Face.

The Problem With Closed Agent Data

Every time you use an AI coding assistant, you generate rich data about how these tools actually work through problems: the dead ends they hit, the reasoning that gets them back on track, the mistakes and corrections along the way. Right now, that goldmine of behavioral data stays locked inside a handful of companies. Researchers, open-source developers, and smaller players can't study it or build on it. Trace commons aims to bridge that moat by creating infrastructure where the community can pool these traces.

How Privacy Protection Works

The donation process is designed with privacy as a first-class concern. When you run /donate-trace after an open-source coding session, the tool strips out file paths, usernames, and secrets locally on your machine, then shows you exactly what will be sent. You review it and confirm; there are no blind uploads. The project also requires that only public, openly licensed repositories qualify for donation; private code, proprietary work, and client projects are explicitly blocked. If you're logged in, you can contribute with attribution, but anonymous donations are the default.
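To make the scrub-then-review step concrete, here is a minimal Python sketch of what local redaction could look like. It is not the project's actual code: the function names (scrub, confirm_before_upload), the regular expressions, and the <USER> and <SECRET> placeholders are all assumptions made for illustration, and a real scrubber would need far broader secret detection.

```python
import re

# Hypothetical patterns for illustration only; the real tool's rules may differ.
# Home-directory prefixes on Linux and macOS, capturing the prefix and the username.
HOME_PATH = re.compile(r"(/home/|/Users/)([^/\s]+)")
# A few well-known token shapes (generic "sk-" keys, GitHub tokens, AWS key IDs).
SECRET = re.compile(
    r"\b(sk-[A-Za-z0-9_-]{16,}|ghp_[A-Za-z0-9]{36}|AKIA[0-9A-Z]{16})\b"
)


def scrub(text: str, username: str) -> str:
    """Return a copy of text with secrets, home paths, and the username masked."""
    text = SECRET.sub("<SECRET>", text)
    text = HOME_PATH.sub(r"\1<USER>", text)
    # Mask any remaining standalone mentions of the local username.
    text = re.sub(rf"\b{re.escape(username)}\b", "<USER>", text)
    return text


def confirm_before_upload(scrubbed: str, ask=input) -> bool:
    """Show the scrubbed trace and upload only on an explicit 'y'."""
    print(scrubbed)
    return ask("Send this trace? [y/N] ").strip().lower() == "y"
```

Calling scrub(raw_trace, "alice") and passing the result to confirm_before_upload mirrors the order described above: redaction runs first and locally, and nothing is sent unless the user answers "y". Defaulting to "no" on an empty answer is a design choice in this sketch that keeps an accidental Enter from triggering an upload.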

Getting Involved Takes One Command

Installing trace commons requires a single command: npx skills add trace-commons-ai/donate-trace. The installer auto-detects and integrates with whatever agent you're running: Claude Code users get the /donate-trace slash command, while pi users access it via /skill:donate-trace. Once it's installed, run the donation command at the end of any session spent working on an open-source project. Submissions go through a pull request workflow in which a maintainer reviews each one before anything reaches the public dataset.

Why This Matters for Open-Source AI

The asymmetry in who controls agent training data has real consequences. Companies with massive user bases accumulate behavioral signals that smaller competitors can't match. An openly licensed commons changes that calculus: anyone can download and study this data, or fine-tune models on it, without negotiating enterprise contracts or signing NDAs. It's the open-source philosophy applied to AI training infrastructure: build in public, share the gains.

Key Takeaways

  • Install with one command: npx skills add trace-commons-ai/donate-trace
  • Supports Claude Code, Codex, pi, and 50+ other agents
  • All anonymization happens locally before you see what will be sent
  • Only public repositories qualify; no private or proprietary code
  • Dataset is CC-BY-4.0 licensed on Hugging Face for anyone to use

The Bottom Line

This is the kind of infrastructure that makes the AI ecosystem healthier for everyone except those who benefit from data hoarding. If you write code with AI assistants, your sessions have value, so you might as well let the community benefit from them.