When You Let AI Port Your Code, It Leaves a Paper Trail: A Forensic Investigation of OpenAI Codex

William Cotton wanted to know exactly what OpenAI's Codex leaves behind when developers use it to port software from one language to another. What he found was a sprawling forensic goldmine of conversation logs, SQLite databases, JSONL session files, and even base64-encoded screenshots, all sitting in readable form on the local filesystem under ~/.codex/.

The Experiment: Porting a Website with Minimal Instructions

The test case was straightforward: Cotton asked Codex to recreate his website williamcotton.com entirely in Rust after running cargo init. He gave it only two lines of context about what he wanted, then watched as the AI agent inspected his original site (written in Web Pipe, a custom DSL), curled the live URL, and got to work generating Rust code. After an initial mistake where articles were hardcoded instead of pulled from Contentful, one correction prompt got everything working—including HTMX integration and proper content rendering via render_rich_text.

What's Actually Stored: A Deep Dive into ~/.codex/

Cotton then opened another terminal window and started grepping through Codex's local storage. The results were immediate and alarming: within minutes, his exact prompt appeared in multiple files. These included logs_2.sqlite, with TRACE-level entries tagged codex_api::endpoint::responses_websocket; session JSONL files containing his entire conversation history; state_5.sqlite, which stores thread metadata (title, first_user_message, preview) across 375 historical sessions; and history.jsonl, which indexes every session by ID. The main session file, rollout-2026-06-26T07-04-14-019f03d0-f7d1-7931-a089-3cb1c1f627cd.jsonl, contained 477 lines totaling over 1.4MB, with event_msg and response_item record types dominating the structure. It also contained data:image/png;base64 markers, indicating that screenshots were being captured during development.
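The kind of inspection described above can be approximated in a few lines of Python. This is a minimal sketch, not Cotton's actual commands: it assumes each line of a session file is a JSON object with a top-level "type" key (suggested by the event_msg and response_item names, but not confirmed here), and the function names count_record_types and files_containing are hypothetical.

```python
import json
from collections import Counter
from pathlib import Path


def count_record_types(jsonl_path, type_key="type"):
    """Tally the value of `type_key` across every JSON line in a file.

    Assumption: each record is a JSON object carrying its kind under
    `type_key`. Lines that do not fit are counted under marker labels.
    """
    counts = Counter()
    with open(jsonl_path, "r", encoding="utf-8", errors="replace") as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            try:
                record = json.loads(line)
            except json.JSONDecodeError:
                counts["<unparseable>"] += 1
                continue
            if isinstance(record, dict):
                counts[str(record.get(type_key, "<missing>"))] += 1
            else:
                counts["<non-object>"] += 1
    return counts


def files_containing(root, needle):
    """Return files under `root` whose raw bytes contain `needle`.

    A byte-level search works on JSONL and SQLite files alike, but it
    misses text that JSON has escaped (for example, embedded quotes).
    """
    needle_bytes = needle.encode("utf-8")
    hits = []
    for path in sorted(Path(root).expanduser().rglob("*")):
        if not path.is_file():
            continue
        try:
            if needle_bytes in path.read_bytes():
                hits.append(path)
        except OSError:
            continue  # unreadable file; move on
    return hits
```

Calling files_containing("~/.codex", "some distinctive phrase from a prompt") would list every file that still holds that phrase, and count_record_types on a rollout file would show which record kinds dominate it.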

SQLite Tables Reveal Extensive Historical Context

Using Python to search all text columns across state_5.sqlite's tables, Cotton discovered that his prompt appeared in threads.title, threads.first_user_message, and threads.preview at row 375—the same number as the total thread count. The schema shows fields for model_provider, cwd (current working directory), git_sha, git_branch, git_origin_url, cli_version, tokens_used, memory_mode, and even agent_nickname with agent_role. This isn't just conversation history; it's a complete development environment snapshot tied to every session.
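A search across every text column might look like the sketch below. It is an illustration under stated assumptions rather than Cotton's script: the function name find_text_in_sqlite is hypothetical, only Python's standard sqlite3 module is used, and the database is opened read-only so the inspection does not modify the file being examined.

```python
import sqlite3
from pathlib import Path


def quote_ident(name):
    """Quote an SQL identifier so unusual table or column names are safe."""
    return '"' + name.replace('"', '""') + '"'


def find_text_in_sqlite(db_path, needle):
    """Return (table, column, rowid) for every text cell containing `needle`."""
    # mode=ro opens the database read-only via SQLite's URI filename syntax.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    hits = []
    try:
        tables = [row[0] for row in conn.execute(
            "SELECT name FROM sqlite_master "
            "WHERE type = 'table' AND name NOT LIKE 'sqlite_%'"
        )]
        for table in tables:
            info = conn.execute(f"PRAGMA table_info({quote_ident(table)})")
            columns = [row[1] for row in info]  # row[1] is the column name
            for column in columns:
                col = quote_ident(column)
                query = (
                    f"SELECT rowid FROM {quote_ident(table)} "
                    f"WHERE typeof({col}) = 'text' AND instr({col}, ?) > 0"
                )
                try:
                    rows = conn.execute(query, (needle,)).fetchall()
                except sqlite3.OperationalError:
                    continue  # e.g. WITHOUT ROWID tables have no rowid
                hits.extend((table, column, row[0]) for row in rows)
    finally:
        conn.close()
    return hits
```

Run against state_5.sqlite with a phrase from the prompt, a search like this would be expected to report the threads.title, threads.first_user_message, and threads.preview columns that Cotton found.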

Why the IP Implications Are Significant

This evidence is persuasive precisely because it is human-readable plain English. Any company concerned about proprietary code being ported to open source, or about competitors using AI to reverse-engineer its work, now has a roadmap for discovery. The SQLite databases, JSONL files, and log entries all contain enough context to reconstruct not just what was built, but the original intent behind it. Base64 screenshots stored in session files could be decoded to show exactly what developers were looking at while Codex worked.
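Recovering an embedded image is mostly a matter of decoding. The sketch below assumes the images appear as ordinary data:image/png;base64,... data URIs inside the session file's text, which matches the markers Cotton reported but says nothing about which JSON field holds them. The function name extract_png_data_uris and the output file naming are hypothetical.

```python
import base64
import binascii
import re
from pathlib import Path

# Every valid PNG file starts with this 8-byte signature.
PNG_MAGIC = b"\x89PNG\r\n\x1a\n"

# Matches a PNG data URI and captures its base64 payload.
DATA_URI = re.compile(r"data:image/png;base64,([A-Za-z0-9+/]+={0,2})")


def extract_png_data_uris(jsonl_path, out_dir):
    """Decode every PNG data URI found in a file and write it to `out_dir`.

    Returns the list of paths written. Payloads that fail to decode, or
    that do not begin with the PNG signature, are skipped.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    text = Path(jsonl_path).read_text(encoding="utf-8", errors="replace")
    written = []
    for index, match in enumerate(DATA_URI.finditer(text)):
        try:
            payload = base64.b64decode(match.group(1), validate=True)
        except (binascii.Error, ValueError):
            continue  # truncated or malformed base64
        if not payload.startswith(PNG_MAGIC):
            continue  # decoded, but not actually a PNG
        target = out / f"image_{index:04d}.png"
        target.write_bytes(payload)
        written.append(target)
    return written
```

Whether a recovered image is a screenshot or some other image attached to the session can only be determined by opening it.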

Key Takeaways

OpenAI Codex stores full conversation history in ~/.codex/sessions/ as JSONL files with timestamps and turn IDs
SQLite databases (state_5.sqlite, logs_2.sqlite) contain thread metadata, prompts, and websocket communication logs
Base64-encoded screenshots are captured during development sessions and stored locally
Every session is indexed by ID in history.jsonl; on Cotton's machine, state_5.sqlite held metadata for 375 historical threads

The Bottom Line

If you're using Codex at work without understanding what it stores locally, your entire development workflow (every prompt, every code decision, every repository context) is sitting in readable form on your machine. This isn't hidden data; it's discoverable forensic evidence that could be subpoenaed, leaked, or used against a company in an IP dispute.
