For three consecutive mornings in late June 2026, an engagement bot quietly died on me. No traceback, no stderr output, no heartbeat log. Fourteen calls to the Claude CLI returned rc=1 with empty error strings, and by 08:06 the bot had built a knowledge-base index of zero characters and exited cleanly without posting a single reply. The logs told me nothing — which turned out to be the actual clue.
The Setup
I run an engagement bot that wraps the Claude CLI via agent/claude_cli.py, a thin subprocess.run shim that shells out with claude -p "$PROMPT" --output-format json --model sonnet --tools "" --no-session-persistence. It parses the JSON response and returns the result field. The wrapper handles failure the way every Python subprocess tutorial teaches: catch FileNotFoundError and TimeoutExpired, check returncode on non-zero exit, log stderr. The problem is that this convention — stdout for answers, stderr for diagnostics — is exactly what the Claude CLI breaks when you use --output-format json. With that flag enabled, the entire transport is JSON on stdout. Success and failure both. On a 200 you get {"type": "result", "is_error": false, "result": "..."}. On a 429 (rate limit) you get {"type": "result", "is_error": true, "api_error_status": 429, "result": "Rate limit hit"}. The process exits rc=1, but the diagnostic detail lives on stdout next to where the answer would have been. stderr stays empty.
Wrong Hypotheses
I burned hours chasing standard failure modes: auth (token valid), PATH (identical between service and shell), ANTHROPIC_API_KEY (set in both contexts), timeout (failures were instantaneous, not slow), encoding (PYTHONIOENCODING=utf-8 already present). The debugging checklist returned all green because the bug lived in something the checklist takes for granted — that errors arrive on stderr. They did not. They arrived on stdout the whole time, in JSON, with the HTTP status code clearly labeled, waiting for someone to read them.
One-Line Fix, Fifteen-Test Surface
The actual change parses result.stdout as JSON when returncode != 0 and pulls out api_error_status and result: try json.loads(result.stdout), extract the fields, log them. Once I could see the actual error code, I added retry logic — the wrapper now retries once after a 30-second sleep on {429, 500, 502, 503} but fails immediately on {401, 403, 404}. The behavior change is small; the test surface around it is not. Fifteen new tests in tests/test_claude_cli.py, including test_rc1_logs_stdout_error_not_stderr — the one I most wish I'd written on day one.
Secondary Bug Found
While hunting the stdout issue, I also found why the bot was exiting cleanly instead of logging an error: an uncaught exception from _find_one_candidate() during Playwright's page.close() was propagating past except KeyboardInterrupt and ending the process silently. Two unrelated bugs stacked to produce one symptom — bot gone, no error, no trace. The retry fix would have handled the rate-limit pain on its own; only the try/except Exception made the bot survive the crash that the rate-limit pain was masking. This is the median case for production debugging, not an edge case.
Key Takeaways
- Test the error output contract of every CLI you wrap on day one — not the success path, where surprises don't live
- When standard debugging checklists return all green, the bug lives in something taken for granted
- Logging the wrong stream is worse than logging nothing — it tells future-you there's nothing to find when there is
The Bottom Line
The Claude CLI's stdout-for-structured-errors design is internally consistent with --output-format json and defensible on those terms. But it's exactly the contract I would have caught in five minutes of running claude -p "test" against a deliberately broken prompt before shipping the wrapper. I didn't, because I assumed it. The assumption cost three days. Logging rc=1 with empty stderr was actively misleading — a silent failure is better than a confident lie.