Documentation is the unglamorous work every engineering team postpones until the release date has already passed. That's exactly where Debbie O'Brien found herself with The AI Platform by Zephyr Cloud—a desktop app where teams collaborate with AI specialists in channels, think Slack meets AI agents. She had 55 pages to write, 59 screenshots to capture, and a product still shipping features while being rebranded weeks before launch. Her solution: an open-source AI agent called Goose, built by Block and now part of the Linux Foundation.

The Problem With Documentation Sprints

Writing documentation isn't hard because the writing is difficult—it's hard because of everything surrounding it. You need to understand each feature by reading source code. You must capture screenshots, crop them, optimize them, then repeat when the UI changes again. You have to maintain consistent voice across dozens of pages while the product evolves underneath you. O'Brien had been using Goose for other codebase tasks and wondered if she could teach it to write documentation from source, automate screenshot recapture when the app changed, and improve docs based on what users actually see on screen.

Building Three Skills That Changed Everything

Rather than just prompting the agent repeatedly, O'Brien created three reusable skills—markdown files encoding instructions, conventions, and tooling that Goose follows whenever they're loaded. The first skill, write-docs at 513 lines, serves as a style guide in code form. It defines voice (casual and direct, "Click Settings" not "You may want to consider clicking Settings"), formatting rules (bold for UI elements, italics for visible text, backticks for user input), page structure templates with frontmatter, headings, callouts, and cross-links, plus a verification checklist the agent runs before each commit. The second skill, doc-screenshots at 478 lines (backed by 1,722 lines of tooling across bash, Python, Swift, and other scripts), automates screenshot capture entirely. The third skill, docs-preview at 155 lines, handles building and deploying documentation to shareable URLs—something AI agents notoriously struggle with since each build produces unique URL hashes.
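To make the idea concrete, a skill file in this style might open like the following. This is a hypothetical reconstruction based only on the conventions described above, not an excerpt from the actual write-docs skill:

```markdown
# write-docs

## Voice
- Casual and direct: "Click Settings", not "You may want to consider clicking Settings".

## Formatting
- **Bold** for UI elements, *italics* for visible on-screen text, `backticks` for user input.

## Page structure
- Frontmatter, one H1, a short intro paragraph, then task-oriented sections with callouts and cross-links.

## Before committing
- [ ] Every screenshot has alt text
- [ ] All cross-links resolve
- [ ] Headings follow the page template
```

Because the file is version-controlled alongside the docs, style decisions accumulate instead of being re-argued in every prompt.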

Why Not Playwright?

The obvious question: why not use Playwright for screenshots? O'Brien uses it daily but explains it wouldn't work here. The AI Platform is a Tauri desktop app—the UI runs in a native webview, not a browser tab. Playwright automates browsers and cannot connect to a Tauri app's IPC-bridged routing and state management. She needed OS-level automation that finds windows, clicks on-screen elements, and captures what users actually see. That led her to Peekaboo, a macOS tool using accessibility APIs and screen coordinates instead of browser DevTools protocols.

The Screenshot Pipeline in Action

The pipeline works like this: Peekaboo locates the app window and focuses it, clicking UI elements by their visible text when navigation is needed. It captures at 2x retina resolution without drop shadows. A Swift script using Apple's Vision framework runs OCR on the captured image, finding every piece of text and returning pixel-accurate bounding boxes. A Python script using Pillow draws highlight overlays, borders, and spotlight effects based on those OCR results. Finally, pngquant and optipng compress the final images—typically reducing file size by 50 to 60 percent with no visible quality loss. The key innovation is a YAML screenshot manifest that defines all 59 screenshots declaratively: what to capture, how to navigate there, crop dimensions, and validation text that should appear in each image. When the UI changes, O'Brien updates navigation steps once and re-runs the batch instead of retaking screenshots manually.

Day-by-Day Breakdown

Day 1 focused entirely on planning—the phased approach that proved most critical to success. Phases 0 through 4 restructured existing content, moved developer-focused docs to a separate section, then tackled Getting Started, Daily Use, Power Features, and Settings documentation. Day 2 produced twelve commits in about ninety minutes covering all scaffolding, content, and initial screenshots. Day 3 saw forty-three commits during the polish phase—and the biggest disruption: the app's sidebar was redesigned mid-sprint with text labels replaced by an icon rail. Every screenshot showing the old sidebar was wrong. The manifest paid for itself immediately—update navigation steps once, regenerate all 59 screenshots in minutes. Day 4 covered undocumented features discovered through screen-by-screen audit (the embedded browser and code editor panel had no documentation at all) plus a thorough review pass catching contradictory text and duplicate content.
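A manifest entry of the kind that made the Day 3 recovery possible might look like this—field names and values here are hypothetical, shaped by the capabilities the article describes (navigate by visible text, crop, validate):

```yaml
# screenshots.yaml — one declarative entry per screenshot (illustrative)
- id: settings-appearance
  output: docs/images/settings-appearance.png
  navigate:
    - click: "Settings"        # visible text Peekaboo clicks on
    - click: "Appearance"
  crop: { x: 0, y: 0, width: 1280, height: 800 }
  validate_text:               # OCR must find these strings or the capture fails
    - "Theme"
    - "Appearance"
```

When the sidebar redesign landed, only the `navigate` steps needed editing; every entry downstream of them regenerated unchanged.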

What Actually Broke

The rebrand from Zephyr Agency to The AI Platform happened during the sprint, adding friction beyond just find-and-replace. Alt text on 59 screenshots needed updating, every page referencing the product name required revision, and sentences starting with the old name now awkwardly needed "The" prepended. OCR wasn't perfect either—Vision framework occasionally misread similar-looking characters, requiring human review of overlay coordinates. The screen takeover problem meant O'Brien couldn't use her machine during batch captures since Peekaboo needs window focus and mouse control to navigate through dialogs and pages. She treated it as a coffee break: kick off the run, walk away, come back to fresh screenshots.

Key Takeaways

  • Skills encode conventions once so the agent writes correctly from sentence one—no repetition needed each session
  • Screenshot manifests with --audit and --compare modes turn UI changes from a manual retake chore into automated verification
  • Peekaboo + Vision OCR handles native desktop apps where Playwright cannot reach
  • Two tools, two targets: Peekaboo verifies the app, Playwright CLI verifies the docs about the app

The Bottom Line

This isn't about replacing documentation writers—it's about eliminating the mechanical drudgery that burns them out. O'Brien spent her time on editorial decisions (what order to introduce features, how to tell a coherent story) while Goose handled repetitive structure enforcement and screenshot automation. The real win is treating documentation infrastructure like code: version-controlled skills that improve over time, automated verification pipelines catching regressions before release, and build systems deploying updates in under two seconds.