The Universal Remote Problem, Solved
Every streaming app has its own API quirks, authentication flows, and navigation patterns. Building integrations for Netflix, Hulu, Disney+, and dozens of others is a maintenance nightmare. ClawTV bypasses all of that with a deceptively simple approach: it looks at the screen, figures out where it is, and navigates like a human would.
How It Works
ClawTV implements a see-think-act loop. It captures a screenshot of your Apple TV using pyatv, sends that image to Claude's vision API for analysis, receives navigation instructions, then executes remote commands. The cycle repeats until the goal is achieved. You type python clawtv.py do "open Netflix and play Stranger Things season 3 episode 5" and it just works—across any app, any UI, without hardcoded selectors or app-specific logic.
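The loop described above can be sketched in a few lines. This is a minimal illustration, not ClawTV's actual implementation: the capture_screen, ask_claude, and press callables are hypothetical stand-ins for the pyatv screenshot capture, the Claude vision call, and the remote-command dispatch.

```python
import time

def see_think_act(goal, capture_screen, ask_claude, press,
                  max_steps=30, delay=0.5):
    """Drive the UI toward `goal` with a see-think-act loop.

    capture_screen() -> bytes       : screenshot of the current screen
    ask_claude(goal, image) -> dict : e.g. {"action": "down"} or {"done": True}
    press(action) -> None           : send one remote command
    """
    for _ in range(max_steps):
        image = capture_screen()            # SEE: grab the current frame
        decision = ask_claude(goal, image)  # THINK: ask the vision model
        if decision.get("done"):
            return True                     # model says the goal is reached
        press(decision["action"])           # ACT: one remote keypress
        time.sleep(delay)                   # give the UI time to settle
    return False                            # step budget exhausted
```

The key design point is that nothing in the loop knows anything about Netflix or any other app; all app-specific knowledge lives in the vision model's reading of the screenshot.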
Built With Claude Code
Harold at Akiva Solutions built ClawTV using Claude Code, the AI coding assistant. The entire tool is MIT-licensed and available on GitHub. It's a clean example of how vision models can replace brittle automation: instead of maintaining fragile DOM selectors or reverse-engineering streaming APIs, you let the AI understand the visual interface the same way you do.
Technical Requirements
You'll need macOS with Xcode, an Apple TV 4K (gen 2 or later), and Python 3.9+. Setup involves pairing your Mac with the Apple TV via pyatv and configuring Claude API credentials. Once configured, ClawTV can control any app installed on your Apple TV—Netflix, Plex, YouTube, Hulu, HBO Max, Disney+, even Settings menus.
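The pairing step can be sketched with pyatv's async API. This follows pyatv's documented scan/pair/connect flow, but pick_device and pair_and_connect are illustrative helpers, not part of ClawTV, and the device name "Living Room" is an assumption.

```python
import asyncio

def pick_device(scan_results, name):
    """Select one device from pyatv scan results by its advertised name."""
    for atv in scan_results:
        if atv.name == name:
            return atv
    raise LookupError(f"no Apple TV named {name!r} on this network")

async def pair_and_connect(name):
    """Scan the LAN, pair with the named Apple TV, and return a connection."""
    import pyatv                        # pip install pyatv
    from pyatv.const import Protocol

    loop = asyncio.get_running_loop()
    config = pick_device(await pyatv.scan(loop, timeout=5), name)

    pairing = await pyatv.pair(config, Protocol.Companion, loop)
    await pairing.begin()               # the Apple TV displays a PIN
    pairing.pin(int(input("PIN shown on the TV: ")))
    await pairing.finish()              # credentials are in pairing.service.credentials

    return await pyatv.connect(config, loop)

# Inside async code:
#   atv = await pair_and_connect("Living Room")
#   await atv.remote_control.select()   # press the "select" button
```

Once the credentials from pairing are saved, subsequent runs can skip straight to connect, which is why ClawTV's setup is a one-time step.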
Why This Matters
ClawTV is a proof-of-concept for vision-based automation that generalizes across interfaces. Traditional home automation tools require per-device integrations, manufacturer APIs, or screen-scraping hacks that break with every UI update. Vision models sidestep that entire problem space. If the AI can see it and understand it, it can control it.
The Hacker Ethos
This is open-source tooling built by an AI agent shop for its own use, then released to the community. No VC funding, no enterprise sales pitch, no freemium upsell. Just a useful tool that solves a real problem, shared under the MIT license. It's the kind of release that reminds you why open source matters.
What's Next
ClawTV is early-stage but functional. The see-think-act loop is inherently slower than native API calls, but for voice-driven or automated workflows where you're not mashing buttons in real time, it's more than fast enough. The real question is how far this pattern extends: if vision models can navigate Apple TV UIs, what else can they control that we've been over-engineering integrations for?
Try It Yourself
The full source code, setup instructions, and documentation are live on GitHub. If you've got an Apple TV and a Claude API key, you can be navigating Netflix with plain English commands in under an hour. Fork it, extend it, break it, fix it. That's the point.