Josh Adler's girlfriend didn't talk to him for two days after discovering a mysterious red light blinking in their bedroom at 3 AM. His explanation—that it was feeding video into an AI system he'd built—wasn't exactly reassuring. But Adler, a researcher and builder who previously created the persistent memory tool TrueMemory, is convinced this is exactly the kind of infrastructure the industry needs to solve what he calls AI's fundamental blindness.

The Problem Nobody Wants to Touch

"Every AI product right now knows you through text or voice," Adler writes in his DEV.to post. "What you type into a prompt. What you paste into a context window. Maybe your calendar, your emails, your screen. But your actual life? The one that happens in physical space? Your AI knows nothing about it." He spent months building products that grabbed screens and analyzed patterns—investors loved it, Reddit loved it—but the core problem remained: these systems could see his work but never him.

Building the Hardware Stack

Adler's solution involved wiring five rooms with cameras running on Raspberry Pi Zero 2W boards ($15 each), paired with ArduCam IMX708 12MP wide-angle modules and WM8960 audio HATs for ambient sound capture. A custom Python daemon handles motion detection and triggered recording, storing everything to a Ugreen NAS when idle. Total hardware cost came in under $500—a fraction of what he spent on camera modules that didn't work. "I spent weeks debugging device tree overlays," he admits. "Swapped camera modules three times before finding ones that actually performed. Burned through two Pi Zeros that couldn't handle the thermal load." This wasn't a weekend project vibed together; it was real infrastructure requiring real troubleshooting.

The Three-Layer Vision for AI That Actually Knows You

Adler frames his work within a three-layer architecture he believes the industry needs: an observation layer (cameras, mics, sensors capturing the physical world), a memory layer (persistent cross-session context that follows you beyond single conversations), and a reasoning layer (the model itself). His argument cuts to the core of why current AI feels hollow. "Nobody tells their AI 'I've been pacing around my office for 20 minutes.' Nobody types 'I skipped lunch again today.' But a camera sees all of that." The people who actually know you, he argues, recognize your tells—how you fidget when nervous, pace when stuck. That data isn't in any context window.

Why Everyone Is Investing in the Wrong Layer

"The whole industry is trying to make AI feel more human by tweaking the output," Adler observes. Adjusting personality settings, temperature parameters, avoiding words like 'awesome'—none of it addresses the real bottleneck: input. "They're training on polished, sanitized datasets and then wondering why it still feels like AI." His assessment is blunt: billions are pouring into layer 3 (reasoning), while layers 1 and 2 remain virtually untouched. "The models are smart enough. That's not the bottleneck anymore. The bottleneck is that your AI has never seen you. It's never been in the room. It's a hyper-intelligent entity trapped behind a text box."

Key Takeaways

  • Current AI systems are 'fundamentally blind' to physical-world context like body language, habits, and environmental cues
  • Adler's complete hardware setup cost under $500 using Raspberry Pi Zero 2W boards, ArduCam modules, and custom Python software
  • He spent weeks debugging device tree overlays, burned through two Pis due to thermal issues before achieving stable operation
  • His proposed three-layer architecture (observation/memory/reasoning) puts almost all current investment in the wrong layer
  • The real value isn't better prompts—it's data that reveals how someone actually lives and behaves, not what they consciously choose to share

The Bottom Line

Adler's project is either a glimpse at computing's next paradigm or a cautionary tale about moving too fast—but probably both. Whether you find this vision compelling or unsettling depends on whether you think AI should remain a tool you interact with or evolve into something that watches you live. Either way, someone's building it. "I'm not asking for permission," he writes. "I'm just showing you what's coming."