Can an AI Agent Pass the Test We Give 4-Year-Olds?

Shridhar Shah, a Senior Software Engineer on Cisco's AI team, just dropped something fascinating: two Python agents running the Sally-Anne false-belief test, one of developmental psychology's most famous benchmarks. The results? One agent answers "box" (wrong), the other says "basket" (correct). That one-word difference in behavior shows whether an agent is modeling what's called "theory of mind": the ability to understand that other people can hold beliefs that don't match reality. It's a concept kids master around age 4, and it's becoming a critical milestone for multi-agent systems in 2026.

The Sally-Anne Test, Demystified

The setup is elegant in its simplicity: Sally places her marble in a basket, then leaves the room. While she's gone, Anne moves the marble to a box. When Sally returns, where does she look? A 3-year-old will typically say "box" because they can't separate what they know happened from what Sally witnessed. A 4-year-old correctly answers "basket" because they understand Sally never saw the move. This test, first published by Baron-Cohen, Leslie & Frith in 1985, is now also used as a benchmark for AI systems. Shah implemented it in roughly 110 lines of Python as Part 3 of his "Toward Living Software" series.

The Code Trick: Tracking Beliefs Separately From Reality

The clever bit isn't some massive neural architecture. It's a single rule: update a person's beliefs only when that person is watching. In Shah's implementation, beliefs[person] only changes inside someone_moves_the_marble() for people passed in as who_is_watching. When Anne moves the marble while Sally is out of the room, only Anne's mental model updates; Sally's stays at "basket". The naive agent just reports ground truth ("the marble is in the box"). The theory-of-mind agent answers from Sally's perspective ("she thinks it's still in the basket"). That's the whole trick, and it's what separates a smart tool from something that can actually collaborate with humans or other agents.
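To make the rule concrete, here is a minimal sketch of the idea, not Shah's actual code. The names beliefs, someone_moves_the_marble, and who_is_watching come from the description above; the World class, the two agent functions, and the variable names are assumptions made up for illustration.

```python
class World:
    """Tracks where the marble really is, plus what each person believes."""

    def __init__(self, people, marble_location):
        self.marble_location = marble_location  # ground truth
        # Everyone starts out having seen where the marble was placed.
        self.beliefs = {person: marble_location for person in people}

    def someone_moves_the_marble(self, new_location, who_is_watching):
        # Reality always changes.
        self.marble_location = new_location
        # Beliefs change only for the people who saw the move.
        for person in who_is_watching:
            self.beliefs[person] = new_location


def naive_agent(world, person):
    # Hypothetical helper: ignores the person and reports ground truth.
    return world.marble_location


def theory_of_mind_agent(world, person):
    # Hypothetical helper: answers from that person's point of view.
    return world.beliefs[person]


world = World(people=["Sally", "Anne"], marble_location="basket")
# Sally has left the room, so only Anne is watching the move.
world.someone_moves_the_marble("box", who_is_watching=["Anne"])

naive_answer = naive_agent(world, "Sally")
tom_answer = theory_of_mind_agent(world, "Sally")
print("Naive agent:", naive_answer)           # box
print("Theory-of-mind agent:", tom_answer)    # basket
```

Keeping beliefs in a dictionary separate from marble_location is the design choice that matters here: the two can disagree, and that disagreement is exactly what the test probes.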

Why This Matters Beyond Being a Cute Puzzle

Shah argues this isn't academic hand-waving. Multi-agent workflows, human-AI collaboration, and real delegation all depend on knowing what your collaborators already understand. Consider: handing off work requires knowing what the recipient knows; explaining things means telling them only what's missing; warning someone about a wrong belief only works if you can track that the belief exists. An agent that assumes "everyone knows what I know" will skip critical context and break downstream processes. Most AI today reasons about the state of the world. The frontier in 2026 is reasoning about the people in that world, including when they're mistaken.
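The handoff and warning cases above can be sketched with plain sets and dictionaries. This is an illustrative assumption about how one might represent knowledge, not something taken from Shah's repo; missing_context, wrong_beliefs, and all the example data are hypothetical.

```python
def missing_context(sender_knows, recipient_knows):
    """Facts the sender holds that the recipient has not seen yet."""
    return sender_knows - recipient_knows


def wrong_beliefs(world_state, recipient_beliefs):
    """Map each topic to (believed, actual) where the recipient is out of date."""
    return {
        topic: (believed, world_state[topic])
        for topic, believed in recipient_beliefs.items()
        if topic in world_state and world_state[topic] != believed
    }


# Explaining: tell Sally only what she is missing.
anne_knows = {"marble started in basket", "marble moved to box"}
sally_knows = {"marble started in basket"}
to_tell = missing_context(anne_knows, sally_knows)

# Warning: flag the belief that no longer matches reality.
world_state = {"marble": "box"}
sally_beliefs = {"marble": "basket"}
to_warn = wrong_beliefs(world_state, sally_beliefs)

print(to_tell)   # {'marble moved to box'}
print(to_warn)   # {'marble': ('basket', 'box')}
```

An agent that assumed Sally knows everything Anne knows would compute an empty to_tell and never send the update.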

The Brutally Honest Note

Shah admits that real-world implementations have to infer beliefs from behavior rather than being told who was watching. That problem is significantly harder. His GitHub repo (living-software/03-theory-of-mind) includes a demo.py you can run right now, and it deliberately keeps the core concept as transparent as possible. He cites recent work including Kosinski's PNAS 2024 paper on LLMs in theory-of-mind tasks and February 2026 follow-up research showing how brittle current approaches still are.

Key Takeaways

Theory of mind: understanding that others can hold beliefs that differ from reality
The Sally-Anne test is a classic psychology benchmark now applied to AI agents
Tracking beliefs separately from world state enables actual collaboration
Real inference from behavior is much harder than the demo's "who was watching" approach

The Bottom Line

This isn't just clever Python. It's a glimpse at what separates systems that automate tasks from systems that genuinely collaborate. Being smart about the world makes a good tool. Tracking separate mental models? That's how you build something that works alongside humans without constantly dropping context they need.
