A developer and longtime observer of AI systems published findings this week showing that Google's AI search returns a different, usually nonsensical answer almost every time it is asked the same simple question: "How many days of the week have a fish in them?" The correct answer, that zero day names contain the word 'fish', was given only once across dozens of queries. Every other response was wrong, and each failure differed from the others.
The Test Case That Broke Google's AI
The test itself is disarmingly simple. No tricks, no adversarial prompting, just a straightforward English question about word composition. Google, however, apparently struggled with basic literacy. One response claimed 'Thursday' contains a literal fish called 'Thurs.' Another bizarrely interpreted the question as an invitation for wordplay about 'thirst-day,' suggesting Thursday might make you want to drink like a fish. A third suggested various days have fish-related puns despite no such puns existing in standard English usage.
The inconsistency is what makes this particularly damning. If Google's AI were simply wrong, that would be one thing; we've seen plenty of LLM hallucination at this point. But returning a different wrong answer each time suggests the system isn't reasoning about meaning at all. It's pattern matching against some training data corpus and guessing which output will satisfy whatever internal scoring function determines 'good enough.'
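For contrast, the literal reading of the question has a mechanical answer that never varies. The sketch below is only an illustration of that baseline: the function name `days_containing` is made up here, and it assumes the question means "which English day names contain the substring 'fish'".

```python
# Deterministic baseline: which English day names contain a given substring?
DAYS = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]


def days_containing(word: str, days: list[str] = DAYS) -> list[str]:
    """Return the day names that contain `word`, ignoring case."""
    needle = word.lower()
    return [day for day in days if needle in day.lower()]


matches = days_containing("fish")
print(len(matches), matches)  # prints: 0 []
```

Run it a thousand times and the answer stays the same, which is exactly the property the AI responses lacked.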
What This Reveals About AI Architecture
This fish problem exposes something fundamental about how these systems actually work under the hood. Modern language models don't process queries by understanding them in any meaningful sense. They calculate statistical relationships between tokens (chunks of text) and generate outputs based on what similar inputs produced during training. When you ask an unusual question that doesn't match common patterns, you're essentially triggering a roll of the dice.
The glue-on-pizza incident from 2024 follows the same pattern. Google's AI saw 'pizza' and 'cheese falling off' and reached for related content about adhesives, not because it understood pizza physics, but because the statistical relationship between those concepts in training data suggested adhesives were relevant. We just didn't notice until someone asked a question that exposed the underlying mechanism.
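The "roll of the dice" can be made concrete with a toy sampler. This is a deliberately simplified sketch, not a description of Google's system: real models sample token by token over a large vocabulary, while this version picks among whole answers. The candidate answers and their scores are invented for illustration.

```python
import math
import random


def softmax(scores, temperature=1.0):
    """Turn raw scores into probabilities; lower temperature sharpens them."""
    scaled = [s / temperature for s in scores]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_answer(candidates, scores, temperature=1.0, rng=random):
    """Pick one candidate at random, weighted by its softmax probability."""
    probs = softmax(scores, temperature)
    return rng.choices(candidates, weights=probs, k=1)[0]


# Hypothetical candidates and scores, made up for this example.
CANDIDATES = ["zero", "one (Thursday)", "a pun about 'thirst-day'", "several"]
SCORES = [1.0, 0.9, 0.8, 0.7]

for attempt in range(5):
    print(attempt, sample_answer(CANDIDATES, SCORES))
```

When no candidate scores much higher than the rest, as in the made-up numbers above, repeated runs of the same question print different answers. Only when one answer clearly dominates does the output become stable.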
The Implication for Developers
For anyone building products on top of these systems, this should be required reading. User-facing AI features need guardrails and validation layers that assume the model will occasionally produce confident nonsense. The days-of-the-week question is trivial, but imagine similar confidence failures in code generation, medical advice, or financial analysis. Google's billion-dollar algorithms can't consistently parse a simple English sentence about fish—that's not a bug to patch; it's a fundamental limitation of current approaches.
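One possible shape for such a validation layer is a consistency check: ask the same question several times and refuse to answer unless the responses agree. The sketch below is an assumption-laden illustration, not a recommended production design. `ask_model` stands in for whatever client call your product uses, and the names `consistent_answer`, `min_agreement`, and `validate` are hypothetical.

```python
import itertools
from collections import Counter
from typing import Callable, Optional


def consistent_answer(
    ask_model: Callable[[str], str],
    prompt: str,
    samples: int = 5,
    min_agreement: float = 0.8,
    validate: Optional[Callable[[str], bool]] = None,
) -> Optional[str]:
    """Ask the same question `samples` times. Return the most common answer
    only if enough responses agree and it passes the optional validator."""
    answers = [ask_model(prompt).strip().lower() for _ in range(samples)]
    winner, count = Counter(answers).most_common(1)[0]
    if count / samples < min_agreement:
        return None  # too inconsistent to trust; fall back or escalate
    if validate is not None and not validate(winner):
        return None  # consistent, but rejected by a domain-specific check
    return winner


# Stand-in for a flaky model: cycles through canned replies (illustrative only).
flaky = itertools.cycle(["one", "zero", "several", "two", "zero"])
result = consistent_answer(
    lambda _prompt: next(flaky),
    "How many days of the week have a fish in them?",
)
print(result)  # prints: None
```

Agreement alone does not prove correctness, since a model can be consistently wrong, which is why the sketch also accepts a `validate` hook for checks that do not depend on the model. The trade-off is cost: every guarded question is asked several times.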
Key Takeaways
- Google's AI returned a different wrong answer nearly every time it was asked 'How many days of the week have a fish in them?', even though the question has exactly one correct answer: zero
- The system demonstrates statistical pattern matching, not semantic comprehension: the same query produces a different confident output each time
- This follows the same failure mode as the 2024 glue-on-pizza incident: misalignment between user intent and training data patterns
- Developers should assume AI systems will fail unpredictably on edge cases and build appropriate safeguards
The Bottom Line
Google's fish fiasco isn't just funny. It's an informal but easily repeated demonstration that 'artificial intelligence' remains a misnomer. These systems are extraordinarily sophisticated autocomplete machines, not thinkers. Until the architecture fundamentally changes, we'll keep seeing confident wrong answers to obvious questions, and that's something every developer needs to internalize before betting critical functionality on these tools.