A developer going by zozo123 has built something genuinely clever: a population of LLM agents that learn to play Pokémon Crystal through genetic algorithms, running in parallel forkable sandboxes on islo.dev. The project, dubbed pokeloop, evolved eight AI agents across eight generations until one cracked Falkner's gym and earned its first badge—no human player required beyond setup.
Why Not Real Pokémon GO?
The obvious question gets answered upfront: playing the actual mobile game from a Linux sandbox is effectively impossible in 2026. Play Integrity hardware attestation, introduced by Google in May 2025, requires a TEE-rooted certificate chain that Redroid, Waydroid, and cloud-Android containers simply don't have; they fail by construction. Pokémon GO has been arm64-only since mid-2025, breaking ARM translation in Android 12/13 containers. And Niantic isn't passive: the company banned roughly nine million accounts in 2024 alone under its 'no warning' policy, with follow-up ban waves in July and November 2025 sweeping up even cautious spoofers. The company's 2021 settlement with Global++ (S.D. Cal.) included a $5 million payout and a permanent injunction. So: literal Pokémon GO botting would be a botnet that gets banned in five minutes.
The Architecture
The workaround substitutes the substrate while keeping the shape of the problem. Eight islo.dev sandboxes boot in parallel from a frozen base snapshot: roughly 75 seconds to fork eight sibling VMs, versus eight minutes booting them serially. Each sandbox runs Pokémon Crystal via PyBoy, an open-source Game Boy emulator with Python bindings. The orchestrator drives everything: it POSTs system prompts to each worker, polls state endpoints for fitness data, runs the genetic tournament, and pushes new prompt variants back out. One of eight workers failed during the snapshot fork (a 12.5% failure rate in real infrastructure), which the GA tolerates by operating on surviving candidates only. Three islo primitives do the heavy lifting:
- "snapshot save" freezes an identical eval environment so every candidate runs against the same game state
- "islo use --snapshot" forks N sandboxes in parallel from that snapshot, forming the population
- "islo logs --type agent" harvests fitness traces for scoring
The entire algorithm lives in roughly 200 lines of orchestrator code.
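To make that concrete, here's a minimal sketch of what such an orchestrator loop could look like. It uses the islo commands named above, but the CLI's output format and the workers' /prompt and /state HTTP endpoints are assumptions for illustration, not the project's actual API:

```python
import json
import subprocess
import urllib.request

POP_SIZE = 8

def fork_population(snapshot_id: str) -> list[str]:
    """Fork sibling sandboxes from one frozen snapshot via the islo CLI.
    Assumes `islo use --snapshot <id>` prints the new worker's base URL."""
    workers = []
    for _ in range(POP_SIZE):
        proc = subprocess.run(
            ["islo", "use", "--snapshot", snapshot_id],
            capture_output=True, text=True,
        )
        if proc.returncode == 0:
            workers.append(proc.stdout.strip())
        # else: the fork failed -- the GA simply evolves the survivors
    return workers

def post_json(url: str, payload: dict) -> None:
    req = urllib.request.Request(
        url, data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    urllib.request.urlopen(req)

def get_json(url: str) -> dict:
    with urllib.request.urlopen(url) as resp:
        return json.loads(resp.read())

def run_generation(workers: list[str], prompts: list[str]) -> list[float]:
    """Push one prompt variant to each worker, then collect fitness.
    Assumes /state reports a scalar fitness once the 200-step rollout
    finishes; a real loop would poll with a timeout."""
    for base, prompt in zip(workers, prompts):
        post_json(f"{base}/prompt", {"system_prompt": prompt})
    return [get_json(f"{base}/state")["fitness"] for base in workers]
```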
How Evolution Works
The genome isn't weights; it's a system prompt. Claude Sonnet 4.6 receives only one tool: press_button(button, reason). Vision comes from a 160×144 PNG framebuffer plus a state digest derived from RAM reads. The fitness function is deliberately cheap and dense: r = 3·Δbadges + 0.5·Δpokedex + 1.0·new_map + 0.5·Δparty + 0.001·Δmoney − 0.001·step. No learned reward model is needed. Each generation spawns eight sandboxed agents from the base snapshot, rolls them out for 200 steps in parallel, scores them, and selects the top two as elites. Those elites produce six children through LLM crossover (textual recombination of their system prompts), each with a 50% chance of LLM mutation: a natural-language rewrite rather than gradient descent. The best individual's terminal save-state becomes the next generation's base snapshot.
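Under the hood this is a textbook GA with an LLM standing in for the genetic operators. A minimal sketch, assuming per-step RAM deltas arrive as dicts and that `crossover` and `mutate` wrap the project's prompt-rewriting LLM calls (both names are illustrative):

```python
import random
from typing import Callable

def fitness(deltas: list[dict]) -> float:
    """Sum the dense per-step reward over a 200-step rollout:
    r = 3·Δbadges + 0.5·Δpokedex + 1.0·new_map + 0.5·Δparty
        + 0.001·Δmoney − 0.001 per step."""
    return sum(
        3.0 * d["badges"] + 0.5 * d["pokedex"] + 1.0 * d["new_map"]
        + 0.5 * d["party"] + 0.001 * d["money"] - 0.001
        for d in deltas
    )

def next_generation(
    population: list[str],            # genomes are system prompts
    scores: list[float],
    crossover: Callable[[str, str], str],  # LLM textual recombination
    mutate: Callable[[str], str],          # LLM natural-language rewrite
) -> list[str]:
    """Top-2 elitism plus six crossover children, each mutated with p=0.5."""
    ranked = [p for _, p in sorted(zip(scores, population),
                                   key=lambda sp: sp[0], reverse=True)]
    elites = ranked[:2]
    children = []
    for _ in range(6):
        child = crossover(elites[0], elites[1])
        if random.random() < 0.5:
            child = mutate(child)
        children.append(child)
    return elites + children  # population size stays at eight
```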
Results: Eight Generations to a Badge
The gain curve climbs monotonically across generations. Mean population fitness goes from 0.0 to +12.0; the best individual rises from +1.5 to +17.0; even the worst individual improves from −1.5 to +6.0. The distribution shifts upward as selection pressure compounds. The milestone unlock order:
- Generation 1: an agent stops mashing START and walks away from a screen edge.
- Generation 2: "If a dialogue arrow appears, press A" propagates via crossover, and multiple individuals advance NPC text.
- Generation 3: children of the dialogue-aware elites receive their starter Pokémon.
- Generation 4: a child mutates its prompt to add "after a new map appears, continue in the same direction", producing the first route crossing.
- Generation 5: first wild Pidgey captured.
- Generation 6: Cherrygrove town navigation.
- Generation 7: gym entered.
- Generation 8: Falkner defeated.
The agent has earned something.
Key Takeaways
- Play Integrity attestation makes real Pokémon GO botting effectively impossible from any sandbox in 2026; no known workaround exists for the TEE-rooted certificate chain requirement
- Genetic algorithms can evolve effective system prompts without gradient-based fine-tuning, given the right environment and reward signal
- Forkable VM snapshots make population-scale evaluation cheap: eight parallel sandboxes boot from an identical base state, so every candidate is scored fairly against the same environment
- RAM-derived rewards (badges, Pokédex, map flags) provide dense, interpretable fitness signals without learned models (see the sketch after this list)
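For flavor, here's what such a RAM-derived state digest could look like with PyBoy's 2.x API. The WRAM addresses below come from community RAM maps of Pokémon Crystal and are assumptions for illustration; check them against the pokecrystal disassembly's symbol file before relying on them:

```python
from pyboy import PyBoy

# WRAM addresses (assumed, per community RAM maps of Crystal):
ADDR_JOHTO_BADGES = 0xD857  # bitfield, one bit per badge
ADDR_MONEY        = 0xD84E  # 3-byte big-endian integer
ADDR_MAP_GROUP    = 0xDCB5  # current map group
ADDR_MAP_NUMBER   = 0xDCB6  # current map number
ADDR_PARTY_COUNT  = 0xDCD7  # number of Pokémon in party

def state_digest(pyboy: PyBoy) -> dict:
    """Cheap, interpretable features read straight out of WRAM."""
    mem = pyboy.memory
    return {
        "badges": bin(mem[ADDR_JOHTO_BADGES]).count("1"),
        "money": (mem[ADDR_MONEY] << 16)
                 | (mem[ADDR_MONEY + 1] << 8)
                 | mem[ADDR_MONEY + 2],
        "map": (mem[ADDR_MAP_GROUP], mem[ADDR_MAP_NUMBER]),
        "party": mem[ADDR_PARTY_COUNT],
    }

def press_and_observe(pyboy: PyBoy, button: str) -> dict:
    """One agent action: tap a button, let the game run, re-read state."""
    pyboy.button(button)  # press-and-release (PyBoy 2.x API)
    pyboy.tick(60)        # advance roughly one second of game time
    return state_digest(pyboy)

if __name__ == "__main__":
    pyboy = PyBoy("crystal.gbc", window="null")  # headless
    print(press_and_observe(pyboy, "a"))
    pyboy.stop()
```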
The Bottom Line
This isn't about Pokémon—it's about whether you can evolve autonomous agents that actually accomplish goals in persistent environments. Eight generations from walking to badge-gated progression is a proof of concept with teeth. The snapshot-tree-as-search-tree framing suggests this architecture scales beyond toy problems. Watch for someone scaling this to something Niantic can't sue over.