Google Open-Sources AX, a Distributed Agent Runtime Built for Reliability at Scale

Google dropped AX (Agent eXecutor) on GitHub today: an open-source runtime for coordinating AI agents across distributed infrastructure. The project is in early active development, with the usual disclaimer about breaking changes ahead of a stable release. It aims to solve one of the gnarliest problems in agentic systems: keeping autonomous workers reliable when things go wrong mid-execution.

What AX Actually Does

At its core, AX provides a controller that orchestrates agents, tools, and skills as isolated actors communicating over resumable streams. The architecture separates concerns cleanly: agents run in isolation (local or remote), tools can be MCP servers, and the controller maintains an event log for durable state management with automatic recovery. If your client disconnects mid-execution, you can resume from the last sequence number rather than starting over.

The CLI makes this accessible. Running ax exec --input "Can you list me this directory?" gets you going immediately with built-in tools already linked in. Conversations persist by UUID, so you can disconnect, come back later with --resume, and pick up where you left off. For branching experiments or rollback scenarios, ax fork creates a new event log from any checkpoint.
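To make the resume and fork behavior concrete, here is a minimal in-memory sketch of a sequence-numbered event log. It is not AX's implementation: the Event and EventLog classes, their method names, and the payload strings are all hypothetical, chosen only to illustrate resuming from a last-seen sequence number and branching from a checkpoint.

```python
from dataclasses import dataclass, field
from typing import Iterator, List


@dataclass(frozen=True)
class Event:
    seq: int       # monotonically increasing position in the log
    payload: str   # opaque event body (hypothetical shape)


@dataclass
class EventLog:
    events: List[Event] = field(default_factory=list)

    def append(self, payload: str) -> Event:
        # Sequence numbers start at 1 and never repeat within one log.
        event = Event(seq=len(self.events) + 1, payload=payload)
        self.events.append(event)
        return event

    def read_after(self, last_seq: int) -> Iterator[Event]:
        # A reconnecting client passes the last seq it saw and gets only the rest.
        return (e for e in self.events if e.seq > last_seq)

    def fork(self, checkpoint_seq: int) -> "EventLog":
        # Copy events up to and including the checkpoint into an independent log.
        return EventLog(events=[e for e in self.events if e.seq <= checkpoint_seq])


log = EventLog()
for step in ["user_input", "tool_call", "tool_result", "agent_reply"]:
    log.append(step)

# The client saw events 1 and 2, then disconnected; it resumes after seq 2.
missed = [e.payload for e in log.read_after(2)]

# Branch from the checkpoint at seq 2 and take a different path.
branch = log.fork(2)
branch.append("alternate_tool_call")
```

Because the fork copies events into a separate list, appending to the branch leaves the original log untouched, which is the property a rollback or what-if experiment depends on.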

The Resumability Angle Is What Matters

This is the part that should get ops folks paying attention. AX implements a single-writer architecture through its controller: only one entity controls state transitions at any moment, which rules out races between concurrent writers in distributed execution. The event log stores durable execution state, and when combined with compute-layer actor resumption on compatible platforms (Agent Substrate being the recommended path), you get fault tolerance without building it yourself.
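The single-writer idea can be sketched in a few lines. The example below is an illustration under stated assumptions, not AX's controller: the Controller class, its submit and close methods, and the agent names are hypothetical. Several threads stand in for agents, and all of them hand their requests to one writer thread that alone assigns sequence numbers and appends to the log.

```python
import queue
import threading


class Controller:
    """Hypothetical single-writer controller: only its own thread mutates state."""

    def __init__(self) -> None:
        self._inbox: queue.Queue = queue.Queue()
        self.log: list = []  # (seq, actor, message), written by one thread only
        self._writer = threading.Thread(target=self._run, daemon=True)
        self._writer.start()

    def submit(self, actor: str, message: str) -> None:
        # Actors never touch the log; they only enqueue requests.
        self._inbox.put((actor, message))

    def _run(self) -> None:
        while True:
            item = self._inbox.get()
            if item is None:  # shutdown sentinel
                break
            actor, message = item
            # Numbering and appending happen on this one thread, so no lock
            # is needed and no two events can receive the same number.
            self.log.append((len(self.log) + 1, actor, message))

    def close(self) -> None:
        self._inbox.put(None)
        self._writer.join()


controller = Controller()


def run_agent(name: str) -> None:
    for i in range(100):
        controller.submit(name, f"step-{i}")


workers = [threading.Thread(target=run_agent, args=(f"agent-{n}",)) for n in range(4)]
for w in workers:
    w.start()
for w in workers:
    w.join()
controller.close()
```

Four concurrent producers yield 400 events numbered 1 through 400 with no gaps or duplicates. The interleaving between agents varies from run to run, but the ordering decision is made in exactly one place.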

Deployment Reality Check

While AX runs anywhere, Google is clear: Kubernetes via Agent Substrate is the recommended production deployment. The server exposes a gRPC interface on port 8494 by default and uses SQLite for event logging (eventlog/log.sqlite in your config). Authentication for the built-in Gemini agent supports both AI Studio API keys and Vertex AI with application default credentials.
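For a feel of what a SQLite-backed event log buys you, here is a small sketch using Python's standard sqlite3 module. The table schema, column names, and helper functions are assumptions made for illustration; the article does not describe the actual layout of AX's log.sqlite file. The example writes events, closes the connection to simulate a process going away, then reopens the same file and continues from the stored sequence.

```python
import os
import sqlite3
import tempfile

# Hypothetical schema; the real layout of AX's log.sqlite is not described here.
SCHEMA = """
CREATE TABLE IF NOT EXISTS events (
    conversation_id TEXT NOT NULL,
    seq INTEGER NOT NULL,
    payload TEXT NOT NULL,
    PRIMARY KEY (conversation_id, seq)
)
"""


def open_log(path: str) -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def append(conn: sqlite3.Connection, conversation_id: str, payload: str) -> int:
    # The connection context manager commits on success and rolls back on error.
    with conn:
        row = conn.execute(
            "SELECT COALESCE(MAX(seq), 0) + 1 FROM events WHERE conversation_id = ?",
            (conversation_id,),
        ).fetchone()
        conn.execute(
            "INSERT INTO events VALUES (?, ?, ?)", (conversation_id, row[0], payload)
        )
    return row[0]


def read_after(conn: sqlite3.Connection, conversation_id: str, last_seq: int) -> list:
    rows = conn.execute(
        "SELECT seq, payload FROM events "
        "WHERE conversation_id = ? AND seq > ? ORDER BY seq",
        (conversation_id, last_seq),
    )
    return rows.fetchall()


path = os.path.join(tempfile.mkdtemp(), "log.sqlite")
conn = open_log(path)
append(conn, "conv-1", "user_input")
append(conn, "conv-1", "tool_call")
conn.close()  # simulate the process going away

conn = open_log(path)  # a new process reopens the same file
append(conn, "conv-1", "tool_result")
recovered = read_after(conn, "conv-1", 1)
conn.close()
```

The composite primary key on conversation and sequence number means a duplicate sequence number would be rejected by the database itself, a useful backstop even when a single writer is already enforcing order.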

Customization Without Lock-In

AX explicitly positions itself as harness-agnostic and model-agnostic—no specific coding agent, no vendor lock-in to particular LLMs. Remote agents can be implemented via AX's native AgentService gRPC interface, Google ADK Python agents, A2A protocol bridges, or even experimental Colab notebook execution. The roadmap includes Antigravity as a built-in harness and BYOH (Bring Your Own Harness) support.

Where This Fits in the Stack

Let's be clear about scope: AX is infrastructure, not an agent framework. It doesn't build agents—it runs them reliably. Think of it as the Kubernetes for your AI workforce rather than another abstraction layer over LLMs. The Apache 2.0 license and public development process signal Google wants this to become foundational tooling rather than a proprietary advantage.

Key Takeaways

Resumable execution with event log persistence handles failure recovery automatically
Single-writer controller architecture prevents distributed race conditions
Kubernetes via Agent Substrate is the recommended path for production deployments at scale
Open agent integration via gRPC, ADK, A2A protocol—framework agnostic
Early development means breaking changes are expected before a stable release

The Bottom Line

Google building open infrastructure for reliable agent orchestration is exactly what the ecosystem needs right now. AX won't be for everyone—it's ops-heavy and Kubernetes-native—but if you're running autonomous agents in production, this is worth watching closely as it matures toward stability.
