AI agent authentication isn't complicated—it's just that most teams are doing it wrong. When an AI agent calls a tool in Salesforce, GitHub, Jira, or Snowflake, the runtime needs to answer one question before anything else: which actor is allowed to make this call? The model can propose the action. The runtime has to attach authority to it. And right now, that authority usually comes from an API key shoved into an environment variable because it's the fastest path to a working demo.
The Secret Wins
The problem starts innocently enough. A developer writes a tool wrapper for the agent, needs credentials, and adds an API key to an env var with a TODO comment about removing it later. That TODO ships to production because by then the agent is handling support tickets, reconciling invoices, or poking around in CI pipelines. The shared API key quietly becomes the identity of the agent, and here's where it gets dangerous: if that one credential can read every customer record, submit refunds, update tickets, and write to production data, prompt-based carefulness is theater. The tool description says those powers apply only when appropriate. The audit log shows one credential doing a pile of different tasks anyway.

This pattern has a name outside the agent world: OWASP's Non-Human Identities Top 10 already catalogs the risks of service accounts, API keys, and other non-human identities in production applications. Agents are just the newest and strangest workloads on that list: they run differently than ordinary services but still need access to systems and data. The industry knows how to handle this. Kubernetes has issued short-lived ServiceAccount tokens through the TokenRequest API for years. SPIFFE generalizes workload identity with short-lived X.509 and JWT SVIDs. AWS STS issues temporary credentials after OIDC-based federation, Google Cloud has Workload Identity Federation, and Azure has managed identities. The playbook exists. It just keeps getting ignored when the interface is conversational.
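A minimal sketch of the anti-pattern described above: one static key, read once from the environment, quietly becomes the identity of every tool. All names here (the env var, the functions) are illustrative, not from any particular framework.

```python
import os

# Anti-pattern: one long-lived key, read once, shared by every tool wrapper.
# Whatever this key can do, every tool (and every prompt) can do.
SHARED_KEY = os.environ.get("CRM_API_KEY", "sk-demo-123")  # illustrative name

def lookup_customer(customer_id: str) -> dict:
    # Read-only in intent, but it carries full write authority via SHARED_KEY.
    return {"auth": SHARED_KEY, "op": "read", "id": customer_id}

def issue_refund(customer_id: str, amount: float) -> dict:
    # A completely different risk level -- same credential, same audit trail.
    return {"auth": SHARED_KEY, "op": "refund", "id": customer_id, "amount": amount}

# The audit log can only ever say "the key did it": both calls are
# indistinguishable at the credential level.
assert lookup_customer("c1")["auth"] == issue_refund("c1", 10.0)["auth"]
```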
Delegation Is the Missing Primitive
The real fix involves putting an identity assertion in the flow: this agent, this tenant, this user context if present, this policy version, this tool request, this approval state. That assertion gets exchanged for a credential only when the action needs one, and that credential dies after use. OAuth's RFC 8693 defines token exchange for exactly this shape: one token is exchanged for another token scoped to a different audience, on behalf of a delegating party. In practice, the model proposes an action, the runtime checks policy, the broker issues a credential scoped to that action and tool context, the call happens, and the credential expires. Not after a quarter. Not after someone remembers to rotate it. It expires because the system puts expiration in the path.

This changes the damage pattern fundamentally. A compromised tool wrapper no longer implies broad access to every downstream system. Prompt injection has to cross approval, run, tenant, and policy boundaries. A subagent that escapes its execution boundary can't reuse credentials after its context has expired. The agent is still useful; it just acts through a production boundary that understands production concerns.
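The wire shape RFC 8693 defines can be sketched as the form-encoded request below. The grant type URN and parameter names come from the RFC; the token values, audience, and scope are placeholders for illustration.

```python
# Sketch of an RFC 8693 token-exchange request: the runtime trades its own
# short-lived workload token (subject_token) for a credential scoped to one
# downstream audience and one action. Token/audience values are placeholders.
from urllib.parse import urlencode

def build_token_exchange_request(subject_token: str, audience: str, scope: str) -> str:
    params = {
        # Grant type URN registered by RFC 8693 for token exchange.
        "grant_type": "urn:ietf:params:oauth:grant-type:token-exchange",
        "subject_token": subject_token,
        "subject_token_type": "urn:ietf:params:oauth:token-type:jwt",
        "audience": audience,  # e.g. the payments API, not "everything"
        "scope": scope,        # e.g. one scope for this one action
    }
    return urlencode(params)

body = build_token_exchange_request("eyJ...runtime-token", "payments-api", "refunds:create")
```

The broker POSTs this body to its token endpoint; the response carries a short-lived access token whose audience and scope cover exactly one downstream call.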
The Runtime Owns the Identity Boundary
A model provider shouldn't own this boundary. Neither should prompts or tool schemas. The runtime owns it because the runtime sees the whole path: which agent definition maps to which thread and tenant, what identity information (including the initiating user) is attached, whether work is backgrounded, whether a human approved a risky step, which tool is being called, and which downstream credential is being requested. It attaches those facts to an identity assertion and makes a policy decision before any assertion leaves the process.

That policy looks boring and explicit: the refund tool requests a payment credential for the current tenant only; a GitHub tool requests write access only after CI produces an eval pass; a Snowflake query gets read credentials scoped to one warehouse, one role, one time window. A subagent runs with delegated identity but fewer capabilities than its parent run. The list isn't impressive, which is exactly why it's powerful.
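A "boring and explicit" policy like the one above can be sketched as a plain function with one rule per sentence. The tool names, scopes, and fields are hypothetical; real policies would also carry policy versions and time windows.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class CredentialRequest:
    tool: str
    tenant: str
    scope: str
    approved: bool = False   # human approved a risky step
    ci_green: bool = False   # CI produced an eval pass

def decide(req: CredentialRequest, run_tenant: str) -> bool:
    """Explicit policy: each branch mirrors one sentence of the text."""
    if req.tool == "refund":
        # Payment credential for the current tenant only.
        return req.tenant == run_tenant and req.scope == "payments:refund"
    if req.tool == "github":
        # Write access only after CI produces an eval pass.
        return req.scope == "repo:write" and req.ci_green
    if req.tool == "snowflake":
        # Read credentials only (warehouse/role/time-window checks elided).
        return req.scope.startswith("read:")
    return False  # default deny: unlisted tools get nothing

assert decide(CredentialRequest("refund", "acme", "payments:refund"), "acme")
assert not decide(CredentialRequest("refund", "other", "payments:refund"), "acme")
```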
Audit Follows Identity
Agent observability without identity is half a story. A trace for refund_customer can show latency, tool arguments, model output, and retries, all visualized in a span tree. Then someone asks who had the authority to issue that refund, and the trace becomes an archaeological dig. The right trace shows the tool call connected to a principal: an agent ID, run ID, tenant, user context, policy decision, credential scope, and expiration time. That difference determines whether there's a real postmortem or just hand-waving about the agent doing something weird. The same principle applies in testing: if a runtime can create delegated credentials, CI should verify that the boundary holds. A refund agent fails against the wrong tenant. A code agent fails when eval gates are red. A research agent fails when it asks for write access to a system it only reads.
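The identity-carrying span described above can be sketched as a record with the principal fields attached to the tool call. The field names are illustrative, not from any specific tracing library.

```python
from dataclasses import dataclass, asdict
from typing import Optional

@dataclass
class ToolCallSpan:
    tool: str
    agent_id: str
    run_id: str
    tenant: str
    user: Optional[str]      # initiating user context, if any
    policy_decision: str     # "allow"/"deny" (plus policy version in practice)
    credential_scope: str
    expires_at: float        # epoch seconds; the credential's deadline

# A trace answers "who had authority to do this?" only when the span
# carries the principal, not just latency and arguments:
span = ToolCallSpan(
    tool="refund_customer", agent_id="support-agent", run_id="run-42",
    tenant="acme", user="alice", policy_decision="allow",
    credential_scope="payments:refund", expires_at=1_700_000_300.0,
)
record = asdict(span)
assert "tenant" in record and "credential_scope" in record
```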
Getting Started
Shared keys hide tenancy, user context, agent identity, subagent inheritance chains, approval state, action matching, and rotation, until rotation becomes an outage. The fix doesn't require a full rewrite; start with one boundary. Put an identity broker between the runtime and the first high-risk tool. Give the runtime a workload identity. Have the broker exchange that identity for a tool credential. Associate decisions with tenant, run, and operation. Record policy decisions in traces. Add a CI test proving that wrong-tenant access fails. Expire credentials quickly. Make failures visible when the broker says no. Then move the next tool behind the boundary.
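Two of the steps above, expiring credentials quickly and making broker denials visible, can be sketched in a few lines. The class and function names are hypothetical.

```python
import time

class CredentialDenied(Exception):
    """Make 'the broker said no' a visible failure, not a silent fallback."""

def mint_credential(ttl_seconds: float = 300.0) -> dict:
    # Expiration lives in the path: every credential carries its own deadline.
    return {"token": "short-lived-token", "expires_at": time.time() + ttl_seconds}

def use_credential(cred: dict) -> str:
    if time.time() >= cred["expires_at"]:
        raise CredentialDenied("credential expired; re-run the exchange")
    return "call made with scoped credential"

fresh = mint_credential(ttl_seconds=300.0)
stale = mint_credential(ttl_seconds=-1.0)  # already expired, for illustration

assert use_credential(fresh) == "call made with scoped credential"
try:
    use_credential(stale)          # must fail loudly, never fall back
except CredentialDenied:
    pass
```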
The Bottom Line
The model decides on the next step. The runtime decides whether that step gets a credential. Production systems already solved this problem with Kubernetes, SPIFFE, OAuth token exchange, and cloud workload federation—they exist because static secrets rot and shared principal accounts make bad situations worse. It's a mistake to grant AI agents an exemption because their interface happens to be conversational. Name the agent as a workload, give it a real identity, and stop letting API keys become accidental actors.