The AI agent observability conversation has moved past the basics. Teams now accept that agents need traces, evals, and feedback loops. The hard part is what happens when an agent crosses out of its own runtime and into a tool server owned by another team, vendor, runtime, or cloud account. That boundary is where the trace disappears—and where production incidents become impossible to reconstruct from telemetry alone.

Why MCP Created a New Blind Spot

MCP made tool integration portable across model providers and runtimes. It did not make that tooling observable. When a planner calls a tool server over MCP, the MCP server sees an opaque tools/call request. The downstream database sees a single query. The payment API sees one request. The observability backend sees individual spans from each system with no shared context linking them back to what the model actually decided to do and why. Christine Yen flagged this tension at Honeycomb's O11yCon in San Francisco this week: agents are writing code, triaging incidents, and running production through orchestration layers, while engineering has minimal visibility into whether those actions added value or caused harm.

SEP-414 and W3C Trace Context Through _meta

The fix lives in a narrow specification. SEP-414 reserves the three W3C trace keys—traceparent, tracestate, and baggage—for propagation through MCP's params._meta field, which sits alongside tool arguments on every call. The spec makes an exception for these standard key names rather than inventing DNS-prefixed alternatives, so existing OpenTelemetry instrumentation works without modification. An agent runtime injects W3C trace context into _meta before the call; the MCP server extracts it and creates a linked server span under the same trace. Tool code—database queries, queue writes, API calls—then runs with inherited trace context downstream.
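The inject and extract steps can be sketched with the standard library alone. This is a minimal illustration, not an OpenTelemetry integration: the helper names (inject_trace_context, extract_trace_context, start_server_span) and the dict-based span record are hypothetical, and the parser only accepts version 00 of the W3C traceparent format.

```python
import re
import secrets

# W3C traceparent, version 00: "00-<32 hex trace id>-<16 hex parent span id>-<2 hex flags>"
TRACEPARENT_RE = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


def inject_trace_context(params: dict, trace_id: str, span_id: str, sampled: bool = True) -> dict:
    """Client side: return a copy of params with traceparent set in _meta."""
    flags = "01" if sampled else "00"
    meta = dict(params.get("_meta", {}))
    meta["traceparent"] = f"00-{trace_id}-{span_id}-{flags}"
    return {**params, "_meta": meta}


def extract_trace_context(params: dict):
    """Server side: parse traceparent from _meta; return None if absent or invalid."""
    value = params.get("_meta", {}).get("traceparent")
    if not isinstance(value, str):
        return None
    match = TRACEPARENT_RE.match(value)
    if match is None:
        return None
    trace_id, parent_span_id, flags = match.groups()
    # All-zero ids are invalid in the W3C format.
    if trace_id == "0" * 32 or parent_span_id == "0" * 16:
        return None
    return {
        "trace_id": trace_id,
        "parent_span_id": parent_span_id,
        "sampled": (int(flags, 16) & 1) == 1,
    }


def start_server_span(params: dict) -> dict:
    """Hypothetical span record: reuse the caller's trace id, or start a new trace."""
    parent = extract_trace_context(params)
    span_id = secrets.token_hex(8)  # 16 hex characters
    if parent is None:
        return {"trace_id": secrets.token_hex(16), "span_id": span_id, "parent_span_id": None}
    return {
        "trace_id": parent["trace_id"],
        "span_id": span_id,
        "parent_span_id": parent["parent_span_id"],
    }
```

In a real deployment an OpenTelemetry propagator would typically do this work, with params._meta acting as the carrier. The sketch only shows that the server span ends up in the caller's trace with the client span as its parent.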

HTTP Spans Will Not Save You

A common mistake is assuming transport-level tracing covers this gap. It does not. MCP's Streamable HTTP transport can carry multiple requests over one connection, and a single logical operation may span retries or switch transports entirely. Stdio-based servers have no HTTP request to attach a trace to at all. If instrumentation stops at the transport layer, operators see plumbing metrics but nothing about which MCP method was called, which tool was invoked, which session was involved, or what error type came back. Grafana's MCP server documentation already defines the relevant attributes (gen_ai.tool.name, mcp.method.name, mcp.session.id) with explicit support for W3C trace context propagation from params._meta.
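To make the point concrete, here is a rough sketch of protocol-level instrumentation that reads one newline-delimited JSON-RPC message, as a stdio server would receive it, and derives attributes no HTTP span could supply. The function name protocol_attributes and the session_id argument are assumptions for illustration; how a session id is obtained depends on the transport.

```python
import json
from typing import Optional


def protocol_attributes(line: str, session_id: Optional[str] = None) -> dict:
    """Derive span attributes and trace context from one JSON-RPC message line."""
    message = json.loads(line)
    method = message.get("method", "")
    params = message.get("params") or {}

    attributes = {"mcp.method.name": method}
    if method == "tools/call":
        # The tool name exists only in the message body, never in the transport.
        attributes["gen_ai.tool.name"] = params.get("name", "")
    if session_id is not None:
        attributes["mcp.session.id"] = session_id

    # Trace context travels inside the message, so it survives stdio.
    traceparent = (params.get("_meta") or {}).get("traceparent")
    return {"attributes": attributes, "traceparent": traceparent}


# Example message as it might arrive on a stdio server's stdin.
line = json.dumps({
    "jsonrpc": "2.0",
    "id": 7,
    "method": "tools/call",
    "params": {
        "name": "get_weather",
        "arguments": {"city": "Paris"},
        "_meta": {"traceparent": "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"},
    },
})
result = protocol_attributes(line, session_id="sess-1")
```

Because both the attributes and the traceparent come from the message itself, the same function would apply unchanged to a message received over HTTP.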

The Implementation Checklist Is Small and Non-Negotiable

  • Inject traceparent into params._meta on every client-side tool call.
  • Extract it in the MCP server template so individual tools do not need to handle it themselves.
  • Name spans by MCP method plus a stable, low-cardinality identifier for the tool: tools/call get_weather is fine; dynamic resource URIs belong in attributes instead of span names.
  • Attach gen_ai.tool.name, mcp.session.id, mcp.protocol.version, and error.type as structured attributes.
  • Propagate trace context to every downstream API call or database query made by that tool.
  • Keep sensitive data (prompts, access tokens, customer emails) out of baggage entirely, because trace context is designed to cross service boundaries.
  • Send the resulting spans to a backend that can render agent decision spans and service execution spans in the same timeline.
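The naming and attribute rules in the checklist above might look like the following sketch. The function span_name_and_attributes is hypothetical, and the mcp.resource.uri attribute key is an assumption used here only to show where a high-cardinality value could go instead of the span name.

```python
from typing import Optional, Tuple


def span_name_and_attributes(
    method: str,
    params: dict,
    protocol_version: str,
    session_id: Optional[str] = None,
    error_type: Optional[str] = None,
) -> Tuple[str, dict]:
    """Return a low-cardinality span name plus structured attributes."""
    name = method
    attributes = {
        "mcp.method.name": method,
        "mcp.protocol.version": protocol_version,
    }
    if session_id is not None:
        attributes["mcp.session.id"] = session_id

    if method == "tools/call":
        tool = params.get("name")
        if tool:
            # Tool names form a small, stable set, so they are safe in the span name.
            name = f"{method} {tool}"
            attributes["gen_ai.tool.name"] = tool
    elif method == "resources/read":
        uri = params.get("uri")
        if uri:
            # URIs are unbounded; keep them out of the name (attribute key is an assumption).
            attributes["mcp.resource.uri"] = uri

    if error_type is not None:
        attributes["error.type"] = error_type
    return name, attributes


name, attrs = span_name_and_attributes(
    "tools/call", {"name": "get_weather"}, "2025-06-18", session_id="sess-1"
)
# name is "tools/call get_weather"
```

Keeping the span name to method plus tool name means the backend groups every get_weather call together, while per-call detail stays queryable as attributes.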

Ownership Has to Live at the Boundary

The organizational failure mode here is familiar: the AI team waits for the MCP tools team to add tracing, the platform team assumes someone else is handling it, and no one claims ownership of the boundary between them. The practical fix is straightforward—own the boundary. For internal MCP servers, trace propagation belongs in the server template so every tool benefits automatically. For vendor-provided servers, test the contract by sending a traceparent in params._meta and verifying that your observability backend receives linked spans on the other side. Baggage policy should be documented before developers discover it as a convenient place to attach tenant hints or evaluation cohort metadata.
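One way to make a baggage policy enforceable rather than merely documented is an allowlist applied before injection. The sketch below assumes a made-up policy (the keys in ALLOWED_BAGGAGE_KEYS are hypothetical) and filters a W3C baggage string, whose members are comma-separated key=value pairs with optional ;properties.

```python
# Hypothetical policy: only these baggage keys may leave the agent runtime.
ALLOWED_BAGGAGE_KEYS = frozenset({"eval.cohort", "request.priority"})


def filter_baggage(header: str, allowed=ALLOWED_BAGGAGE_KEYS) -> str:
    """Drop every baggage member whose key is not on the allowlist."""
    kept = []
    for member in header.split(","):
        member = member.strip()
        if not member or "=" not in member:
            continue  # skip empty or malformed members
        key = member.split("=", 1)[0].strip()
        if key in allowed:
            kept.append(member)  # keep the value and any ;properties untouched
    return ",".join(kept)


outgoing = filter_baggage(
    "eval.cohort=b;ttl=30, user.email=a%40example.com,request.priority=high"
)
# outgoing is "eval.cohort=b;ttl=30,request.priority=high"
```

An allowlist fails closed: a key nobody reviewed is dropped by default, which suits data that is designed to cross service boundaries.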

Key Takeaways

  • MCP made tools portable; observability does not automatically follow them across runtime boundaries
  • SEP-414 enables W3C trace context propagation through params._meta using standard key names: traceparent, tracestate, baggage
  • Transport-layer HTTP spans are insufficient for MCP tracing: streamable transports can carry many requests over one connection, and stdio servers have no HTTP request at all
  • Span names should combine the MCP method with a stable, low-cardinality tool name to avoid cardinality explosions in the backend
  • Baggage policy must exclude prompts, secrets, and PII since trace context crosses service boundaries by design

The Bottom Line

MCP solved the portability problem. SEP-414 and OpenTelemetry's MCP semantic conventions solve the observability problem, but only if teams actually implement propagation at the tool boundary instead of treating it as someone else's responsibility. A model transcript is not a trace. It can record what the agent chose to do, but it will never show which tools executed that decision, which downstream services were affected, or where latency crept in. Pass the context, name the spans, keep cardinality low, and follow the trace across the same boundary your agent just crossed.