MCP's Real Attack Surface Isn't Prompt Injection — It's the Trust Boundary

We've been obsessing over prompt injection in the MCP ecosystem, and it's a distraction. The real damage doesn't happen when an LLM gets tricked into emitting a malicious instruction. It happens one step later, when that instruction becomes a tool call executed with whatever privileges the MCP server process holds, which for a locally installed server usually means your user account's. That's the trust boundary almost nobody is talking about, and it's the one developer LuciferForge set out to map in detail.

The Attack Chain Nobody Is Modeling

The dangerous chain in any MCP deployment follows a simple sequence: untrusted input → model gets convinced → tool call fires → server-side code executes on your box. Prompt injection covers only the first two steps; it's the delivery mechanism. What matters is what the handler does once the call arrives. If the server-side code does anything unsafe with its arguments, you've turned your LLM into a remote-code-execution courier, and the model did exactly what it was designed to do: call a tool with arguments. That reframe is the whole ballgame.
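The handler step is easiest to see in code. Here is a minimal Python sketch, assuming a hypothetical "ping a host" tool whose `host` argument comes straight from the model. The function names and the hostname allow-list are illustrative, not taken from LuciferForge's post. Neither function launches a process; both only build the command, so the difference stays visible without running anything.

```python
import re

# Hypothetical MCP tool: "ping a host" with a model-chosen `host` argument.
# Neither function runs a process; they only build the command for inspection.

def build_ping_unsafe(host: str) -> str:
    # UNSAFE: the argument is spliced into a shell string. Passed to
    # subprocess.run(cmd, shell=True), "example.com; rm -rf ~" would run
    # a second command after ping.
    return "ping -c 1 " + host

# Illustrative allow-list: must start alphanumeric (so it can't be read as a
# flag like "-f"), then only letters, digits, dots, and hyphens.
_HOST = re.compile(r"[A-Za-z0-9][A-Za-z0-9.-]{0,252}")

def build_ping_safe(host: str) -> list[str]:
    # Validate, then return an argv list for subprocess.run(argv) with the
    # default shell=False, so no shell ever parses the argument.
    if not _HOST.fullmatch(host):
        raise ValueError(f"rejected host argument: {host!r}")
    return ["ping", "-c", "1", host]

payload = "example.com; rm -rf ~"
print(build_ping_unsafe(payload))  # ping -c 1 example.com; rm -rf ~
try:
    build_ping_safe(payload)
except ValueError as err:
    print(err)  # rejected host argument: 'example.com; rm -rf ~'
```

The safe version doesn't try to strip dangerous characters out of the argument. It refuses anything outside a narrow shape and never hands the string to a shell.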

Twenty-One Patterns Across Five Languages

LuciferForge built an MCP security audit scanner that detects 21 vulnerability patterns across Python, Java, Go, C++, and Rust, and nearly every one of them predates LLMs entirely. The critical findings include eval(), exec(), and os.system() called directly on tool arguments; subprocess calls with shell=True, or Java's Runtime.exec(), built from concatenated input; deserialization RCE via pickle.load(), torch.load(), or Java's ObjectInputStream; yaml.load() without SafeLoader; SQL injection through f-strings or string concatenation; SSRF from URLs built by concatenation; and API keys or tokens hardcoded in server source. These are the OWASP greatest hits, scored by how badly they bite when they sit one tool call away from untrusted input.
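Mechanically, "detecting a pattern" means matching the shape of a call. Here is a rough sketch using Python's standard `ast` module. It assumes Python-only input and covers only a handful of the sinks listed above. The rule table, `find_patterns`, and its output format are hypothetical and are not the scanner's actual implementation.

```python
import ast
from typing import Optional

# A few of the listed sinks, Python only. Illustrative, not the scanner's rule set.
SINKS = {
    "eval": "code execution",
    "exec": "code execution",
    "os.system": "shell command",
    "pickle.load": "unsafe deserialization",
    "pickle.loads": "unsafe deserialization",
}
SAFE_LOADERS = {"yaml.SafeLoader", "yaml.CSafeLoader", "SafeLoader", "CSafeLoader"}

def dotted_name(node: Optional[ast.AST]) -> Optional[str]:
    """Turn Name/Attribute chains like os.system into 'os.system'."""
    if isinstance(node, ast.Name):
        return node.id
    if isinstance(node, ast.Attribute):
        base = dotted_name(node.value)
        return f"{base}.{node.attr}" if base else None
    return None

def find_patterns(source: str) -> list[tuple[int, str, str]]:
    """Return (line, call, reason) for each risky call, matched purely by shape."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        name = dotted_name(node.func)
        kwargs = {kw.arg: kw.value for kw in node.keywords}
        if name in SINKS:
            findings.append((node.lineno, name, SINKS[name]))
        elif name and name.startswith("subprocess."):
            shell = kwargs.get("shell")
            if isinstance(shell, ast.Constant) and shell.value is True:
                findings.append((node.lineno, name, "shell=True"))
        elif name == "yaml.load":
            # Loader may be passed positionally or as a keyword.
            loader = node.args[1] if len(node.args) > 1 else kwargs.get("Loader")
            if dotted_name(loader) not in SAFE_LOADERS:
                findings.append((node.lineno, name, "yaml.load without SafeLoader"))
    return sorted(findings)

sample = '''
import os, pickle, subprocess, yaml

def run_tool(args):
    os.system("ls " + args["path"])
    subprocess.run(args["cmd"], shell=True)
    subprocess.run(["ls", args["path"]])
    data = pickle.loads(args["blob"])
    cfg = yaml.load(args["doc"], Loader=yaml.SafeLoader)
    return eval(args["expr"])
'''
for lineno, call, reason in find_patterns(sample):
    print(lineno, call, reason)
```

Every match is reported with equal weight, no matter where the call lives or what feeds it.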

Purpose-Aware Severity: The Design Decision That Actually Matters

A naive scanner just greps for eval() and floods you with false positives until you stop reading the output entirely. An eval() inside a sandboxed test harness is categorically different from an eval() on a tool's input argument, yet a pattern-only scanner scores them identically. LuciferForge's approach weights each match by context: where it sits in the codebase and what actually reaches it. A subprocess call with shell=True in a CI helper that never touches model input is flagged as low priority. The same call fed by a tool argument is CRITICAL. Getting this distinction right is what separates a report someone acts on from one they dismiss.

The Scanner Ships As Its Own MCP Server

The audit tool itself runs locally as an MCP server exposing three tools. audit_repo takes a GitHub URL and returns a scored vulnerability report. audit_code scans snippets pasted inline and returns findings immediately. list_patterns lists every tracked pattern with its severity rating. Because it plugs into Claude Desktop like any other MCP server, your agent can audit a repository before you wire that repository's server into your setup. That is exactly when you need to know whether the server you're about to trust shells out on its arguments.
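Under the hood, an MCP client invokes any of these tools with a JSON-RPC 2.0 `tools/call` request that carries the tool name and an `arguments` object. Here is a minimal sketch of that envelope for `audit_code`. The `code` argument name is an assumption, since the post doesn't give the tools' input schemas. The `arguments` object is the same one a model fills in when it calls any other server's tools, and that is exactly where the trust boundary sits.

```python
import json

def tools_call(request_id: int, tool: str, arguments: dict) -> str:
    """Serialize an MCP tools/call request; MCP messages are JSON-RPC 2.0."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    })

# "code" is a HYPOTHETICAL argument name for audit_code; the server's
# tools/list response advertises the real input schema.
snippet = 'import os\nos.system("ping " + host)'
print(tools_call(1, "audit_code", {"code": snippet}))
```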

Two Lessons That Outlast MCP Itself

First: input filtering is a sieve; a safe handler is a wall. The injection isn't the vulnerability; the handler is. Spend your defensive budget on what tool code actually does with incoming arguments, not just on sanitizing what reaches the model. Second: severity without context is noise that trains people to ignore you. A scanner that can't distinguish a sandboxed eval from a reachable one will be muted soon after it's deployed. Context-awareness isn't optional. It's the feature that determines whether anyone reads your second report.
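To make "a safe handler is a wall" concrete, here is a sketch with the standard library's `sqlite3` and a hypothetical user-lookup tool. The table, data, and function names are all illustrative. The safe version doesn't try to recognize a malicious argument. It binds the value as a query parameter, so the payload that rewrites the f-string query is inert.

```python
import sqlite3

def make_db() -> sqlite3.Connection:
    # Throwaway in-memory table standing in for a tool's backing store.
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE users (name TEXT, email TEXT)")
    db.executemany("INSERT INTO users VALUES (?, ?)",
                   [("alice", "alice@example.com"), ("bob", "bob@example.com")])
    return db

def lookup_unsafe(db: sqlite3.Connection, name: str) -> list:
    # Hypothetical tool handler, UNSAFE: the model-supplied name becomes SQL text.
    return db.execute(f"SELECT email FROM users WHERE name = '{name}'").fetchall()

def lookup_safe(db: sqlite3.Connection, name: str) -> list:
    # Same handler with a bound parameter: `name` is always treated as a value,
    # so quotes inside it can't change the query's structure.
    return db.execute("SELECT email FROM users WHERE name = ?", (name,)).fetchall()

db = make_db()
payload = "x' OR '1'='1"
print(len(lookup_unsafe(db, payload)))  # 2: every row leaks
print(len(lookup_safe(db, payload)))    # 0: no user has that literal name
```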

Honest Limitations and Open Questions

Static pattern matching catches vulnerabilities by shape, not by proof of reach. Full taint analysis, which traces an argument's actual flow into a dangerous sink, would cut false positives dramatically. That's the next design frontier, and the tool doesn't do it yet. Path traversal in file-serving tools and SSRF in fetch-style tools remain genuinely ambiguous, because reading files and fetching URLs is those tools' whole job. Whether a given open() or HTTP request is a vulnerability depends on whether the path or URL is confined, not on the shape of the call. LuciferForge is putting two questions to the community. Should the trust boundary be enforced by sandboxing everything by default, or by install-time scanning plus runtime allow-lists? And which pattern is missing from the current list of 21?
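For path traversal, a runtime guard in the handler can hold even where a static scanner can't judge intent. The tool is allowed to read files, so the only question is whether the resolved path stays inside an allowed root. Here is a minimal sketch of that allow-list check, assuming a file-serving tool confined to a single directory. `resolve_inside` is a hypothetical helper, and it needs Python 3.9+ for `Path.is_relative_to`.

```python
import tempfile
from pathlib import Path

def resolve_inside(root: str, requested: str) -> Path:
    """Resolve a model-supplied path and refuse anything outside `root`."""
    base = Path(root).resolve()
    # Joining an absolute path replaces base entirely; resolve() then collapses
    # ".." and follows symlinks, so the check runs on the real location.
    target = (base / requested).resolve()
    if not target.is_relative_to(base):  # Python 3.9+
        raise PermissionError(f"{requested!r} escapes {base}")
    return target

with tempfile.TemporaryDirectory() as root:
    (Path(root) / "notes.txt").write_text("hello")
    print(resolve_inside(root, "notes.txt").read_text())  # hello
    for bad in ("../../etc/passwd", "/etc/passwd"):
        try:
            resolve_inside(root, bad)
        except PermissionError:
            print("blocked:", bad)
```

This narrows the boundary without sealing it. The check and the later open() are separate steps, so a directory an attacker can modify in between still leaves room for a race. That is part of the case for sandboxing by default.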

The Bottom Line

MCP security isn't a new problem; it's appsec with a dramatically expanded attack surface. A language model reading a webpage can now reach code that previously only a trusted caller could touch, and most MCP threat models haven't caught up to that shift. LuciferForge's scanner is MIT-licensed and free on GitHub at LuciferForge/mcp-security-audit, installable via pip. If you're shipping an MCP server, run it before you ship. If you want a human in the loop, LuciferForge also offers a human-reviewed report for $29 with a three-day turnaround.
