New npm Package Owthorize Intercepts Destructive AI Agent Tool Calls Before They Execute

Prompt injection is still the elephant in the room for anyone shipping AI agents that touch production systems. You can stuff all the "you are a helpful assistant" guardrails you want into your system prompts—doesn't matter. The real boundary isn't between the model and chaos; it's the thin layer between the model's outputs and your database, filesystem, or network stack. Owthorize, a new npm package dropping in at v0.4.1, puts a synchronous gate there—and it actually works with ASTs instead of laughable regex matching.

The Three Ways AI Agents Break Things

The author breaks it down cleanly: prompt injection (a user or document coercing the model into tool calls nobody anticipated), hallucinated arguments (syntactically valid but semantically catastrophic—like DELETE FROM users without a WHERE clause), and reasoning errors where the agent "helps" by issuing destructive cleanup, fetching internal IPs, or writing outside the workspace. All three bypass prompt-level safeguards because they don't attack the model's instructions—they exploit the gap between what the model decides to do and what your systems actually execute.

What Owthorize Actually Catches

The library blocks SQL DDL operations (DROP, TRUNCATE, ALTER, CREATE, RENAME) at the AST level across Postgres, MySQL, and SQLite. It flags unbounded mutations: UPDATE or DELETE statements missing a WHERE clause. SSRF targets get blocked too: RFC1918 addresses, link-local ranges, loopback variants including IPv4-mapped IPv6 ([::ffff:127.0.0.1]), AWS metadata at 169.254.169.254, and *.internal wildcards. Shell commands with metacharacters (pipe, redirect, $(), backticks) trigger denials, and dangerous binaries are matched by basename, so a full path like /usr/bin/rm is still caught. Path traversal is handled by resolved-path containment that's prefix-collision-safe, so a root of /safe won't match /safe-evil.
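The prefix-collision point is easy to get wrong, so here is a minimal sketch of resolved-path containment in TypeScript. It assumes POSIX-style paths, and isInsideRoot is a hypothetical helper written for this article, not Owthorize's own code.

```typescript
import * as path from "node:path";

// Hypothetical helper (not part of Owthorize): true when `candidate`
// resolves to `root` itself or to something underneath it.
function isInsideRoot(root: string, candidate: string): boolean {
  const resolvedRoot = path.posix.resolve(root);
  // Resolve against the root so "../" segments collapse before comparing.
  const resolved = path.posix.resolve(resolvedRoot, candidate);
  if (resolved === resolvedRoot) return true;
  // Compare against "root + separator" so "/safe" cannot match "/safe-evil".
  const prefix = resolvedRoot.endsWith("/") ? resolvedRoot : resolvedRoot + "/";
  return resolved.startsWith(prefix);
}
```

A bare startsWith("/safe") check would accept /safe-evil/x; appending the separator before comparing closes that hole. Note that path.resolve is purely lexical, so this sketch does not follow symlinks.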

Adapters and Rules: How the Parsing Works

Owthorize uses adapters to parse raw payloads into typed shapes before rules ever see them. The SQL adapters (sql.postgres, sql.mysql, sql.sqlite) extract kind, tables, hasWhere, ddlOp, and dialect from { query, params? }. The http adapter normalizes IPv4-mapped IPv6 and lowercases header keys. The shell adapter produces tokenized argv with metacharacter/pipe/redirect/substitution flags. The filesystem adapter normalizes to absolute paths and classifies the operation type. There's also a raw adapter that passes the payload through unparsed, for custom cross-adapter rules.
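To see why normalizing before the rules run matters, consider the IPv4-mapped IPv6 case: a naive check for 127.0.0.1 never sees it if the host arrives as [::ffff:127.0.0.1] or in hex form as ::ffff:7f00:1. The following is a rough sketch of that normalization plus a private-range check. The function names are made up for illustration and the range list covers only the categories mentioned above; this is not the library's implementation.

```typescript
// Illustrative helpers only; none of these names come from Owthorize.

// Parse a dotted-quad IPv4 string into four octets, or return null.
function parseIPv4(host: string): number[] | null {
  const parts = host.split(".");
  if (parts.length !== 4) return null;
  const octets = parts.map((p) => (/^\d{1,3}$/.test(p) ? Number(p) : NaN));
  return octets.every((o) => Number.isInteger(o) && o <= 255) ? octets : null;
}

// Unwrap IPv4-mapped IPv6 ("[::ffff:127.0.0.1]" or "::ffff:7f00:1") to dotted-quad IPv4.
function normalizeHost(rawHost: string): string {
  let host = rawHost.trim().toLowerCase();
  if (host.startsWith("[") && host.endsWith("]")) host = host.slice(1, -1);
  const mapped = /^::ffff:(.+)$/.exec(host);
  if (!mapped) return host;
  const tail = mapped[1];
  if (parseIPv4(tail)) return tail; // dotted form after the prefix
  const hex = /^([0-9a-f]{1,4}):([0-9a-f]{1,4})$/.exec(tail);
  if (!hex) return host;
  const hi = parseInt(hex[1], 16);
  const lo = parseInt(hex[2], 16);
  return [hi >> 8, hi & 0xff, lo >> 8, lo & 0xff].join(".");
}

// Classify a host as internal using the ranges named in the article.
function isInternalHost(rawHost: string): boolean {
  const host = normalizeHost(rawHost);
  if (host === "localhost" || host === "::1" || host.endsWith(".internal")) return true;
  const ip = parseIPv4(host);
  if (!ip) return false;
  const [a, b] = ip;
  return (
    a === 127 ||                         // loopback
    a === 10 ||                          // RFC1918 10.0.0.0/8
    (a === 172 && b >= 16 && b <= 31) || // RFC1918 172.16.0.0/12
    (a === 192 && b === 168) ||          // RFC1918 192.168.0.0/16
    (a === 169 && b === 254)             // link-local, includes 169.254.169.254
  );
}
```

Because the rule only ever sees the normalized host, it stays a plain range comparison, and the encoding tricks are dealt with once in the adapter step.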

Framework Integrations

Four shims handle the major agent frameworks: owthorize/openai wraps Array<{ type: "function", function, handler? }>, owthorize/anthropic handles { name, description, input_schema, handler? }, owthorize/langchain takes { name, description, schema, func | _call }, and owthorize/vercel-ai works with Record. All four preserve framework-specific fields (strict, experimental_*, etc.) and pass schema-only tools through untouched. The protectTools(guard, tools, perTool?) function wraps an entire tool registry in one call.
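The wrapping idea itself is simple enough to sketch without the library. The snippet below is a standalone illustration of gating handlers in an Anthropic-style tool list. wrapTools, Check, and Decision are hypothetical names, and throwing on a deny is a choice made for this sketch rather than documented Owthorize behavior.

```typescript
// Standalone sketch; none of these names are Owthorize's real API.
type Decision = { allowed: true } | { allowed: false; reason: string };
type Check = (toolName: string, input: unknown) => Decision;

interface Tool {
  name: string;
  description: string;
  input_schema: object;
  handler?: (input: unknown) => unknown;
  [extra: string]: unknown; // framework-specific fields ride along untouched
}

function wrapTools(check: Check, tools: Tool[]): Tool[] {
  return tools.map((tool) => {
    const { handler } = tool;
    // Schema-only tools have nothing to gate, so hand them back as-is.
    if (!handler) return tool;
    return {
      ...tool, // keeps extra fields such as `strict`
      handler: (input: unknown) => {
        const decision = check(tool.name, input); // synchronous gate
        if (!decision.allowed) {
          throw new Error(`Blocked ${tool.name}: ${decision.reason}`);
        }
        return handler(input);
      },
    };
  });
}

// Usage: a toy check that refuses calls aimed at the cloud metadata address.
const blockMetadata: Check = (_tool, input) => {
  const url = (input as { url?: string } | null)?.url ?? "";
  return url.includes("169.254.169.254")
    ? { allowed: false, reason: "metadata endpoint" }
    : { allowed: true };
};

const schemaOnly: Tool = { name: "lookup", description: "Schema-only tool", input_schema: {} };
const tools = wrapTools(blockMetadata, [
  {
    name: "fetch_url",
    description: "Fetch a URL",
    input_schema: {},
    strict: true,
    handler: (input) => `fetched ${(input as { url: string }).url}`,
  },
  schemaOnly,
]);
```

The model still sees the same tool definitions; only the handler has changed, which is why the gate can't be talked around from inside the prompt.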

The Irreversible Flag and Audit Log

Built-in rules tag a deny as irreversible when the blocked action couldn't easily be rolled back: DDL, unbounded mutations, destructive shell commands, writes outside fs roots. Custom rules opt in via deny(reason, matched, { irreversible: true }). This lets consumers auto-deny most things but escalate irreversible ones to a human for approval. The SDK returns synchronously and never blocks waiting; your code decides whether to gate, route, or deny. Every check writes a structured audit record with timestamp, tool name, adapter type, parsed payload (hashed), decision, matched rule, reason, irreversibility flag, and simulation status.
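Here's roughly what that pattern could look like in consumer code. Everything below is a local stand-in: the SqlShape fields follow the adapter output described earlier, deny mirrors the call shape quoted above, and the routing function, the protected-table rule, and the SHA-256 payload hash are assumptions for illustration, not Owthorize's actual types or audit format.

```typescript
import { createHash } from "node:crypto";

// Hypothetical shapes modeled on the fields the article lists.
interface SqlShape {
  kind: "select" | "insert" | "update" | "delete" | "ddl";
  tables: string[];
  hasWhere: boolean;
  ddlOp?: string;
}

type RuleResult =
  | { decision: "allow" }
  | { decision: "deny"; reason: string; matched: string; irreversible: boolean };

// Local stand-in with the same call shape as deny(reason, matched, { irreversible }).
function deny(reason: string, matched: string, opts: { irreversible?: boolean } = {}): RuleResult {
  return { decision: "deny", reason, matched, irreversible: opts.irreversible ?? false };
}

// A rule that works on the parsed shape, never on the raw query string.
function sqlRule(q: SqlShape): RuleResult {
  if (q.kind === "ddl") return deny("DDL statement", q.ddlOp ?? "ddl", { irreversible: true });
  if ((q.kind === "update" || q.kind === "delete") && !q.hasWhere) {
    return deny("mutation without WHERE", q.kind, { irreversible: true });
  }
  // Assumed example of a reversible deny: a table the agent should not touch.
  if (q.tables.includes("audit_log")) return deny("protected table", "audit_log");
  return { decision: "allow" };
}

type Route = "execute" | "reject" | "ask-human";

// Auto-reject ordinary denies, escalate irreversible ones to a person.
function route(result: RuleResult): Route {
  if (result.decision === "allow") return "execute";
  return result.irreversible ? "ask-human" : "reject";
}

// Audit entry that stores a hash of the payload instead of the payload itself.
function auditRecord(tool: string, payload: unknown, result: RuleResult) {
  return {
    timestamp: new Date().toISOString(),
    tool,
    payloadHash: createHash("sha256").update(JSON.stringify(payload)).digest("hex"),
    decision: result.decision,
    irreversible: result.decision === "deny" && result.irreversible,
  };
}
```

Since the check returns immediately, the approval flow (a Slack ping, a queue, a dashboard) lives entirely in your code, keyed off the route value.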

Threat Model: What It Doesn't Catch

The docs are refreshingly honest about the gaps. Owthorize catches prompt-injected tool calls, hallucinated arguments, agent reasoning errors, and unsafe shapes like DDL or unbounded mutations. What it doesn't catch: a malicious agent runtime that bypasses the SDK entirely, vulnerabilities inside your tool implementations, or side effects occurring before the tool boundary. The trust boundary is the wrap—what you don't wrap, you don't gate. For defense against a hostile runtime, you need a process boundary (proxy, sidecar, container egress rules). That's explicitly out of scope.

Status and Roadmap

The package sits at v0.4.1 with a stable public API but stays sub-1.0 until field feedback from external users lands. The validation log lives in FIELD-TESTING.md and the running paper-cut report in field-report.md. Both cover end-to-end re-tests across all four adapters, the OpenAI and Vercel AI shims against gpt-4o-mini, and the irreversible flag on a real Express + Drizzle + MySQL backend. On deck: API stabilization from real-world usage, LangChain + Anthropic shim field validation, and approval-flow recipes for Slack/queue patterns.

Key Takeaways

Owthorize sits at the tool layer—between model outputs and your systems—not in the prompt itself
AST-level parsing means it catches semantic SQL errors (missing WHERE clauses) that regex can't touch
The irreversible flag enables a simple pattern: auto-deny most things, escalate destructive actions to humans
Framework shims exist for OpenAI, Anthropic, LangChain, and Vercel AI SDK out of the box

The Bottom Line

This is exactly the kind of tooling the ecosystem needs right now—something that acknowledges prompt safeguards are theater and builds the actual boundary layer. If you're shipping AI agents with database access or outbound HTTP in JS/TS, you owe it to yourself to wrap those tool calls before something embarrassing hits your logs.
