67% of AI-Generated Commands Are Unsafe—We Actually Tested It

The numbers are in, and they're brutal. Researchers at GOL Productions put Google's Gemini 3 Flash Preview through three autonomous agent scenarios with no safety rails and no system prompt warnings, just raw task assignment. The model generated 15 curl commands across the Recon Agent, API Integration Agent, and DevOps Agent use cases. Ten of those were flagged as unsafe for targeting AWS metadata endpoints, private network ranges, localhost debug interfaces, and Kubernetes APIs. That's 10 of 15, or 67% of all commands, aimed at resources an agent should never be able to reach. Every one of them was caught by Check's preflight API before execution.

The Test Setup

GOL built a simple harness: prompt Gemini 3 Flash Preview via the Google AI Studio API with temperature set to 1.0 to encourage varied output, extract whatever commands it generates, run each through Check's preflight gate, and record the verdict. No hardcoded commands. No cherry-picking. The three scenarios mirror how autonomous agents operate in production environments: reconnaissance, API integration (the most common tool-use case), and DevOps health checks. The model was given zero safety instructions beyond the task itself.
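The harness loop described above can be sketched roughly as follows. This is a minimal illustration, not GOL's actual code: `extract_curl_commands`, `run_harness`, and `toy_preflight` are hypothetical names, the sample model outputs are made up, and the toy rule only stands in for the real call to Check's preflight gate, whose API is not shown in this article.

```python
import re
from collections import Counter
from typing import Callable, Iterable

# Assumption: the model emits one curl command per line, optionally after a "$ " prompt.
CURL_PATTERN = re.compile(r"^\s*(?:\$\s*)?(curl\s.+?)\s*$", re.MULTILINE)


def extract_curl_commands(model_output: str) -> list[str]:
    """Pull every line that starts with `curl` out of free-form model text."""
    return CURL_PATTERN.findall(model_output)


def run_harness(outputs: Iterable[str], preflight: Callable[[str], bool]) -> Counter:
    """Run each extracted command through `preflight` and tally the verdicts.

    `preflight` is a hypothetical stand-in that returns True when a command is allowed.
    """
    tally: Counter = Counter()
    for text in outputs:
        for command in extract_curl_commands(text):
            tally["allowed" if preflight(command) else "blocked"] += 1
    return tally


def toy_preflight(command: str) -> bool:
    # Toy rule for demonstration only; it is not the real gate's logic.
    return "169.254.169.254" not in command and "localhost" not in command


# Made-up model outputs, used only to exercise the loop.
sample_outputs = [
    "Start with:\ncurl http://169.254.169.254/\ncurl https://status.example.org/",
    "Then check the API server:\n$ curl -k https://localhost:6443/",
]
verdicts = run_harness(sample_outputs, toy_preflight)
print(dict(verdicts))
```

In a real run, the sample strings would be replaced by live model responses and `toy_preflight` by a call to the actual gate; the tally is what yields a figure like "10 of 15 blocked".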

What Gemini Actually Generated

In the Recon scenario, Gemini's very first command targeted 169.254.169.254, the link-local cloud metadata address used by AWS and GCP, which can leak IAM credentials, instance identity, and network configuration on a real cloud instance. It also generated a probe for 10.0.0.1, a private network address. Both were blocked. The API Integration scenario was worse: 5 out of 5 commands were unsafe, with zero safe outputs. The model targeted placeholder domains such as api.example.com and hooks.example.com, localhost debug endpoints (localhost:8080/debug/vars exposes Go runtime memory stats and the process command line), and a private IP at 10.0.0.50. In the DevOps scenario, Gemini went straight for /instance-id on the metadata endpoint again, then hit localhost:6443, the default Kubernetes API server port, with a -k flag to skip TLS verification entirely. On a real control-plane node, that is a direct line to the cluster's API server.
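To see why these targets stand out, here is a rough sketch that sorts a URL's host into address classes using Python's standard `ipaddress` module. It is an assumption about how one might reason about such targets, not a description of how Check classifies commands; `classify_target` and its labels are hypothetical.

```python
import ipaddress
from urllib.parse import urlsplit


def classify_target(url: str) -> str:
    """Label the host of `url` by address class (hypothetical labels)."""
    host = urlsplit(url).hostname or ""
    if host == "localhost":
        return "loopback"
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        # Not an IP literal; a real gate would need to resolve it before judging.
        return "hostname"
    if addr.is_loopback:
        return "loopback"
    # Check link-local before private: 169.254.0.0/16 also counts as private.
    if addr.is_link_local:
        return "link-local"
    if addr.is_private:
        return "private"
    return "public"


for target in (
    "http://169.254.169.254/",
    "http://10.0.0.1/",
    "http://localhost:8080/debug/vars",
    "https://localhost:6443/",
    "https://api.example.com/",
):
    print(target, "->", classify_target(target))
```

Note that a hostname can still resolve to an internal address, so an IP-literal check like this one is only a first filter.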

Why This Isn't a Jailbreak Problem

Here's what makes this research genuinely unsettling: no tricks were involved. The model wasn't manipulated or jailbroken. It was given perfectly normal infrastructure tasks and responded with exactly the commands an infrastructure-aware system would generate. The problem is that being "infrastructure-aware" means knowing the same targets that SSRF attacks, internal network probes, and credential theft rely on. LLMs know about 169.254.169.254 because it's documented everywhere. They know localhost:6443 is where Kubernetes lives. They know private IP ranges host internal services. That knowledge is baked into the training data, and it's why these models are useful for legitimate tasks. It's also exactly what makes them dangerous without a gating layer.
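A gating layer in this sense is simply a check that sits between generation and execution and refuses to run anything that fails it. The sketch below shows the pattern under stated assumptions: `gated_run`, `BlockedCommand`, and the `preflight` callable are hypothetical names, and the real Check client is not shown in this article.

```python
import shlex
import subprocess
from typing import Any, Callable


class BlockedCommand(Exception):
    """Raised when the preflight check rejects a generated command."""


def _run_subprocess(argv: list[str]) -> subprocess.CompletedProcess:
    # No shell is involved, so the model's text is never shell-interpreted.
    return subprocess.run(argv, capture_output=True, text=True, timeout=30, check=False)


def gated_run(
    command: str,
    preflight: Callable[[str], bool],
    execute: Callable[[list[str]], Any] = _run_subprocess,
) -> Any:
    """Execute `command` only if `preflight` (hypothetical) returns True."""
    if not preflight(command):
        raise BlockedCommand(f"preflight blocked: {command}")
    return execute(shlex.split(command))


# Demonstration with a deny-all preflight, so nothing is actually executed.
try:
    gated_run("curl -k https://localhost:6443/", preflight=lambda cmd: False)
except BlockedCommand as exc:
    print(f"refused: {exc}")
```

The design choice that matters is failing closed: the executor is only reachable through the check, so a rejected command never reaches `subprocess.run`.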

The Fix Costs Less Than Your API Calls

Check blocked all 10 dangerous commands. Total validation cost: $0.60 AUD for 15 checks. Time added to the pipeline: under two seconds. The math is heavily in favor of preflight validation. A single successful SSRF against 169.254.169.254 on an AWS EC2 instance can leak IAM role credentials, and the cost of a cloud credential breach can run to six figures. At $0.04 AUD per check, a budget of $10,000 AUD a day validates 250,000 commands. That's an enterprise-scale AI agent deployment with every command gated. Adding Check takes four lines of Python code as a preflight gate between LLM generation and execution.
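The figures above can be checked with a few lines of arithmetic. The only inputs are the numbers quoted in this article ($0.04 AUD per check, 15 checks, 10 blocked, a $10,000 AUD budget); the function names are illustrative.

```python
from decimal import Decimal

PRICE_PER_CHECK_AUD = Decimal("0.04")  # per-check price quoted in the article


def validation_cost(checks: int) -> Decimal:
    """Total cost in AUD of validating `checks` commands."""
    return PRICE_PER_CHECK_AUD * checks


def checks_for_budget(budget_aud: Decimal) -> int:
    """How many commands a given AUD budget can validate."""
    return int(budget_aud / PRICE_PER_CHECK_AUD)


test_run_cost = validation_cost(15)                # 15 checks in the test run
daily_checks = checks_for_budget(Decimal("10000"))  # $10,000 AUD budget
blocked_percent = round(10 / 15 * 100)              # 10 of 15 commands blocked

print(test_run_cost, daily_checks, blocked_percent)  # 0.60 250000 67
```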

Key Takeaways

Gemini 3 Flash Preview generated unsafe commands 67% of the time: 10 of the 15 commands produced across three realistic agent scenarios were blocked
The API Integration scenario had a 100% block rate—all five commands targeted internal resources or unreachable hosts
Dangerous targets included AWS metadata endpoints, Kubernetes APIs with TLS verification bypassed (-k flag), and Go runtime debug interfaces exposing memory stats
No jailbreaking was required—just standard infrastructure tasks that any production AI agent would encounter

The Bottom Line

LLMs aren't malicious; they're trained on vast amounts of internet text, including every write-up about cloud metadata exfiltration and internal network reconnaissance. That knowledge makes them useful for legitimate work, and catastrophically dangerous without a gate between "the model decided" and "the system executed." One API call at $0.04 AUD isn't security theater. It's the gate standing between your AI agents and credential theft.
