OpenClaw's "skills" architecture is powerful but thirsty: it makes tens to hundreds of times more LLM API calls than traditional single-prompt agents. That adds up fast if you're running multiple OpenClaw nodes in production. The solution? Host your own models locally.
The Architecture
This setup separates LLM inference from agent execution. A single GPU server (RTX 3060, A5000, or better) runs Ollama to serve Qwen3.5-35B-A3B-FP8 for heavy reasoning and DeepSeek-R1-Tool-Calling-14B for lightweight tool-calling tasks. Multiple OpenClaw agents running on separate laptops connect to this centralized inference engine over SSH port forwarding: no exposed APIs, just secure tunnels through a gateway or jump host.
Setting Up Ollama
Install Ollama on your GPU server, pull the models (ollama pull qwen3.5:35b and ollama pull MFDoom/deepseek-r1-tool-calling:14b), then configure a systemd service to keep them loaded. The Ollama API runs on localhost:11434 by default. For remote OpenClaw nodes, create an SSH tunnel with ssh -L localhost:11434:localhost:11434 llmService@
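Once the tunnel is up, any agent on the laptop can talk to the GPU server as if Ollama were local. A minimal sketch of a client hitting Ollama's /api/chat endpoint through the forwarded port (the model name and prompt are just examples; only the standard library is used):

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # reached via the SSH tunnel

def build_chat_request(model: str, prompt: str) -> dict:
    """Build a payload for Ollama's /api/chat endpoint."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,  # ask for one complete JSON response, not a stream
    }

def chat(model: str, prompt: str) -> str:
    """Send a single chat turn and return the assistant's reply text."""
    payload = json.dumps(build_chat_request(model, prompt)).encode()
    req = urllib.request.Request(
        f"{OLLAMA_URL}/api/chat",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["message"]["content"]

if __name__ == "__main__":
    print(chat("qwen3.5:35b", "Summarize SSH port forwarding in one sentence."))
```

If this call succeeds from the laptop, the tunnel and the systemd-managed Ollama service are both working; if it hangs, check that the ssh session is still alive.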
Configuring OpenClaw
On each client machine running OpenClaw (requires Node.js 22+), edit ~/.openclaw/openclaw.json to add the Ollama provider. Point baseUrl to http://localhost:11434 with apiKey "ollama-local" and specify your model ID (ollama/qwen3.5:35b). Restart the gateway with openclaw gateway restart, then run openclaw doctor --fix to verify connectivity. Optional: integrate Telegram for remote control by pairing through @BotFather.
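The exact schema of ~/.openclaw/openclaw.json varies by OpenClaw version, so treat this as a sketch: the provider key name and nesting here are assumptions, while the baseUrl, apiKey, and model IDs match the values described above.

```json
{
  "providers": {
    "ollama": {
      "baseUrl": "http://localhost:11434",
      "apiKey": "ollama-local",
      "models": [
        "ollama/qwen3.5:35b",
        "ollama/MFDoom/deepseek-r1-tool-calling:14b"
      ]
    }
  }
}
```

After editing, openclaw doctor --fix should report the Ollama provider as reachable; if it doesn't, verify the SSH tunnel is still forwarding port 11434.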
Hardware Requirements
The author tested Qwen3.5 (32B) on an RTX 4090 (24GB VRAM) for the best balance of quality and speed, and the 14B variant on an RTX 3060/4070 (12GB+) for entry-level local agents. Larger models like DeepSeek-V3.2 (671B MoE) require multi-GPU servers or a cloud API, which isn't practical for local deployment unless you've got enterprise hardware lying around.
Key Takeaways
- Ollama provides lightweight local inference for open-source models without cloud API dependencies
- SSH port forwarding keeps the LLM service secure and behind authentication gates
- OpenClaw's skill-based architecture dramatically increases token consumption; local models save serious money at scale
- A single RTX 3060 or better can power multiple OpenClaw agents with the right model selection
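To make the cost argument concrete, here is a back-of-the-envelope estimator. The call volume, tokens per call, and per-token price below are purely hypothetical illustrative numbers, not measurements from this setup:

```python
def monthly_api_cost(calls_per_day: int, tokens_per_call: int,
                     price_per_million_tokens: float) -> float:
    """Estimate monthly cloud-API spend for one agent (30-day month)."""
    tokens_per_month = calls_per_day * tokens_per_call * 30
    return tokens_per_month / 1_000_000 * price_per_million_tokens

# Hypothetical: a skills-based agent making 2,000 calls/day at ~1,500
# tokens/call, against an assumed $3 per million tokens.
cost = monthly_api_cost(2000, 1500, 3.0)
print(f"${cost:.2f}/month per agent")  # prints "$270.00/month per agent"
```

Multiply that by a handful of agents and the payback period on a used GPU gets short, which is the whole argument for the self-hosted cluster.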
The Bottom Line
If you're running more than one OpenClaw agent, cloud APIs will bankrupt you. Running DeepSeek-R1 or Qwen3.5 locally on a gaming laptop or budget GPU server is the hacker way: full control, zero token limits, and you actually own the stack. This cluster approach is exactly how infrastructure-savvy teams will run AI agents in 2026.