web-researcher-mcp is a production-grade Model Context Protocol server written in Go that bridges the gap between AI assistants and the live internet. Developed by zoharbabin, this open-source tool gives LLMs web search, content extraction, and multi-step research capabilities through MCP's standardized interface. The project supports Claude Code, Claude Desktop, Cursor, and any other MCP-compatible client with a single static binary that handles everything from basic searches to academic paper lookups.
Research Tools Suite
The server ships with eight specialized tools. web_search provides general search with optional domain-focused lenses for programming, news, legal, medical, finance, science, and government topics. scrape_page extracts content from a URL using a 4-tier fallback pipeline: markdown negotiation first, then stealth HTTP, HTML parsing, and finally headless browser rendering via go-rod. The combined search_and_scrape tool runs the full pipeline with quality scoring and deduplication built in. Beyond standard web queries, you get image_search with size/type/color filters, news_search with freshness controls, academic_search spanning arXiv, PubMed, IEEE, Nature, and Springer, and patent_search supporting US/EP/WO/JP/CN/KR offices with CPC classification filtering. The eighth tool handles multi-step sequential research.
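The tiered fallback idea behind scrape_page can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the Tier type, the ExtractWithFallback function, and the stub extractors are all hypothetical names, and the stubs make no network calls.

```go
package main

import (
	"errors"
	"fmt"
)

// Tier is one stage of the fallback chain. The names and the Extract
// signature are hypothetical, chosen only for this illustration.
type Tier struct {
	Name    string
	Extract func(url string) (string, error)
}

// ExtractWithFallback tries each tier in order and returns the first
// non-empty result together with the name of the tier that produced it.
func ExtractWithFallback(url string, tiers []Tier) (content, tier string, err error) {
	if len(tiers) == 0 {
		return "", "", errors.New("no extraction tiers configured")
	}
	var errs []error
	for _, t := range tiers {
		c, e := t.Extract(url)
		if e == nil && c != "" {
			return c, t.Name, nil
		}
		if e == nil {
			e = errors.New("empty content")
		}
		// Remember why this tier failed so the final error explains every attempt.
		errs = append(errs, fmt.Errorf("%s: %w", t.Name, e))
	}
	return "", "", errors.Join(errs...)
}

func main() {
	// Stub tiers stand in for the real extractors; no network calls are made.
	tiers := []Tier{
		{Name: "markdown", Extract: func(string) (string, error) { return "", errors.New("markdown not offered") }},
		{Name: "stealth-http", Extract: func(string) (string, error) { return "", nil }},
		{Name: "html-parse", Extract: func(string) (string, error) { return "parsed article text", nil }},
		{Name: "headless", Extract: func(string) (string, error) { return "rendered text", nil }},
	}
	content, tier, err := ExtractWithFallback("https://example.com", tiers)
	fmt.Println(content, tier, err)
}
```

Ordering the tiers from cheapest to most expensive means the headless browser only runs when the lighter approaches return nothing usable.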
Pluggable Search Architecture
The real flexibility comes from the multi-provider search backend system. Supported providers currently include Google PSE (free tier: 100 queries/day), Brave Search (recommended for high-volume work), Serper.dev (Google-identical results), SearXNG (self-hosted, privacy-first deployments), and SearchAPI.io (unified API with multiple engine backends). Setting SEARCH_ROUTING=brave,google,serper enables ordered fallback: if one provider fails or hits its rate limit, the request moves to the next provider in the list. Each provider gets its own circuit breaker to prevent cascading failures when an upstream API goes down.
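To make the routing behavior concrete, here is a small sketch of ordered fallback with a per-provider circuit breaker. Only the SEARCH_ROUTING value format comes from the description above; the Router, breaker, and SearchFunc names, the threshold of two consecutive failures, and the 30-second cooldown are assumptions made for illustration, and the server's real implementation may look quite different.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
	"time"
)

// SearchFunc is a hypothetical provider call; real providers would hit an API.
type SearchFunc func(query string) ([]string, error)

var errUpstream = errors.New("upstream unavailable")

// breaker opens after `threshold` consecutive failures and allows a trial
// call once `cooldown` has passed. Not safe for concurrent use as written.
type breaker struct {
	failures, threshold int
	cooldown            time.Duration
	openedAt            time.Time
}

func (b *breaker) allow(now time.Time) bool {
	return b.failures < b.threshold || now.Sub(b.openedAt) >= b.cooldown
}

func (b *breaker) record(err error, now time.Time) {
	if err == nil {
		b.failures = 0
		return
	}
	b.failures++
	if b.failures >= b.threshold {
		b.openedAt = now
	}
}

// ParseRouting splits a value such as "brave,google,serper" into an ordered list.
func ParseRouting(s string) []string {
	var order []string
	for _, name := range strings.Split(s, ",") {
		if name = strings.ToLower(strings.TrimSpace(name)); name != "" {
			order = append(order, name)
		}
	}
	return order
}

type Router struct {
	order     []string
	providers map[string]SearchFunc
	breakers  map[string]*breaker
}

func NewRouter(routing string, providers map[string]SearchFunc) *Router {
	r := &Router{order: ParseRouting(routing), providers: providers, breakers: map[string]*breaker{}}
	for name := range providers {
		r.breakers[name] = &breaker{threshold: 2, cooldown: 30 * time.Second}
	}
	return r
}

// Search walks the routing order, skipping providers whose breaker is open.
func (r *Router) Search(query string) ([]string, string, error) {
	now := time.Now()
	errs := []error{errors.New("all providers failed or none configured")}
	for _, name := range r.order {
		fn, ok := r.providers[name]
		if !ok {
			continue // listed in the routing string but not configured
		}
		b := r.breakers[name]
		if !b.allow(now) {
			errs = append(errs, fmt.Errorf("%s: circuit open", name))
			continue
		}
		res, err := fn(query)
		b.record(err, now)
		if err == nil {
			return res, name, nil
		}
		errs = append(errs, fmt.Errorf("%s: %w", name, err))
	}
	return nil, "", errors.Join(errs...)
}

func main() {
	r := NewRouter("brave,google", map[string]SearchFunc{
		"brave":  func(string) ([]string, error) { return nil, errUpstream },
		"google": func(q string) ([]string, error) { return []string{"result for " + q}, nil },
	})
	for i := 0; i < 3; i++ {
		res, provider, err := r.Search("mcp servers")
		fmt.Println(res, provider, err)
	}
}
```

In this sketch the failing provider is called twice, after which its breaker opens and later searches go straight to the next provider until the cooldown expires.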
Security-First Design
Security is built into every layer rather than bolted on afterward. SSRF protection blocks all private/reserved IP ranges (127.0.0.0/8, 10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16) plus cloud metadata endpoints at 169.254.169.254. The custom DialContext validates resolved IPs before connecting, which guards against DNS rebinding, and redirect targets are re-validated at each hop so a public URL cannot bounce a request to an internal address. Three-tier rate limiting protects both your infrastructure and upstream API quotas: per-client token buckets, per-provider limits, and a global backpressure valve.
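One common way to get connect-time IP validation in Go is the Control hook on net.Dialer, which runs after DNS resolution and just before the socket connects. The sketch below uses that hook rather than a hand-written DialContext, so it is an illustration of the technique and not necessarily how this project does it. The names isBlocked, guardControl, and NewGuardedClient are hypothetical, and the blocked set relies on the standard library's address predicates, which do not cover every reserved range.

```go
package main

import (
	"fmt"
	"net"
	"net/http"
	"net/netip"
	"syscall"
	"time"
)

// isBlocked reports whether addr is loopback, private (10/8, 172.16/12,
// 192.168/16, fc00::/7), link-local (including 169.254.169.254), multicast,
// or unspecified. This list is illustrative, not exhaustive.
func isBlocked(addr netip.Addr) bool {
	addr = addr.Unmap() // treat ::ffff:10.0.0.1 the same as 10.0.0.1
	return !addr.IsValid() ||
		addr.IsLoopback() ||
		addr.IsPrivate() ||
		addr.IsLinkLocalUnicast() ||
		addr.IsMulticast() ||
		addr.IsUnspecified()
}

// guardControl is called with the already-resolved "ip:port", so the check
// applies to the address actually being connected to, not the hostname.
func guardControl(network, address string, _ syscall.RawConn) error {
	ap, err := netip.ParseAddrPort(address)
	if err != nil {
		return fmt.Errorf("ssrf guard: cannot parse %q: %w", address, err)
	}
	if isBlocked(ap.Addr()) {
		return fmt.Errorf("ssrf guard: refusing to connect to %s", ap.Addr())
	}
	return nil
}

// NewGuardedClient returns an http.Client whose every outbound connection,
// including those opened while following redirects, passes through guardControl.
func NewGuardedClient() *http.Client {
	dialer := &net.Dialer{Timeout: 10 * time.Second, Control: guardControl}
	return &http.Client{
		Timeout:   30 * time.Second,
		Transport: &http.Transport{DialContext: dialer.DialContext},
	}
}

func main() {
	for _, a := range []string{"127.0.0.1:80", "169.254.169.254:80", "93.184.216.34:443"} {
		fmt.Println(a, "->", guardControl("tcp", a, nil))
	}
	_ = NewGuardedClient() // use like any other *http.Client
}
```

Checking at dial time rather than before the request means a hostname that resolves to a public address during validation and a private one during the fetch is still rejected.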
Enterprise Features for Team Deployments
For production environments serving multiple users or applications, the server runs in HTTP/SSE mode with OAuth 2.1 authentication. JWKS-based token validation handles automatic key rotation while per-tenant session isolation keeps teams' data separate. The content pipeline includes HTML sanitization via bluemonday's whitelist policy, paragraph-level deduplication across results, smart truncation at natural breakpoints, and quality scoring to filter low-value content before it reaches your LLM. Optional Redis caching, Prometheus metrics with per-tool stats, and structured audit logging round out the operational toolkit.
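Two of the content pipeline steps, paragraph-level deduplication and truncation at natural breakpoints, are simple enough to sketch. The functions below are hypothetical illustrations under stated assumptions: paragraphs are separated by blank lines, duplicates are detected by hashing whitespace-normalized lowercase text, and a breakpoint is only accepted if it keeps at least half of the allowed length. The project's own heuristics are not described in detail and may differ.

```go
package main

import (
	"crypto/sha256"
	"fmt"
	"strings"
)

// normalize collapses whitespace and lowercases text so that trivially
// different copies of a paragraph produce the same hash.
func normalize(p string) string {
	return strings.ToLower(strings.Join(strings.Fields(p), " "))
}

// DedupParagraphs drops any paragraph already seen earlier, either in the
// same document or in a previous one. Order of first appearance is kept.
func DedupParagraphs(docs []string) []string {
	seen := make(map[[32]byte]struct{})
	out := make([]string, 0, len(docs))
	for _, doc := range docs {
		var kept []string
		for _, p := range strings.Split(doc, "\n\n") {
			n := normalize(p)
			if n == "" {
				continue
			}
			h := sha256.Sum256([]byte(n))
			if _, dup := seen[h]; dup {
				continue
			}
			seen[h] = struct{}{}
			kept = append(kept, strings.TrimSpace(p))
		}
		out = append(out, strings.Join(kept, "\n\n"))
	}
	return out
}

// TruncateAtBreak cuts s to at most limit bytes, preferring a paragraph
// break, then a sentence end, then a word boundary.
func TruncateAtBreak(s string, limit int) string {
	if limit <= 0 {
		return ""
	}
	if len(s) <= limit {
		return s
	}
	cut := s[:limit]
	for _, sep := range []string{"\n\n", ". ", " "} {
		if i := strings.LastIndex(cut, sep); i > 0 && i >= limit/2 {
			if sep == ". " {
				i++ // keep the sentence-ending period
			}
			return strings.TrimSpace(cut[:i])
		}
	}
	// No acceptable breakpoint: drop any partial UTF-8 sequence at the end.
	return strings.ToValidUTF8(cut, "")
}

func main() {
	docs := DedupParagraphs([]string{
		"Go is a compiled language.\n\nIt has a garbage collector.",
		"go is a  compiled language.\n\nIt ships a single static binary.",
	})
	for i, d := range docs {
		fmt.Printf("doc %d: %q\n", i, d)
	}
	fmt.Printf("%q\n", TruncateAtBreak("One two. Three four five.", 14))
}
```

Deduplicating across results before truncating keeps the token budget for content the model has not already seen.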
Getting Started
Go users can install in seconds with go install github.com/zoharbabin/web-researcher-mcp/cmd/web-researcher-mcp@latest; everyone else can grab a prebuilt binary from the releases page. Docker deployment works out of the box with docker run commands, and you can build from source if needed. Configuration requires credentials for at least one search provider: Google PSE (two values), Brave Search (one key), or any of the other backends. Add the server to your MCP client config and you're live in minutes.
Key Takeaways
- Eight specialized research tools cover web, images, news, academic papers, patents, and multi-step sequential research
- 4-tier content extraction pipeline handles everything from static pages to JavaScript-heavy SPAs with headless Chrome
- Multi-provider routing with automatic fallback ensures reliability when upstream APIs fail
- Search lenses let you focus queries on curated domain lists for programming, legal, medical, or other verticals
- Enterprise features include OAuth 2.1, multi-tenancy, rate limiting, and audit logging for team deployments
The Bottom Line
This is the kind of infrastructure that makes AI assistants genuinely useful rather than just clever toys with training cutoffs. If you're running MCP-based workflows in production, web-researcher-mcp fills a critical gap by giving your models access to current information—something that separates real-world utility from demo-ware. The pluggable architecture and security-first design suggest this was built for serious use cases, not weekend hacking projects.