On June 16, 2026, Z.ai dropped GLM-5.2, a 753-billion parameter Mixture-of-Experts model trained for long-horizon coding agents and released under an MIT license. Three weeks earlier, Anthropic shipped Claude Opus 4.8—its most capable general-access release to date. The result is the comparison that matters: the strongest open-weights model ever built against the strongest closed frontier model you can buy today. GLM-5.2 wins on price by up to 5.7x and takes three benchmarks outright, including AIME 2026 (99.2 vs 95.7) and IMOAnswerBench (91.0 vs 83.5). Opus 4.8 still dominates the rest, with its largest margins on multi-hour software engineering tasks where it doubles GLM's score on SWE-Marathon (26.0 vs 13.0).
The Price Gap Is Not Close
GLM-5.2 runs $1.40 per million input tokens and $4.40 per million output tokens through Z.ai, Novita, or Friendli serverless endpoints, with identical pricing across all three providers. Claude Opus 4.8 charges $5 in and $25 out at standard rates, jumping to $10/$50 if you enable fast mode. That makes GLM roughly 3.6x cheaper on input and 5.7x cheaper on output at standard rates. For agents that generate long diffs, output tokens can dominate the bill: a workload costing $1,000 per day on Opus lands near $176 daily on GLM-5.2 if the spend is all output, and closer to $280 if it is all input. The price difference is structural: it's baked into the MIT license and the open deployment options, not a temporary promo.
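To make the arithmetic concrete, here is a minimal Python sketch that uses only the standard-rate list prices quoted above. The function names and the `output_share` parameter are illustrative assumptions, not part of either vendor's tooling.

```python
# Per-million-token list prices quoted in the article (USD, standard rates).
PRICES = {
    "glm-5.2": {"input": 1.40, "output": 4.40},
    "opus-4.8": {"input": 5.00, "output": 25.00},
}


def daily_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the cost in USD for a given token volume at list price."""
    p = PRICES[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000


def glm_equivalent(opus_spend: float, output_share: float) -> float:
    """Estimate the GLM-5.2 bill for a workload costing `opus_spend` on Opus.

    `output_share` is the fraction of the Opus bill spent on output tokens
    (a hypothetical knob: 1.0 means the bill is entirely output tokens).
    """
    in_ratio = PRICES["glm-5.2"]["input"] / PRICES["opus-4.8"]["input"]
    out_ratio = PRICES["glm-5.2"]["output"] / PRICES["opus-4.8"]["output"]
    return opus_spend * ((1 - output_share) * in_ratio + output_share * out_ratio)


if __name__ == "__main__":
    # Sweep from an all-output bill to an all-input bill.
    for share in (1.0, 0.5, 0.0):
        print(f"output share {share:.0%}: ${glm_equivalent(1000, share):.2f}")
```

The savings depend on the token mix: the more of the Opus bill that goes to output tokens, the closer the GLM-5.2 figure gets to the 5.7x end of the range.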
Benchmark Breakdown: Where Each Model Wins
Z.ai published head-to-head results across 19 reasoning, coding, and agentic benchmarks. GLM-5.2 takes three of them: two competition math evals (AIME 2026, IMOAnswerBench) plus Terminal-Bench 2.1 under its own best harness configuration (82.7 vs 78.9). Opus 4.8 wins the remaining sixteen, with the gap widening as tasks stretch longer and become more agentic. On NL2Repo, which involves building a full repository from natural language specs, Opus leads by over 20 points (69.7 vs 48.9). On SWE-Marathon, an ultra-long-horizon engineering benchmark, Opus doubles GLM's score (26.0 vs 13.0). The pattern is clean: on competition math and on terminal tasks run under its preferred harness, GLM matches or beats the frontier. For hours-long open-ended projects with heavy tool use, Opus earns its premium.
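The margins are easier to compare side by side. This short Python sketch tabulates only the five score pairs quoted above; the other fourteen benchmarks are not listed here, so they are left out rather than guessed.

```python
# Scores quoted in the article, as (GLM-5.2, Opus 4.8).
SCORES = {
    "AIME 2026": (99.2, 95.7),
    "IMOAnswerBench": (91.0, 83.5),
    "Terminal-Bench 2.1": (82.7, 78.9),
    "NL2Repo": (48.9, 69.7),
    "SWE-Marathon": (13.0, 26.0),
}


def margins(scores):
    """Return {benchmark: GLM score minus Opus score}, rounded to 1 decimal."""
    return {name: round(glm - opus, 1) for name, (glm, opus) in scores.items()}


if __name__ == "__main__":
    # Sort from Opus's biggest lead to GLM's biggest lead.
    for name, delta in sorted(margins(SCORES).items(), key=lambda kv: kv[1]):
        leader = "GLM-5.2" if delta > 0 else "Opus 4.8"
        print(f"{name:<20} {delta:+6.1f}  ({leader} leads)")
```

GLM's three wins are single-digit margins, while Opus's two quoted long-horizon leads are 13 and 20.8 points.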
Open Weights Versus API Gatekeeping
This is where philosophy matters as much as performance. GLM-5.2 lives on Hugging Face under an MIT license: you can fine-tune it, quantize it to reduce VRAM requirements, run it air-gapped for regulated data, and pin a specific version forever without worrying about deprecation schedules. It runs on vLLM, SGLang, xLLM, KTransformers, and Transformers. When you self-host, there are no regional restrictions and no rate limit surprises mid-task. Claude Opus 4.8 is proprietary: you access it through Anthropic's API, Amazon Bedrock, Google Vertex AI, or Microsoft Foundry, subject to their rate limits, content policies, and eventual model deprecation. If your product requires a frozen model behind an air gap or you need to fine-tune on proprietary data, GLM-5.2 is the only choice in this comparison. One caveat: Opus 4.8 includes vision capabilities while GLM-5.2 is text-only today.
Architecture: IndexShare and Speculative Decoding
GLM-5.2's efficiency story lives in its architecture. Z.ai introduced IndexShare, which reuses a single lightweight sparse-attention indexer across every four layers, cutting per-token FLOPs by 2.9x at 1 million token context. The model also reworked its MTP (Multi-Token Prediction) layer for speculative decoding, raising acceptance length by up to 20%. These changes are what let an open 753B MoE model serve a 1M token window at the price point Z.ai offers, a cost per token that Anthropic doesn't match. Both models expose tunable thinking effort (High/Max for GLM; High/Extra/Max for Opus), letting you trade latency against accuracy per request.
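A rough way to see why both changes help is to count work. The Python sketch below is bookkeeping only, not Z.ai's implementation. It assumes IndexShare groups layers in contiguous blocks of four, and that acceptance length means the average number of tokens emitted per verification pass. The layer count and baseline acceptance length are hypothetical placeholders.

```python
import math


def indexer_runs(num_layers: int, share_every: int = 4) -> int:
    """Indexer invocations per token if one indexer serves `share_every` layers."""
    return math.ceil(num_layers / share_every)


def verify_passes(num_tokens: int, acceptance_length: float) -> int:
    """Target-model verification passes needed to emit `num_tokens` tokens."""
    return math.ceil(num_tokens / acceptance_length)


if __name__ == "__main__":
    layers = 80  # hypothetical layer count, not from the article
    print(indexer_runs(layers, 1), "->", indexer_runs(layers, 4),
          "indexer runs per token")

    base = 2.5  # hypothetical baseline acceptance length
    print(verify_passes(10_000, base), "->", verify_passes(10_000, base * 1.2),
          "verification passes for 10,000 tokens")
```

Neither count maps directly onto the quoted 2.9x FLOPs figure, which depends on how much of the per-token compute the indexer accounts for at long context. A 20% longer acceptance length cuts verification passes by about a sixth, since 1 - 1/1.2 is roughly 0.167.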
When to Pick Each Model
Choose Claude Opus 4.8 if your agents run multi-hour software engineering tasks, need vision input from screenshots or PDFs, or require a managed frontier model with first-party cloud support and you're okay with the premium pricing. The use cases where Opus justifies its cost are specifically NL2Repo-style repo generation, SWE-Marathon-length projects, and Tool-Decathlon tool orchestration, where gaps like the 20.8-point NL2Repo margin are most likely to show up in completion rates. Choose GLM-5.2 if cost is a first-order constraint, you need MIT license freedom to self-host or fine-tune, your workload skews toward reasoning and competition math, or you're running high-volume bounded coding tasks at scale. For most teams running production agents on short-to-medium horizon work, GLM-5.2 delivers frontier-adjacent performance at under a fifth of the output price.
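The guidance above can be written down as a small routing rule. This is a hedged Python sketch of one way to encode it: the `Workload` fields and the precedence order are assumptions made for illustration, and real routing would weigh more factors than three booleans.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    """Hypothetical workload description; field names are illustrative."""
    multi_hour_engineering: bool = False  # NL2Repo / SWE-Marathon style tasks
    needs_vision: bool = False            # screenshots, PDFs
    needs_self_hosting: bool = False      # air gap, fine-tuning, version pinning


def pick_model(w: Workload) -> str:
    """Apply the article's rules of thumb, hard constraints first."""
    if w.needs_vision and w.needs_self_hosting:
        # GLM-5.2 is text-only today and Opus 4.8 cannot be self-hosted.
        return "no single fit"
    if w.needs_self_hosting:
        return "GLM-5.2"
    if w.needs_vision or w.multi_hour_engineering:
        return "Claude Opus 4.8"
    # Default for short-to-medium horizon, cost-sensitive work.
    return "GLM-5.2"


if __name__ == "__main__":
    print(pick_model(Workload(multi_hour_engineering=True)))
    print(pick_model(Workload(needs_self_hosting=True)))
    print(pick_model(Workload()))
```

Self-hosting and vision are treated as hard constraints because each rules out one model entirely. Task horizon is a softer preference, so it is checked afterward.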
Key Takeaways
- GLM-5.2 costs 3.6x less on input and 5.7x less on output than Opus 4.8 at standard rates, with identical pricing across Z.ai, Novita, and Friendli
- GLM wins AIME 2026 (99.2), IMOAnswerBench (91.0), and Terminal-Bench 2.1 under its best harness (82.7); Opus takes the other 16 benchmarks
- Opus leads NL2Repo by 20.8 points (69.7 vs 48.9) and doubles GLM's SWE-Marathon score (26.0 vs 13.0), making long-horizon work an Opus premium use case
- MIT license enables fine-tuning, air-gap deployment, and version pinning impossible with proprietary API-only access
- Both models offer 1M token context windows; both expose tunable thinking effort levels per request