Cli-Modelarium 0.1.4 was released on June 24, 2026, adding two new providers and bringing the total to ten cloud options for side-by-side LLM comparison. The headline additions are Alibaba's Qwen models (accessible through DashScope) and Z.AI's GLM lineup, giving developers a broader set of frontier and open-weight models to benchmark against each other in a single command.

Two New Providers, Ten Total

The updated roster now covers OpenAI, Anthropic, Google, xAI, DeepSeek, Mistral, Groq, and OpenRouter alongside the two new additions, plus local model support. For anyone who has wanted to run their own prompts through competing models without spinning up infrastructure or juggling multiple API keys, setup takes two steps: run pip install --upgrade cli-modelarium, then issue a single query naming the models to compare.

Statistical Rigor for Real Comparisons

Cli-Modelarium goes beyond eyeballing output differences. When --runs is set to a value greater than 1, it automatically runs statistical tests, including bootstrap confidence intervals and paired significance tests using McNemar's method, so you can tell genuine performance gaps from noise. The tool also ships with hallucination detection, LLM-as-judge scoring, CI-ready assertions, and built-in cost tracking across all providers.
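
To make the two statistical ideas concrete, here is a minimal standard-library sketch of a percentile bootstrap confidence interval and an exact McNemar test. It is an illustration of the general techniques, not Cli-Modelarium's actual implementation; the function names, the resample count, and the sample data are all assumptions made for this example.

```python
import random
from math import comb


def bootstrap_ci(diffs, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the mean of paired score differences.

    diffs[i] is (score of model A - score of model B) on prompt i.
    """
    rng = random.Random(seed)
    n = len(diffs)
    # Resample the differences with replacement and record each mean.
    means = sorted(
        sum(rng.choice(diffs) for _ in range(n)) / n
        for _ in range(n_resamples)
    )
    lo = means[int((alpha / 2) * n_resamples)]
    hi = means[int((1 - alpha / 2) * n_resamples) - 1]
    return lo, hi


def mcnemar_exact(a_pass, b_pass):
    """Two-sided exact McNemar test on paired pass/fail outcomes.

    Only discordant pairs (one model passes, the other fails) carry
    information; under the null they split 50/50.
    """
    b = sum(1 for x, y in zip(a_pass, b_pass) if x and not y)
    c = sum(1 for x, y in zip(a_pass, b_pass) if y and not x)
    n = b + c
    if n == 0:
        return 1.0  # no discordant pairs, no evidence of a difference
    k = min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Hypothetical data: model A passes 5 prompts that model B fails.
a_results = [True, True, True, True, True, True, False]
b_results = [False, False, False, False, False, True, False]
p_value = mcnemar_exact(a_results, b_results)  # 2 * (1/32) = 0.0625

# Hypothetical per-prompt score differences (A minus B).
low, high = bootstrap_ci([0.2, 0.1, -0.05, 0.3, 0.15, 0.0, 0.25])
```

If the bootstrap interval excludes zero, or the McNemar p-value falls below the chosen threshold, the gap is less likely to be noise. Note that McNemar's test needs a binary outcome per prompt (for example, an assertion passing or failing), while the bootstrap works on any numeric score.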

Cost Control and Model Grouping

The --max-cost flag acts as a hard cap on API spending during multi-model runs, preventing surprise bills when comparing expensive frontier models against budget alternatives. In this release, both Qwen and GLM have been added to the model groups (all-flagship, all-budget, all-fast, all-cheap), with GLM also included in the new all-reasoning group for targeted benchmarking.
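
As a rough illustration of how a hard spending cap can work in principle, the sketch below tracks cumulative cost and refuses any call that would push the total over the limit. This is an assumed design, not the tool's internals: CostGuard, BudgetExceeded, and the per-million-token prices are hypothetical names and values chosen for the example.

```python
from dataclasses import dataclass


class BudgetExceeded(RuntimeError):
    """Raised when a call would push total spend past the cap."""


@dataclass
class CostGuard:
    max_cost: float      # hard cap in dollars (hypothetical unit)
    spent: float = 0.0   # running total across all models

    def charge(self, input_tokens, output_tokens, price_in_per_m, price_out_per_m):
        """Record one call's cost, or raise if it would exceed the cap.

        Prices are dollars per million tokens (illustrative values only).
        """
        cost = (
            input_tokens * price_in_per_m + output_tokens * price_out_per_m
        ) / 1_000_000
        if self.spent + cost > self.max_cost:
            raise BudgetExceeded(
                f"call costing ${cost:.4f} would exceed cap of ${self.max_cost:.4f}"
            )
        self.spent += cost
        return cost


guard = CostGuard(max_cost=0.01)
guard.charge(1000, 500, price_in_per_m=2.0, price_out_per_m=8.0)  # costs $0.006
try:
    # A second identical call would bring the total to $0.012, over the cap.
    guard.charge(1000, 500, price_in_per_m=2.0, price_out_per_m=8.0)
except BudgetExceeded:
    stopped_early = True
```

Checking the cap before recording the charge means the total never exceeds the limit. A real tool would need to estimate cost before sending a request, since output token counts are only known afterwards.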

Additional Updates

Pricing has been refreshed to match current provider rates across the board. Python 3.14 support is now official, and several model identifiers were updated to track renames from various providers. The tool remains Apache 2.0 licensed with no infrastructure requirements: install it via pip, point it at your models, and go.

Key Takeaways

  • Cli-Modelarium now supports 10 cloud LLM providers including the newly added Qwen (DashScope) and GLM (Z.AI)
  • Built-in statistical testing helps distinguish real performance differences from random variation
  • Cost control features prevent runaway API bills during multi-model benchmarks

The Bottom Line

Expanding to ten providers is a solid move for an open-source benchmarking tool that already covers the major players. If you have been making provider decisions based on vibes or benchmark articles that do not match your workload, Cli-Modelarium gives you the data to actually back up those choices.