Liquid AI dropped LFM2.5-230M today. It's their tiniest model yet, and it's built for one purpose: running lean inference everywhere, no cloud required. The 230-million-parameter foundation model targets developers building agentic workflows on phones, Raspberry Pi boards, and robotics platforms where every milliwatt counts.

Raw Numbers That Matter

On a Samsung Galaxy S25 Ultra with Snapdragon Gen4, LFM2.5-230M decodes at 213 tokens per second. Drop down to the low-cost CPU of a Raspberry Pi 5, and you're still looking at 42 tok/s. Those numbers make real-time on-device agents actually viable. The model was pre-trained on 19 trillion tokens, with a 32K context extension phase baked in.
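To put those decode rates in wall-clock terms, here is a back-of-the-envelope sketch. It assumes a steady decode rate and ignores prefill time; the 64-token reply length is a hypothetical figure chosen for illustration, not something Liquid AI reported.

```python
# Decode rates quoted above, in tokens per second.
DECODE_TOK_PER_S = {
    "Galaxy S25 Ultra": 213.0,
    "Raspberry Pi 5": 42.0,
}


def decode_seconds(num_tokens: int, tok_per_s: float) -> float:
    """Seconds needed to generate num_tokens at a steady decode rate.

    Prefill (prompt processing) time is deliberately left out.
    """
    if tok_per_s <= 0:
        raise ValueError("tok_per_s must be positive")
    return num_tokens / tok_per_s


if __name__ == "__main__":
    reply_tokens = 64  # hypothetical length of a short tool-call reply
    for device, rate in DECODE_TOK_PER_S.items():
        seconds = decode_seconds(reply_tokens, rate)
        print(f"{device}: {seconds:.2f} s for {reply_tokens} tokens")
```

Under those assumptions, a 64-token reply takes roughly 0.3 seconds on the phone and about 1.5 seconds on the Pi.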

Training Recipe Breakdown

The post-training stack follows three stages: supervised fine-tuning with distillation from the larger LFM2.5-350M sibling, direct preference optimization, and multi-domain reinforcement learning. Liquid AI designed this to preserve flexibility: strong out-of-the-box capabilities with room left for downstream specialization. The result punches above its weight class on benchmarks, competing with and often beating models more than twice its size across knowledge tasks (GPQA Diamond: 25.4, MMLU-Pro: 20.2), instruction following (IFEval: 71.7), data extraction (CaseReportBench: 22.5), and tool use (BFCLv3: 43.3).

Real-World Test: Humanoid Robot Deployment

As an early proof-of-concept, Liquid AI deployed LFM2.5-230M on a Unitree G1 humanoid robot running entirely on-device via its NVIDIA Jetson Orin module. Here the model functions as a skill-selection layer: it takes natural language commands and decomposes them into structured tool calls that invoke pre-trained low-level skills from NVIDIA's SONIC framework. After a quick fine-tune, prompts like "Hold still for 2 seconds, then walk forward at 1 meter per second for 3 meters" get translated into multi-step plans chaining timed walking and one-legged kneel behaviors. It's deliberately simple right now, but the signal is clear: a 230M model can serve as the natural-language control interface for hardware this sophisticated.
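For a feel of what a skill-selection layer does downstream of the model, here is a minimal sketch of a dispatcher that runs a plan of structured tool calls. Everything in it is an assumption made for illustration: the skill names (hold_still, walk_forward), their parameters, and the JSON shape of the model output are hypothetical, and none of it is the SONIC API or Liquid AI's actual tool schema.

```python
import json
from typing import Callable, Dict, List


# Hypothetical skills; a real robot would call into its motion stack here.
def hold_still(duration_s: float) -> str:
    return f"hold_still for {duration_s:g} s"


def walk_forward(speed_mps: float, distance_m: float) -> str:
    duration_s = distance_m / speed_mps  # timed walking: distance / speed
    return f"walk_forward at {speed_mps:g} m/s for {duration_s:g} s"


SKILLS: Dict[str, Callable[..., str]] = {
    "hold_still": hold_still,
    "walk_forward": walk_forward,
}


def run_plan(model_output: str) -> List[str]:
    """Parse a JSON list of tool calls and run each one in order."""
    calls = json.loads(model_output)
    # Reject the whole plan up front if any step names an unknown skill.
    for call in calls:
        if call["name"] not in SKILLS:
            raise ValueError(f"unknown skill: {call['name']}")
    return [SKILLS[call["name"]](**call["arguments"]) for call in calls]


# What a fine-tuned model might emit for the example prompt (assumed format).
EXAMPLE_OUTPUT = json.dumps([
    {"name": "hold_still", "arguments": {"duration_s": 2}},
    {"name": "walk_forward", "arguments": {"speed_mps": 1, "distance_m": 3}},
])

if __name__ == "__main__":
    for step in run_plan(EXAMPLE_OUTPUT):
        print(step)
```

The point of the pattern is that the model only picks skills and fills in arguments; the low-level controllers stay outside the language model, and an unknown skill name is rejected before anything moves.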

Inference Ecosystem Day-One Support

LFM2.5-230M ships with first-class support across the inference stack that matters: llama.cpp with GGUF checkpoints for edge deployment, MLX optimized for Apple Silicon, vLLM and SGLang for GPU-accelerated production serving, ONNX for cross-platform accelerator compatibility, and plain CPU inference. Flash attention is tuned per device (enabled on Raspberry Pi 5, disabled on Snapdragon Gen4) to squeeze out maximum prefill performance.
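One way to carry that per-device tuning in application code is a small lookup of runtime profiles. This is a sketch only: the RuntimeProfile type, the device keys, and the profile_for helper are hypothetical names, and mapping the flag onto a specific runtime's option is left to whichever engine is in use.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RuntimeProfile:
    device: str
    flash_attention: bool


# Flash attention choices as reported above; the structure is hypothetical.
PROFILES = {
    "raspberry-pi-5": RuntimeProfile("raspberry-pi-5", flash_attention=True),
    "snapdragon-gen4": RuntimeProfile("snapdragon-gen4", flash_attention=False),
}


def profile_for(device: str) -> RuntimeProfile:
    """Return the tuned profile for a known device, or fail loudly."""
    try:
        return PROFILES[device]
    except KeyError:
        raise ValueError(
            f"no tuned profile for {device!r}; benchmark prefill with both settings"
        ) from None


if __name__ == "__main__":
    print(profile_for("raspberry-pi-5"))
```

Failing loudly on an unknown device is a deliberate choice here: the best setting differs between the two reported devices, so guessing a default for new hardware could quietly cost prefill speed.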

Open-Weight Availability

Both the base (LFM2.5-230M-Base) and post-trained variants are live on Hugging Face today with no deployment restrictions. Liquid AI is positioning this as part of a broader vision: one architecture family spanning from lightweight edge models to specialized audio and vision variants, all sharing the same efficient LFM2 foundation.

Key Takeaways

  • 213 tok/s decode on Galaxy S25 Ultra; 42 tok/s on Raspberry Pi 5 CPU
  • Competes with 700M+ models on knowledge, instruction following, and tool use benchmarks
  • Deployed successfully as skill-selection layer on Unitree G1 humanoid robot via Jetson Orin
  • Open-weight release with day-one support across llama.cpp, MLX, vLLM, SGLang, ONNX

The Bottom Line

LFM2.5-230M isn't trying to replace cloud giants. It exists for the growing class of workloads that need real AI without the latency tax and privacy risks of sending data upstream. For hackers building local agents, robotics projects, or cost-sensitive production pipelines, this model deserves serious consideration. Edge inference just became a first-class citizen in the agentic workflow stack.