Smartchat processes guest messages for more than 4,000 hotels, B&Bs, and vacation rentals across 95 languages. The system runs translation pipelines, AI agents, and channel adapters for Booking.com, WhatsApp, Airbnb, and email—with roughly 70 percent of guest queries receiving fully automated replies. When Kafka's head-of-line blocking started crippling their agent and translation workloads, the team at Smartness did what hackers do: they built their own solution on top of something they already knew. The result is Queen, a message broker that runs Postgres as its backend and today handles around 2 million messages per day across 100,000 active partitions.
Why Kafka Broke
The ordering guarantee in Kafka is per partition. If two conversations hash to the same partition and one stalls—say, waiting on an LLM translation call—the second conversation waits too. Smartchat's team knew this going in and sized their topics generously. But with thousands of fast-growing properties, each hosting multiple concurrent guests, the natural unit of ordering isn't 'translation partition 47.' It's 'this specific conversation between this guest and this hotel, started two minutes ago.' Kafka wasn't designed for that shape. Neither were RabbitMQ, Redis Streams, or SQS—each carried operational or semantic tradeoffs that didn't fit the workload.
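The failure mode is easy to show in a few lines. This is an illustrative sketch, not Kafka's actual partitioner: once two conversation keys hash to the same partition, the consumer must drain that partition in arrival order, so a fast conversation waits behind a stalled one.

```python
# Hypothetical sketch of head-of-line blocking under keyed partitioning.
NUM_PARTITIONS = 4

def partition_for(conversation_id: str) -> int:
    # Kafka-style keyed partitioning: hash the key, modulo partition count.
    return hash(conversation_id) % NUM_PARTITIONS

# Find a second conversation ID that collides with the first one's partition.
conv_a = "guest-1:hotel-9"
conv_b = next(
    f"guest-{i}:hotel-9" for i in range(2, 1000)
    if partition_for(f"guest-{i}:hotel-9") == partition_for(conv_a)
)

partition = [  # arrival order of messages inside the shared partition
    (conv_a, "translate: 'Quelle heure est le check-in ?'"),  # slow LLM call
    (conv_b, "what's the wifi password?"),                    # would be fast
]

# A consumer drains the partition strictly in order, so conv_b's cheap
# message cannot be handled until conv_a's slow one completes.
processed = [conv for conv, _msg in partition]
assert processed == [conv_a, conv_b]
```

If the unit of ordering were the conversation itself rather than the partition, conv_b would never have queued behind conv_a in the first place.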
The Three Rules That Made It Work
The first version was a single Postgres table with a partition column (conversation ID), an ordering column, a lease column, and stored procedures for push and pop. Moving the agent workload onto it over a weekend eliminated head-of-line blocking incidents immediately. But what made it actually scalable were three constraints the team held rigidly: no preallocation, so a partition exists only once its first message arrives; lease-based delivery with per-consumer-group watermarks, so a slow consumer cannot stall unrelated conversations; and atomic ack-and-push inside a single Postgres transaction, which turned out to be the foundation for everything that came later. 'If your queue gets the foundation right, the second layer pays for itself,' writes author Alice Viola.
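The three constraints can be modeled in a toy in-memory broker. This is a hypothetical sketch of the semantics, not Queen's code: the class and method names are invented here, and a single Python method stands in for the Postgres transaction that makes ack-and-push atomic.

```python
import time
from collections import defaultdict

class MiniQueen:
    """Toy model of: no preallocation, per-group leases/watermarks, atomic ack+push."""

    def __init__(self, lease_seconds=30):
        self.partitions = defaultdict(list)  # partition appears on first push
        self.watermarks = defaultdict(int)   # (group, partition) -> next offset
        self.leases = {}                     # (group, partition) -> lease expiry
        self.lease_seconds = lease_seconds

    def push(self, partition_key, message):
        self.partitions[partition_key].append(message)  # no preallocation step

    def pop(self, group, partition_key, now=None):
        now = now if now is not None else time.monotonic()
        key = (group, partition_key)
        if self.leases.get(key, 0) > now:
            return None                      # partition leased by this group
        offset = self.watermarks[key]
        msgs = self.partitions.get(partition_key, [])
        if offset >= len(msgs):
            return None                      # nothing past this group's watermark
        self.leases[key] = now + self.lease_seconds
        return offset, msgs[offset]

    def ack_and_push(self, group, partition_key, offset, out_key=None, out_msg=None):
        # In Queen this whole method is one Postgres transaction:
        # the ack and the downstream push commit together or not at all.
        key = (group, partition_key)
        assert self.watermarks[key] == offset, "stale ack"
        self.watermarks[key] = offset + 1    # advance only this group's watermark
        self.leases.pop(key, None)
        if out_key is not None:
            self.push(out_key, out_msg)      # downstream emit, same "transaction"

q = MiniQueen()
q.push("conv-42", "hello")                               # partition born here
offset, msg = q.pop("agents", "conv-42")
q.ack_and_push("agents", "conv-42", offset, out_key="replies-42", out_msg="hi!")
assert q.partitions["replies-42"] == ["hi!"]
```

Note that watermarks are keyed per consumer group, so one group falling behind never moves another group's position, and a lease held on one partition says nothing about any other conversation.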
100K Partitions Without Touching Them
Queen is written in C++ with libpq in asynchronous mode for Postgres communication, libuv driving the event loop, and uWebSockets handling HTTP endpoints. In Smartchat's production cluster today, it manages approximately 35 queues, 39 consumer groups, and those 100,000 active partitions—none of them configured anywhere or provisioned ahead of time. A partition is simply a row that starts existing when a guest first writes. When the conversation ends and retention expires, the rows disappear. The team doesn't think about partition count as an ops problem; they think about stable queues and let partitions be whatever the world is currently doing. Partition cleanup happens automatically: partitions with no messages and no consumer activity for seven days get dropped via ON DELETE CASCADE.
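The seven-day rule reduces to a simple predicate. A minimal sketch, assuming a partition is described by a message count and a last-activity timestamp (the field and function names here are invented for illustration):

```python
from datetime import datetime, timedelta

RETENTION = timedelta(days=7)

def inactive_partitions(partitions, now):
    """Return keys of partitions with no messages and no activity within RETENTION."""
    return [
        key for key, p in partitions.items()
        if p["messages"] == 0 and now - p["last_activity"] > RETENTION
    ]

now = datetime(2025, 6, 15)
parts = {
    "conv-old":  {"messages": 0, "last_activity": now - timedelta(days=10)},
    "conv-live": {"messages": 3, "last_activity": now - timedelta(days=10)},
    "conv-new":  {"messages": 0, "last_activity": now - timedelta(hours=2)},
}
assert inactive_partitions(parts, now) == ["conv-old"]
```

In the real system the drop is a row delete, and ON DELETE CASCADE takes any dependent rows with it, so "cleanup" is one statement rather than a bookkeeping job.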
Postgres Bloat: The Honest Answer
Any Postgres table that sees heavy inserts and deletes accumulates dead tuples, and a queue is the canonical example of that workload. Smartchat's answer involves explicit per-queue TTLs (a hard ceiling plus a tighter window for messages already consumed by every group), batched deletes running every five minutes in 1,000-row chunks with advisory locks to prevent fleet-wide contention, and pre-filters that only touch partitions actually old enough to clean up. 'We do not do anything exotic,' Viola notes. 'No VACUUM FULL, no pg_repack.' If autovacuum is healthy for your workload and hardware, you don't need anything special; queue workloads just surface a mistuned autovacuum faster.
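The chunked-delete pattern is worth spelling out. This is an in-memory stand-in with invented names; in production it would be SQL deleting a bounded batch (for example via a LIMIT-ed ctid subquery) on a schedule, under an advisory lock so only one node runs the pass:

```python
CHUNK = 1_000

def expired(row, now, ttl):
    # TTL check on a per-row timestamp; real pre-filters would skip whole
    # partitions that are not old enough before ever scanning rows.
    return now - row["ts"] > ttl

def cleanup_pass(rows, now, ttl, chunk=CHUNK):
    """Delete at most `chunk` expired rows; return (survivors, deleted_count)."""
    deleted, survivors = 0, []
    for row in rows:
        if deleted < chunk and expired(row, now, ttl):
            deleted += 1          # would be one bounded DELETE statement in SQL
        else:
            survivors.append(row)
    return survivors, deleted

rows = [{"ts": t} for t in range(2_500)]   # 2,000 of these are past TTL
survivors, n = cleanup_pass(rows, now=10_000, ttl=8_000)
assert n == 1_000                          # first pass capped at one chunk
survivors, n = cleanup_pass(survivors, now=10_000, ttl=8_000)
assert n == 1_000                          # later passes keep chunking until caught up
```

Capping each pass keeps any single delete from holding locks or generating a huge burst of dead tuples at once; autovacuum then reclaims space at its own pace.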
The Streaming SDK That Fell Out For Free
That third constraint—atomic ack and push inside the broker's transaction—became the foundation for something unexpected. Smartchat shipped a streaming SDK supporting .map, .filter, .key_by, window_tumbling, window_sliding, window_session, .reduce, .aggregate, plus a .gate() operator for rate limiting that preserves FIFO order on denial. Exactly-once semantics required almost no additional code: every cycle reads from the source, runs the operator chain, mutates state, emits to sinks, and acks the source in one Postgres transaction. If anything fails, everything rolls back and Queen redelivers via its existing lease mechanism. No two-phase commit, no Kafka transactions API wrapper, no external state store to synchronize. The SDK ships for JavaScript, Python, and Go, with operator chains hashing to identical SHA-256 digests across all three runtimes.
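The transactional cycle can be sketched without Postgres at all, using a snapshot to stand in for rollback. Everything here (run_cycle, the toy operators) is invented for illustration; the point is that state mutation, sink emit, and source ack either all apply or none do:

```python
import copy

def run_cycle(source, state, sinks, operators):
    """Process one message 'transactionally'; restore everything on failure."""
    snapshot = (copy.deepcopy(state), copy.deepcopy(sinks), list(source))
    try:
        msg = source[0]                  # read (leased, not yet acked)
        for op in operators:
            msg = op(msg, state)         # operator chain may mutate state
        sinks.append(msg)                # emit to sink
        source.pop(0)                    # ack the source message
    except Exception:
        state.clear(); state.update(snapshot[0])   # rollback state,
        sinks[:] = snapshot[1]                     # sinks, and source;
        source[:] = snapshot[2]                    # the lease will redeliver
        raise

state, sinks = {"count": 0}, []
source = ["bonjour"]

def count_op(msg, state):
    state["count"] += 1
    return msg.upper()

def flaky_op(msg, state):
    raise RuntimeError("LLM call timed out")

# A mid-chain failure leaves no trace: no partial state, no emitted output,
# and the source message is still there to be redelivered.
try:
    run_cycle(source, state, sinks, [count_op, flaky_op])
except RuntimeError:
    pass
assert state == {"count": 0} and sinks == [] and source == ["bonjour"]

# A clean run commits everything exactly once.
run_cycle(source, state, sinks, [count_op])
assert state == {"count": 1} and sinks == ["BONJOUR"] and source == []
```

Because the broker's transaction already covers all four effects, redelivery after a crash produces the same committed result as if the failure had never happened, which is what lets exactly-once fall out without a two-phase commit.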
Key Takeaways
- Your natural unit of ordering determines whether you need Kafka or something Postgres-based—if you have few high-volume stable streams, Kafka still wins
- No preallocation means partitions only exist when needed and disappear when conversations end, eliminating operational overhead
- Atomic ack-and-push inside a single transaction is the foundation that made stream processing almost free to add later
- Stateless brokers on top of databases you already run buy real operational simplicity—no new cluster to monitor or back up
The Bottom Line
This is what happens when you actually understand your workload's shape instead of defaulting to whatever HN is hyping. Queen isn't a Kafka killer and doesn't try to be—it's purpose-built for environments where ordering matters at the conversation level, not the partition level. If you're running AI agents that talk to external LLMs with unpredictable latency, this architecture deserves a hard look before you spin up another managed service.