When developers start talking about "adding RAG" to their systems, the conversation immediately veers toward which vector database to deploy. Should it be Pinecone? Weaviate? Qdrant? Those are valid architectural questions—but they usually aren't the first question worth asking, according to developer rkrisa writing on DEV.to.

The Real First Question

Before picking infrastructure, teams need to answer something simpler: what approved knowledge should this workflow actually retrieve before an AI decision happens? In operations-heavy systems, models shouldn't be answering from raw memory or a giant prompt dump. The useful context already exists elsewhere—approved response rules, handoff criteria, product notes, campaign guidance, decisions that humans already made. The hard part isn't generating fluent text. It's retrieving the right approved context and refusing when no safe source exists.
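
The "retrieve approved context or refuse" rule can be sketched in a few lines. This is a minimal illustration, not the author's implementation: the `Chunk` and `RetrievalResult` types, the `approved` flag, and the `MIN_SCORE` threshold are all hypothetical names and values chosen for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Chunk:
    chunk_id: str
    text: str
    approved: bool   # hypothetical flag: a human has approved this source
    score: float     # similarity from the retriever, higher is better


@dataclass(frozen=True)
class RetrievalResult:
    chunk: Optional[Chunk]
    refused: bool
    reason: str


MIN_SCORE = 0.75  # assumed threshold; tune it against your own evaluation set


def select_context(candidates: list[Chunk], min_score: float = MIN_SCORE) -> RetrievalResult:
    """Return the best approved chunk, or refuse when no safe source exists."""
    eligible = [c for c in candidates if c.approved and c.score >= min_score]
    if not eligible:
        # Refusing is a valid outcome: the workflow can hand off to a human here.
        return RetrievalResult(None, True, "no approved source above threshold")
    best = max(eligible, key=lambda c: c.score)
    return RetrievalResult(best, False, "approved source found")
```

Note that an unapproved chunk never wins, however high its similarity score; the model only sees context that passed the approval filter.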

Why Postgres First

For this type of workflow, most surrounding data is already relational: leads or conversations, workflow names, stages and owners, human review outcomes, source metadata, trace logs, document versions. So the first technical choice isn't "where do vectors live in the abstract?" It's threefold: where can retrieval stay close to the operational data model, where can you log the retrieval path and final decision together, and where can schema evolution happen without spinning up a second system prematurely? Postgres plus pgvector addresses all three. It consolidates documents, chunks, metadata like allowed use and approval requirements, retrieval traces, cost estimates, and human review outcomes in one place—no distributed systems tax required.
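
As a rough sketch of what "one place" could look like, the Python module below holds a hypothetical schema and search query as SQL strings. Every table and column name, the 384-dimension embedding, and the psycopg-style `%(name)s` placeholders are assumptions made for illustration; only `CREATE EXTENSION vector`, the `vector(n)` column type, and the `<=>` cosine-distance operator come from pgvector itself.

```python
# Hypothetical schema: names are illustrative, not taken from the article.
SCHEMA_SQL = """
CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE documents (
    id          bigserial PRIMARY KEY,
    source_name text    NOT NULL,
    version     integer NOT NULL,
    domain      text    NOT NULL
);

CREATE TABLE chunks (
    id                bigserial PRIMARY KEY,
    document_id       bigint  NOT NULL REFERENCES documents(id),
    body              text    NOT NULL,
    allowed_use       text    NOT NULL,
    requires_approval boolean NOT NULL DEFAULT true,
    embedding         vector(384)  -- dimension depends on the embedding model
);

CREATE TABLE retrieval_traces (
    id         bigserial PRIMARY KEY,
    workflow   text        NOT NULL,
    chunk_ids  bigint[]    NOT NULL,
    decision   text        NOT NULL,
    created_at timestamptz NOT NULL DEFAULT now()
);
"""

# Nearest-neighbour search by cosine distance, filtered by relational metadata.
SEARCH_SQL = """
SELECT c.id, c.body, d.source_name, d.version,
       c.embedding <=> %(query_embedding)s::vector AS distance
FROM chunks c
JOIN documents d ON d.id = c.document_id
WHERE d.domain = %(domain)s
  AND c.allowed_use = %(allowed_use)s
ORDER BY c.embedding <=> %(query_embedding)s::vector
LIMIT %(k)s;
"""


def to_vector_literal(values: list[float]) -> str:
    """Format floats in pgvector's text input form, e.g. '[0.1,0.2]'."""
    return "[" + ",".join(repr(float(v)) for v in values) + "]"
```

Because the chunks, their metadata, and the traces share one database, the metadata filter and the vector search run in a single query, and a trace row can be written in the same transaction as the decision it describes.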

What the First Version Actually Needs

The initial version doesn't need to be broad. It needs to be inspectable. A narrow retrieval scope should cover approved response rules, product or service notes, handoff and escalation criteria, campaign or source guidance, and commercial playbooks. Each retrieved chunk should carry more than text—it should include metadata such as source name, document version, business domain, allowed use flags, and whether human approval is required before that context influences a customer-facing response.
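
One way to make that metadata enforceable rather than decorative is to gate every use of a chunk on it. The sketch below assumes hypothetical field names and use labels such as "customer_reply"; the article lists the kinds of metadata but does not prescribe a structure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RetrievedChunk:
    text: str
    source_name: str
    document_version: int
    business_domain: str
    allowed_uses: frozenset[str]       # hypothetical labels, e.g. "customer_reply"
    requires_human_approval: bool


def usable_for(chunk: RetrievedChunk, use: str, human_approved: bool = False) -> bool:
    """Check whether a chunk may influence the given kind of output."""
    if use not in chunk.allowed_uses:
        return False
    if chunk.requires_human_approval and not human_approved:
        return False
    return True


# Example: a pricing note that is internal-only until a human signs off.
pricing_note = RetrievedChunk(
    text="Discounts above 15% need manager sign-off.",
    source_name="commercial-playbook",
    document_version=3,
    business_domain="sales",
    allowed_uses=frozenset({"internal_summary", "customer_reply"}),
    requires_human_approval=True,
)
```

With this shape, `usable_for(pricing_note, "customer_reply")` stays false until a reviewer approves, while any use outside `allowed_uses` is rejected regardless of approval.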

The Eval Mindset Matters More Than the Stack

A retrieval layer isn't real until its failure criteria are defined. Before declaring victory just because embeddings are deployed, teams need to test against a golden-question set: Does the expected source appear in the top results? Does the workflow return no-answer or trigger a handoff when the source is missing entirely? Does customer-facing language come only from allowed chunks? Can you later audit which chunks influenced any given decision? Without these checks, a RAG layer can look sophisticated while still pulling completely wrong context. That's not an AI problem; it's a retrieval problem hiding behind LLM confidence.
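
The first two checks can be expressed as a small evaluation harness. This is a sketch under stated assumptions: `GoldenCase`, the `Retriever` signature (a function returning ranked source names, with an empty list meaning no-answer), and the sample questions are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class GoldenCase:
    question: str
    # None means no approved source exists, so the workflow should not answer.
    expected_source: Optional[str]


# Assumed retriever shape: question in, ranked source names out.
Retriever = Callable[[str], list[str]]


def evaluate(cases: list[GoldenCase], retrieve: Retriever, k: int = 3) -> dict:
    """Count golden cases where the expected source is in the top k,
    or where nothing is returned when no source should exist."""
    failures = []
    for case in cases:
        top = retrieve(case.question)[:k]
        if case.expected_source is None:
            ok = not top
        else:
            ok = case.expected_source in top
        if not ok:
            failures.append(case.question)
    return {
        "passed": len(cases) - len(failures),
        "failed": len(failures),
        "failures": failures,
    }


# A stand-in retriever so the harness can be exercised without a database.
def fake_retrieve(question: str) -> list[str]:
    index = {"refund window?": ["response-rules", "product-notes"]}
    return index.get(question, [])


report = evaluate(
    [
        GoldenCase("refund window?", "response-rules"),
        GoldenCase("competitor pricing?", None),
        GoldenCase("escalation path?", "handoff-criteria"),
    ],
    fake_retrieve,
)
```

Here the third case fails because the stand-in retriever returns nothing for it, which is exactly the kind of gap a golden set is meant to surface before production does.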

Observability Is Part of the Design

The retrieval step and the AI decision step need to be traceable together in the same review surface. Teams should be able to see retrieved chunk IDs, similarity or retrieval scores, model name, token and cost estimates, final decision output, handoff reason if applicable, and human review outcomes. This is what separates "the system answered" from "the system answered for a defensible reason." Without this audit trail, you're flying blind in production.
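
A single trace record that carries both steps might look like the following. The field names mirror the list above, but the structure itself, the model name, and the JSON log format are assumptions for illustration, not something the article specifies.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass(frozen=True)
class DecisionTrace:
    chunk_ids: list[int]
    scores: list[float]
    model_name: str
    prompt_tokens: int
    completion_tokens: int
    estimated_cost_usd: float
    decision: str
    handoff_reason: Optional[str] = None
    human_review_outcome: Optional[str] = None


def to_log_line(trace: DecisionTrace) -> str:
    """Serialize retrieval and decision details as one JSON line for review."""
    return json.dumps(asdict(trace), sort_keys=True)


trace = DecisionTrace(
    chunk_ids=[12, 47],
    scores=[0.91, 0.84],
    model_name="example-model",      # placeholder, not a real model name
    prompt_tokens=820,
    completion_tokens=96,
    estimated_cost_usd=0.0021,       # illustrative figure only
    decision="handoff",
    handoff_reason="no approved pricing source",
)
line = to_log_line(trace)
```

Keeping retrieval scores and the final decision in the same record means a reviewer never has to join two logs to reconstruct why an answer was given.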

When to Add Standalone Vector Infrastructure

This isn't an argument against standalone vector databases. It's an argument against reaching for them prematurely. Later phases might justify separate infrastructure if the system requires higher search traffic, complex filtering across deployment boundaries, or recall and latency characteristics whose benefits outweigh the operational cost of additional moving parts. Before that threshold, a smaller stack keeps retrieval, evaluation, and auditability visible in one place rather than scattered across distributed systems.

Key Takeaways

  • Start with Postgres + pgvector when your operational data is already relational—keep vectors near their context
  • Define narrow, inspectable retrieval scopes first; scope can expand but only after foundations are auditable
  • Every chunk should carry metadata about allowed use and approval requirements before it touches customer-facing output
  • Golden-question evaluation sets matter more than announcing that the system has embeddings

The Bottom Line

The RAG infrastructure obsession misses what actually matters in operational AI: control, reviewability, and schema clarity. Postgres first isn't a compromise—it's the pragmatic choice for teams that need to answer "why did the AI say that?" with evidence, not guesswork.