
RAG with Built-in Embeddings

DB9 includes a built-in embedding() function and HNSW vector indexes. You can build a complete RAG retrieval pipeline inside the database without managing embedding API keys in your application — the server handles the embedding API calls.

This guide walks through the full flow: create a table, embed documents, build an index, and query by semantic similarity.

Prerequisites:

  • A DB9 database (see Quick Start)
  • The embedding and vector extensions enabled

Enable the extensions:
SQL
CREATE EXTENSION IF NOT EXISTS embedding;
CREATE EXTENSION IF NOT EXISTS vector;

Create a table with a vector column to hold the document embeddings:
SQL
CREATE TABLE documents (
  id SERIAL PRIMARY KEY,
  title TEXT NOT NULL,
  content TEXT NOT NULL,
  embedding vector(1024)
);

The embedding() function returns 1024-dimensional vectors by default, so the column uses vector(1024).

Use embedding() inline to generate vectors at insert time:

SQL
INSERT INTO documents (title, content, embedding) VALUES
  ('PostgreSQL basics',
   'PostgreSQL is an open-source relational database with strong SQL compliance.',
   embedding('PostgreSQL is an open-source relational database with strong SQL compliance.')),
  ('Vector search',
   'Vector search finds semantically similar items using distance metrics like cosine similarity.',
   embedding('Vector search finds semantically similar items using distance metrics like cosine similarity.')),
  ('DB9 overview',
   'DB9 is a serverless PostgreSQL-compatible database with built-in embeddings, file storage, and HTTP from SQL.',
   embedding('DB9 is a serverless PostgreSQL-compatible database with built-in embeddings, file storage, and HTTP from SQL.'));

Each embedding() call sends the text to the configured embedding model and returns a vector(1024). You can also embed a different string than the stored content — for example, a concatenation of title and content:

SQL
INSERT INTO documents (title, content, embedding) VALUES
  ('My document',
   'Full document body here...',
   embedding('My document: Full document body here...'));

For fast approximate nearest-neighbor search, create an HNSW index:

SQL
CREATE INDEX idx_documents_embedding
ON documents USING hnsw (embedding vector_cosine_ops);

The operator class determines the distance metric used by the index:

| Operator class | Distance metric | Best for |
| --- | --- | --- |
| vector_cosine_ops | Cosine distance (<=>) | Semantic similarity (recommended default) |
| vector_l2_ops | Euclidean distance (<->) | Absolute distance |
| vector_ip_ops | Negative inner product (<#>) | Pre-normalized vectors |

You can tune index build parameters for your dataset size:

SQL
CREATE INDEX idx_documents_embedding
ON documents USING hnsw (embedding vector_cosine_ops)
WITH (m = 32, ef_construction = 128);

Find the most relevant documents for a natural-language query:

SQL
SELECT id, title, embedding <=> embedding('How does vector search work?') AS distance
FROM documents
ORDER BY embedding <=> embedding('How does vector search work?')
LIMIT 5;

The <=> operator computes cosine distance. Lower values mean higher similarity.

The hnsw.ef_search parameter controls the accuracy-speed tradeoff at query time:

SQL
-- Higher = more accurate, slower (default: 40)
SET hnsw.ef_search = 100;
SELECT id, title
FROM documents
ORDER BY embedding <=> embedding('serverless database')
LIMIT 5;

Auto-embed queries with VEC_EMBED functions

Instead of calling embedding() explicitly, you can use helper functions that embed the query text automatically:

SQL
SELECT id, title
FROM documents
ORDER BY VEC_EMBED_COSINE_DISTANCE(embedding, 'How does vector search work?')
LIMIT 5;

Available auto-embed distance functions:

| Function | Metric |
| --- | --- |
| VEC_EMBED_COSINE_DISTANCE(vector_col, text) | Cosine |
| VEC_EMBED_L2_DISTANCE(vector_col, text) | Euclidean |
| VEC_EMBED_INNER_PRODUCT(vector_col, text) | Inner product |
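
The remaining helpers take the same arguments. For example, ranking by Euclidean distance (a sketch, assuming the same two-argument signature):

SQL
SELECT id, title
FROM documents
ORDER BY VEC_EMBED_L2_DISTANCE(embedding, 'How does vector search work?')
LIMIT 5;

Note that an HNSW index only accelerates queries that use the metric it was built with, so this query would need an index created with vector_l2_ops to avoid a sequential scan.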

A typical RAG flow:

  1. User asks a question — your application receives a natural-language query
  2. Retrieve context — run the similarity query above to get the top-k documents
  3. Build prompt — concatenate the retrieved document content into an LLM prompt
  4. Generate answer — send the prompt to your LLM (Claude, GPT, etc.)

The retrieval step is a single SQL query. Here is a complete retrieval query that returns the context your LLM needs:

SQL
SELECT title, content
FROM documents
ORDER BY embedding <=> embedding('What is DB9?')
LIMIT 3;

Hybrid retrieval: vector + full-text search

Combine vector similarity with PostgreSQL full-text search for better recall:

SQL
-- Add a tsvector column for FTS
ALTER TABLE documents ADD COLUMN tsv tsvector
  GENERATED ALWAYS AS (to_tsvector('english', title || ' ' || content)) STORED;

CREATE INDEX idx_documents_fts ON documents USING gin(tsv);

-- Hybrid query: vector similarity + keyword match
SELECT id, title,
  embedding <=> embedding('serverless database') AS vec_distance,
  ts_rank(tsv, plainto_tsquery('english', 'serverless database')) AS fts_rank
FROM documents
WHERE tsv @@ plainto_tsquery('english', 'serverless database')
ORDER BY embedding <=> embedding('serverless database')
LIMIT 5;
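
To produce a single blended relevance score instead of filtering on keyword match and ordering purely by vector distance, you can combine the two signals in one expression. This is a sketch, not a DB9-specific feature: the 0.7/0.3 weights are arbitrary and worth tuning, and because the ORDER BY is no longer a bare distance expression, the HNSW index will not accelerate it:

SQL
SELECT id, title,
  0.7 * (1 - (embedding <=> embedding('serverless database')))
  + 0.3 * ts_rank(tsv, plainto_tsquery('english', 'serverless database')) AS score
FROM documents
ORDER BY score DESC
LIMIT 5;

On large tables, apply this scoring to a pre-filtered candidate set (for example, the top 100 rows by vector distance) rather than to the whole table.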
The embedding() function takes optional model and dimensions arguments:

SQL
-- Default: text-embedding-v4 model, 1024 dimensions
embedding('your text')
-- Explicit model and dimensions
embedding('your text', 'text-embedding-v4', 1024)

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| text | TEXT | (required) | Input text to embed. Cannot be empty. |
| model | TEXT | text-embedding-v4 | Embedding model name. |
| dimensions | INT | 1024 | Output vector dimensions. |

The function returns NULL for NULL input. Empty strings are rejected with an error.
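
A quick check of those edge cases (the exact error text depends on the server):

SQL
SELECT embedding(NULL); -- returns NULL
SELECT embedding('');   -- raises an error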

Monitor your embedding API consumption:

SQL
SELECT * FROM extensions.embedding_usage();

Returns tokens_used (daily count) and resets_at (next midnight UTC).

The vector extension provides three distance operators:

| Operator | Function | Metric |
| --- | --- | --- |
| <=> | cosine_distance(a, b) | Cosine distance (0 = identical) |
| <-> | l2_distance(a, b) | Euclidean distance |
| <#> | inner_product(a, b) | Negative dot product |

Utility functions:

SQL
SELECT vector_dims(embedding) FROM documents LIMIT 1; -- 1024
SELECT vector_norm(embedding) FROM documents LIMIT 1; -- L2 magnitude
SELECT l2_normalize(embedding) FROM documents LIMIT 1; -- unit vector

Limitations
  • Model selection — the default provider supports text-embedding-v4. Custom models require server-side configuration.
  • Superuser only — embedding() requires the database admin role. Non-admin users receive a permission error.
  • Daily token budget — embedding calls consume tokens from a daily quota. Check usage with extensions.embedding_usage().
  • Dimensions must match — the vector column width must match the dimensions returned by the model. The default is 1024.
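
For example, to store smaller vectors, request matching dimensions in both the column type and the embedding() call. A sketch, assuming the model supports 512-dimensional output:

SQL
CREATE TABLE small_docs (
  id SERIAL PRIMARY KEY,
  content TEXT NOT NULL,
  embedding vector(512)
);

INSERT INTO small_docs (content, embedding)
VALUES ('Some text.', embedding('Some text.', 'text-embedding-v4', 512));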