
RAG with Built-in Embeddings

DB9 includes a built-in embedding() function and HNSW vector indexes. You can build a complete RAG retrieval pipeline inside the database without managing embedding API keys in your application — the server handles the embedding API calls.

This guide walks through the full flow: create a table, embed documents, build an index, and query by semantic similarity.

Prerequisites:

  • A DB9 database (see Quick Start)
  • The embedding and vector extensions enabled

Enable the extensions:
SQL
CREATE EXTENSION IF NOT EXISTS embedding;
CREATE EXTENSION IF NOT EXISTS vector;

Create a table with a vector column to hold the document embeddings:
SQL
CREATE TABLE documents (
  id SERIAL PRIMARY KEY,
  title TEXT NOT NULL,
  content TEXT NOT NULL,
  embedding vector(1024)
);

The embedding() function returns 1024-dimensional vectors by default, so the column uses vector(1024).

Use embedding() inline to generate vectors at insert time:

SQL
INSERT INTO documents (title, content, embedding) VALUES
  ('PostgreSQL basics',
   'PostgreSQL is an open-source relational database with strong SQL compliance.',
   embedding('PostgreSQL is an open-source relational database with strong SQL compliance.')),
  ('Vector search',
   'Vector search finds semantically similar items using distance metrics like cosine similarity.',
   embedding('Vector search finds semantically similar items using distance metrics like cosine similarity.')),
  ('DB9 overview',
   'DB9 is a serverless PostgreSQL-compatible database with built-in embeddings, file storage, and HTTP from SQL.',
   embedding('DB9 is a serverless PostgreSQL-compatible database with built-in embeddings, file storage, and HTTP from SQL.'));

Each embedding() call sends the text to the configured embedding model and returns a vector(1024). You can also embed a different string than the stored content — for example, a concatenation of title and content:

SQL
INSERT INTO documents (title, content, embedding) VALUES
  ('My document',
   'Full document body here...',
   embedding('My document: Full document body here...'));

For fast approximate nearest-neighbor search, create an HNSW index:

SQL
CREATE INDEX idx_documents_embedding
ON documents USING hnsw (embedding vector_cosine_ops);

The operator class determines the distance metric used by the index:

| Operator class | Distance metric | Best for |
| --- | --- | --- |
| vector_cosine_ops | Cosine distance (<=>) | Semantic similarity (recommended default) |
| vector_l2_ops | Euclidean distance (<->) | Absolute distance |
| vector_ip_ops | Negative inner product (<#>) | Pre-normalized vectors |

You can tune index build parameters for your dataset size:

SQL
CREATE INDEX idx_documents_embedding
ON documents USING hnsw (embedding vector_cosine_ops)
WITH (m = 32, ef_construction = 128);

Find the most relevant documents for a natural-language query:

SQL
SELECT id, title, embedding <=> embedding('How does vector search work?') AS distance
FROM documents
ORDER BY embedding <=> embedding('How does vector search work?')
LIMIT 5;

The <=> operator computes cosine distance. Lower values mean higher similarity.

The hnsw.ef_search parameter controls the accuracy-speed tradeoff at query time:

SQL
-- Higher = more accurate, slower (default: 40)
SET hnsw.ef_search = 100;
SELECT id, title
FROM documents
ORDER BY embedding <=> embedding('serverless database')
LIMIT 5;

Auto-embed queries with VEC_EMBED functions

Instead of calling embedding() explicitly, you can use helper functions that embed the query text automatically:

SQL
SELECT id, title
FROM documents
ORDER BY VEC_EMBED_COSINE_DISTANCE(embedding, 'How does vector search work?')
LIMIT 5;

Available auto-embed distance functions:

| Function | Metric |
| --- | --- |
| VEC_EMBED_COSINE_DISTANCE(vector_col, text) | Cosine |
| VEC_EMBED_L2_DISTANCE(vector_col, text) | Euclidean |
| VEC_EMBED_INNER_PRODUCT(vector_col, text) | Inner product |
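
The remaining helpers take the same arguments. For example, ranking by Euclidean distance (a sketch, assuming the same two-argument signature):

SQL
SELECT id, title
FROM documents
ORDER BY VEC_EMBED_L2_DISTANCE(embedding, 'How does vector search work?')
LIMIT 5;

Note that an HNSW index only accelerates queries that use the metric it was built with, so this query would need an index created with vector_l2_ops to avoid a sequential scan.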

A typical RAG flow:

  1. User asks a question — your application receives a natural-language query
  2. Retrieve context — run the similarity query above to get the top-k documents
  3. Build prompt — concatenate the retrieved document content into an LLM prompt
  4. Generate answer — send the prompt to your LLM (Claude, GPT, etc.)

The retrieval step is a single SQL query. Here is a complete retrieval query that returns the context your LLM needs:

SQL
SELECT title, content
FROM documents
ORDER BY embedding <=> embedding('What is DB9?')
LIMIT 3;

Hybrid retrieval: vector + full-text search

Combine vector similarity with PostgreSQL full-text search for better recall:

SQL
-- Add a tsvector column for FTS
ALTER TABLE documents ADD COLUMN tsv tsvector
  GENERATED ALWAYS AS (to_tsvector('english', title || ' ' || content)) STORED;

CREATE INDEX idx_documents_fts ON documents USING gin(tsv);

-- Hybrid query: vector similarity + keyword match
SELECT id, title,
  embedding <=> embedding('serverless database') AS vec_distance,
  ts_rank(tsv, plainto_tsquery('english', 'serverless database')) AS fts_rank
FROM documents
WHERE tsv @@ plainto_tsquery('english', 'serverless database')
ORDER BY embedding <=> embedding('serverless database')
LIMIT 5;
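
To produce a single blended relevance score instead of filtering on keyword match and ordering purely by vector distance, you can combine the two signals in one expression. This is a sketch, not a DB9-specific feature: the 0.7/0.3 weights are arbitrary and worth tuning, and because the ORDER BY is no longer a bare distance expression, the HNSW index will not accelerate it:

SQL
SELECT id, title,
  0.7 * (1 - (embedding <=> embedding('serverless database')))
  + 0.3 * ts_rank(tsv, plainto_tsquery('english', 'serverless database')) AS score
FROM documents
ORDER BY score DESC
LIMIT 5;

On large tables, apply this scoring to a pre-filtered candidate set (for example, the top 100 rows by vector distance) rather than to the whole table.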
The embedding() function takes optional model and dimensions arguments:

SQL
-- Default: text-embedding-v4 model, 1024 dimensions
embedding('your text')
-- Explicit model and dimensions
embedding('your text', 'text-embedding-v4', 1024)

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| text | TEXT | (required) | Input text to embed. Cannot be empty. |
| model | TEXT | text-embedding-v4 | Embedding model name. |
| dimensions | INT | 1024 | Output vector dimensions. |

The function returns NULL for NULL input. Empty strings are rejected with an error.
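
A quick check of those edge cases (the exact error text depends on the server):

SQL
SELECT embedding(NULL); -- returns NULL
SELECT embedding('');   -- raises an error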

Monitor your embedding API consumption:

SQL
SELECT * FROM extensions.embedding_usage();

Returns tokens_used (daily count) and resets_at (next midnight UTC).

The vector extension provides three distance operators:

| Operator | Function | Metric |
| --- | --- | --- |
| <=> | cosine_distance(a, b) | Cosine distance (0 = identical) |
| <-> | l2_distance(a, b) | Euclidean distance |
| <#> | inner_product(a, b) | Negative dot product |

Utility functions:

SQL
SELECT vector_dims(embedding) FROM documents LIMIT 1; -- 1024
SELECT vector_norm(embedding) FROM documents LIMIT 1; -- L2 magnitude
SELECT l2_normalize(embedding) FROM documents LIMIT 1; -- unit vector

Limitations
  • Model selection — the default provider supports text-embedding-v4. Custom models require server-side configuration.
  • Superuser only — embedding() requires the database admin role. Non-admin users receive a permission error.
  • Daily token budget — embedding calls consume tokens from a daily quota. Check usage with extensions.embedding_usage().
  • Dimensions must match — the vector column width must match the dimensions returned by the model. The default is 1024.
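
For example, to store smaller vectors, request matching dimensions in both the column type and the embedding() call. A sketch, assuming the model supports 512-dimensional output:

SQL
CREATE TABLE small_docs (
  id SERIAL PRIMARY KEY,
  content TEXT NOT NULL,
  embedding vector(512)
);

INSERT INTO small_docs (content, embedding)
VALUES ('Some text.', embedding('Some text.', 'text-embedding-v4', 512));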