# RAG with Built-in Embeddings
DB9 includes a built-in `embedding()` function and HNSW vector indexes. You can build a complete RAG retrieval pipeline inside the database without managing embedding API keys in your application — the server handles the embedding API calls.
This guide walks through the full flow: create a table, embed documents, build an index, and query by semantic similarity.
## Prerequisites

- A DB9 database (see Quick Start)
- The `embedding` and `vector` extensions enabled

```sql
CREATE EXTENSION IF NOT EXISTS embedding;
CREATE EXTENSION IF NOT EXISTS vector;
```

## 1. Create a Documents Table

```sql
CREATE TABLE documents (
  id        SERIAL PRIMARY KEY,
  title     TEXT NOT NULL,
  content   TEXT NOT NULL,
  embedding vector(1024)
);
```

The `embedding()` function returns 1024-dimensional vectors by default, so the column uses `vector(1024)`.
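If you ever insert precomputed vectors instead of calling `embedding()`, the column width constraint still applies: the literal must carry exactly as many floats as the declared dimensions. A minimal Python sketch, assuming the `vector` type accepts the bracketed pgvector-style literal format (the helper name is illustrative, not a DB9 API):

```python
def to_vector_literal(values, expected_dims=1024):
    """Format a list of floats as a vector literal, enforcing the column width."""
    if len(values) != expected_dims:
        # Mirrors the server-side error you would get on a width mismatch.
        raise ValueError(f"expected {expected_dims} dims, got {len(values)}")
    return "[" + ",".join(repr(float(v)) for v in values) + "]"

vec = [0.0] * 1024
literal = to_vector_literal(vec)
print(literal[:12], "...", f"({len(vec)} dims)")
```

Checking the length client-side surfaces mismatches before the INSERT round-trips to the server.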
## 2. Insert Documents with Embeddings

Use `embedding()` inline to generate vectors at insert time:

```sql
INSERT INTO documents (title, content, embedding) VALUES
  ('PostgreSQL basics',
   'PostgreSQL is an open-source relational database with strong SQL compliance.',
   embedding('PostgreSQL is an open-source relational database with strong SQL compliance.')),
  ('Vector search',
   'Vector search finds semantically similar items using distance metrics like cosine similarity.',
   embedding('Vector search finds semantically similar items using distance metrics like cosine similarity.')),
  ('DB9 overview',
   'DB9 is a serverless PostgreSQL-compatible database with built-in embeddings, file storage, and HTTP from SQL.',
   embedding('DB9 is a serverless PostgreSQL-compatible database with built-in embeddings, file storage, and HTTP from SQL.'));
```

Each `embedding()` call sends the text to the configured embedding model and returns a `vector(1024)`. You can also embed a different string than the stored content — for example, a concatenation of title and content:

```sql
INSERT INTO documents (title, content, embedding) VALUES
  ('My document',
   'Full document body here...',
   embedding('My document: Full document body here...'));
```

## 3. Build an HNSW Index
For fast approximate nearest-neighbor search, create an HNSW index:

```sql
CREATE INDEX idx_documents_embedding
  ON documents USING hnsw (embedding vector_cosine_ops);
```

The operator class determines the distance metric used by the index:

| Operator class | Distance metric | Best for |
|---|---|---|
| `vector_cosine_ops` | Cosine distance (`<=>`) | Semantic similarity (recommended default) |
| `vector_l2_ops` | Euclidean distance (`<->`) | Absolute distance |
| `vector_ip_ops` | Negative inner product (`<#>`) | Pre-normalized vectors |

You can tune index build parameters for your dataset size:

```sql
CREATE INDEX idx_documents_embedding
  ON documents USING hnsw (embedding vector_cosine_ops)
  WITH (m = 32, ef_construction = 128);
```

## 4. Query by Semantic Similarity
Find the most relevant documents for a natural-language query:

```sql
SELECT id, title,
       embedding <=> embedding('How does vector search work?') AS distance
FROM documents
ORDER BY embedding <=> embedding('How does vector search work?')
LIMIT 5;
```

The `<=>` operator computes cosine distance. Lower values mean higher similarity.
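The ordering behavior can be illustrated outside the database. A minimal Python sketch, using toy 3-dimensional vectors as stand-ins for the 1024-dimensional embeddings:

```python
import math

def cosine_distance(a, b):
    # Cosine distance = 1 - cosine similarity; 0 means identical direction.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return 1.0 - dot / (norm_a * norm_b)

query = [1.0, 0.0, 0.0]
docs = {
    "close": [0.9, 0.1, 0.0],  # nearly parallel to the query
    "far":   [0.0, 1.0, 0.0],  # orthogonal to the query
}
# Sort ascending by distance, as ORDER BY ... <=> ... does.
ranked = sorted(docs, key=lambda name: cosine_distance(query, docs[name]))
print(ranked)  # ['close', 'far']
```

Smaller distances sort first, which is why the SQL query orders by `<=>` ascending.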
### Tune search accuracy

The `hnsw.ef_search` parameter controls the accuracy-speed tradeoff at query time:

```sql
-- Higher = more accurate, slower (default: 40)
SET hnsw.ef_search = 100;

SELECT id, title
FROM documents
ORDER BY embedding <=> embedding('serverless database')
LIMIT 5;
```

### Auto-embed queries with VEC_EMBED functions
Instead of calling `embedding()` explicitly, you can use helper functions that embed the query text automatically:

```sql
SELECT id, title
FROM documents
ORDER BY VEC_EMBED_COSINE_DISTANCE(embedding, 'How does vector search work?')
LIMIT 5;
```

Available auto-embed distance functions:

| Function | Metric |
|---|---|
| `VEC_EMBED_COSINE_DISTANCE(vector_col, text)` | Cosine |
| `VEC_EMBED_L2_DISTANCE(vector_col, text)` | Euclidean |
| `VEC_EMBED_INNER_PRODUCT(vector_col, text)` | Inner product |
## 5. Use the Results in a RAG Pipeline

A typical RAG flow:
- User asks a question — your application receives a natural-language query
- Retrieve context — run the similarity query above to get the top-k documents
- Build prompt — concatenate the retrieved document content into an LLM prompt
- Generate answer — send the prompt to your LLM (Claude, GPT, etc.)
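The "build prompt" step can be sketched in application code. Assuming the retrieval query returns `(title, content)` rows, a minimal example (the `build_prompt` helper and the prompt wording are illustrative, not part of DB9):

```python
def build_prompt(question, rows):
    """Concatenate retrieved (title, content) rows into an LLM prompt."""
    context = "\n\n".join(f"## {title}\n{content}" for title, content in rows)
    return (
        "Answer the question using only the context below.\n\n"
        f"{context}\n\n"
        f"Question: {question}"
    )

# rows as returned by:
#   SELECT title, content FROM documents
#   ORDER BY embedding <=> embedding(%s) LIMIT 3;
rows = [
    ("DB9 overview", "DB9 is a serverless PostgreSQL-compatible database."),
    ("Vector search", "Vector search finds semantically similar items."),
]
prompt = build_prompt("What is DB9?", rows)
print(prompt)
```

The resulting string is what you would send to your LLM in the "generate answer" step.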
The retrieval step is a single SQL query. Here is a complete retrieval query that returns the context your LLM needs:

```sql
SELECT title, content
FROM documents
ORDER BY embedding <=> embedding('What is DB9?')
LIMIT 3;
```

### Hybrid retrieval: vector + full-text search
Combine vector similarity with PostgreSQL full-text search for better recall:

```sql
-- Add a tsvector column for FTS
ALTER TABLE documents ADD COLUMN tsv tsvector
  GENERATED ALWAYS AS (to_tsvector('english', title || ' ' || content)) STORED;

CREATE INDEX idx_documents_fts ON documents USING gin(tsv);

-- Hybrid query: vector similarity + keyword match
SELECT id, title,
       embedding <=> embedding('serverless database') AS vec_distance,
       ts_rank(tsv, plainto_tsquery('english', 'serverless database')) AS fts_rank
FROM documents
WHERE tsv @@ plainto_tsquery('english', 'serverless database')
ORDER BY embedding <=> embedding('serverless database')
LIMIT 5;
```

## embedding() Function Reference
```sql
-- Default: text-embedding-v4 model, 1024 dimensions
embedding('your text')

-- Explicit model and dimensions
embedding('your text', 'text-embedding-v4', 1024)
```

| Parameter | Type | Default | Description |
|---|---|---|---|
| `text` | TEXT | (required) | Input text to embed. Cannot be empty. |
| `model` | TEXT | `text-embedding-v4` | Embedding model name. |
| `dimensions` | INT | 1024 | Output vector dimensions. |
The function returns NULL for NULL input. Empty strings are rejected with an error.
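Because empty strings error out, it can be worth sanitizing text app-side before it reaches `embedding()`. A small sketch of that pattern (the helper name is illustrative, not a DB9 API):

```python
def embeddable(text):
    """Return cleaned text suitable for embedding(), or None to skip the row."""
    if text is None:
        return None  # embedding() would return NULL anyway
    cleaned = text.strip()
    return cleaned or None  # empty/whitespace-only strings would error server-side

rows = ["  PostgreSQL basics  ", "", None, "Vector search"]
print([embeddable(r) for r in rows])  # ['PostgreSQL basics', None, None, 'Vector search']
```

Rows that come back as `None` can be skipped or given a NULL embedding instead of failing the whole INSERT.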
### Check token usage

Monitor your embedding API consumption:

```sql
SELECT * FROM extensions.embedding_usage();
```

Returns `tokens_used` (daily count) and `resets_at` (next midnight UTC).
## Distance Operators and Functions

| Operator | Function | Metric |
|---|---|---|
| `<=>` | `cosine_distance(a, b)` | Cosine distance (0 = identical) |
| `<->` | `l2_distance(a, b)` | Euclidean distance |
| `<#>` | `inner_product(a, b)` | Negative dot product |
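The three metrics are closely related for unit-length vectors. A quick numeric check in pure Python, mirroring the operator semantics in the table (these functions are local stand-ins, not the database functions):

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def l2_distance(a, b):          # <->
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_distance(a, b):      # <=>
    return 1.0 - dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

def neg_inner_product(a, b):    # <#>
    return -dot(a, b)

# For unit vectors, <#> equals cosine distance minus 1, so both
# produce the same ordering — which is why vector_ip_ops is listed
# as suitable for pre-normalized vectors.
a = [1.0, 0.0]
b = [math.sqrt(0.5), math.sqrt(0.5)]
print(cosine_distance(a, b), neg_inner_product(a, b))
```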
Utility functions:

```sql
SELECT vector_dims(embedding) FROM documents LIMIT 1;   -- 1024
SELECT vector_norm(embedding) FROM documents LIMIT 1;   -- L2 magnitude
SELECT l2_normalize(embedding) FROM documents LIMIT 1;  -- unit vector
```

## Current Limitations
- Model selection — the default provider supports `text-embedding-v4`. Custom models require server-side configuration.
- Superuser only — `embedding()` requires the database admin role. Non-admin users receive a permission error.
- Daily token budget — embedding calls consume tokens from a daily quota. Check usage with `extensions.embedding_usage()`.
- Dimensions must match — the vector column width must match the dimensions returned by the model. The default is 1024.
## Next Pages

- Vector Search Extension — HNSW configuration, operator reference, and vector arithmetic
- Full-Text Search — GIN indexes, ranking, and language support
- Extensions Overview — all 9 built-in extensions
- Agent Workflows — using RAG in agent pipelines
- CLI Reference — `db9 db sql` for running queries from the terminal