You already know how to generate embeddings. Now the question is: where do you put them? If your application already uses PostgreSQL, the answer is surprisingly simple -- keep them right next to your relational data with pgvector, a PostgreSQL extension that adds vector storage, distance operators, and approximate nearest-neighbor indexing directly to your existing database.
Coming from Software Engineering? pgvector is like adding a spatial index to PostGIS, but for meaning-space instead of geo-space. Just as PostGIS lets you run ST_Distance queries on geographic coordinates alongside your regular SQL, pgvector lets you run similarity searches on embedding vectors alongside the JOINs, WHERE clauses, and transactions you already know.
Why pgvector?
Key advantages:
- Single source of truth -- embeddings live in the same row as the data they represent. No sync issues.
- ACID transactions -- embedding inserts participate in the same transaction as your relational writes.
- Familiar tooling -- use psycopg2, SQLAlchemy, Alembic migrations, pg_dump backups. Nothing new to learn.
- SQL power -- combine vector similarity with WHERE filters, JOINs, GROUP BY, and full-text search in a single query.
- Cost -- no additional managed service bill. Your existing PostgreSQL instance handles it.
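To make the "SQL power" point concrete, here is a sketch of a single query that mixes a JOIN, a relational filter, and vector ordering. The `authors` table and `author_id` column are not part of this tutorial's schema -- they are assumptions purely to illustrate the pattern (the `created_at` column does appear in the schema below).

```python
def search_with_join(cur, query_embedding, limit=5):
    """Similarity search combined with relational data in one query.

    Assumes a hypothetical authors(id, name) table and a
    documents.author_id foreign key -- illustration only.
    """
    cur.execute(
        """
        SELECT d.title, a.name, d.embedding <=> %s::vector AS distance
        FROM documents d
        JOIN authors a ON a.id = d.author_id
        WHERE d.created_at > now() - interval '30 days'
        ORDER BY d.embedding <=> %s::vector
        LIMIT %s
        """,
        (query_embedding, query_embedding, limit),
    )
    return cur.fetchall()
```

No separate vector store means no application-side join: the database does the filtering, joining, and ranking in one round trip.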
Setting Up pgvector
Installation
# Ubuntu / Debian
sudo apt install postgresql-16-pgvector
# macOS (Homebrew)
brew install pgvector
# Docker (easiest for local dev)
docker run --name pgvector-demo \
-e POSTGRES_PASSWORD=secret \
-p 5432:5432 \
-d pgvector/pgvector:pg16
Enable the Extension
-- Connect to your database and enable pgvector
CREATE EXTENSION IF NOT EXISTS vector;
-- Verify it's installed
SELECT extversion FROM pg_extension WHERE extname = 'vector';
Creating Tables with Vector Columns
The vector type takes a dimension argument. Match it to your embedding model's output size.
CREATE TABLE documents (
id SERIAL PRIMARY KEY,
title TEXT NOT NULL,
content TEXT NOT NULL,
category TEXT,
created_at TIMESTAMPTZ DEFAULT now(),
embedding vector(1536) -- OpenAI text-embedding-3-small dimensions
);
You can add a vector column to an existing table just as easily:
ALTER TABLE products ADD COLUMN embedding vector(1536);
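After adding the column, every existing row has a NULL embedding. A backfill loop is a natural follow-up; here is a minimal sketch, assuming the table has `id` and `description` columns (names chosen for illustration) and that `embed` is whatever callable turns a batch of texts into vectors:

```python
def backfill_embeddings(cur, embed, batch_size=100):
    """Embed rows that don't have an embedding yet, in batches.

    cur is a psycopg2-style cursor; embed maps a list of texts to a
    list of vectors. Table and column names are assumptions.
    """
    while True:
        # Grab the next batch of rows still missing an embedding
        cur.execute(
            "SELECT id, description FROM products "
            "WHERE embedding IS NULL LIMIT %s",
            (batch_size,),
        )
        rows = cur.fetchall()
        if not rows:
            break
        vectors = embed([text for _, text in rows])
        for (row_id, _), vec in zip(rows, vectors):
            cur.execute(
                "UPDATE products SET embedding = %s::vector WHERE id = %s",
                (vec, row_id),
            )
```

Committing between batches (not shown) keeps transactions short and lets the job resume after an interruption, since it only ever selects rows where `embedding IS NULL`.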
Storing Embeddings
Generate embeddings with the OpenAI API and insert them using psycopg2.
Single Document
# script_id: day_022_pgvector/pgvector_crud_operations
import psycopg2
from openai import OpenAI
client = OpenAI()
def get_embedding(text: str) -> list[float]:
"""Generate an embedding from OpenAI."""
response = client.embeddings.create(
model="text-embedding-3-small",
input=text
)
return response.data[0].embedding
# Connect to PostgreSQL
conn = psycopg2.connect(
host="localhost",
dbname="myapp",
user="postgres",
password="secret"
)
cur = conn.cursor()
# Insert a document with its embedding
title = "Introduction to Neural Networks"
content = "Neural networks are computing systems inspired by biological neural networks..."
embedding = get_embedding(content)
cur.execute(
"""
INSERT INTO documents (title, content, category, embedding)
VALUES (%s, %s, %s, %s::vector)
""",
(title, content, "machine-learning", embedding)
)
conn.commit()
print("Document stored with embedding.")
Batch Insert
# script_id: day_022_pgvector/pgvector_crud_operations
import psycopg2.extras
def store_documents_batch(
cur,
documents: list[dict],
batch_size: int = 50
):
"""Store multiple documents with embeddings efficiently."""
for i in range(0, len(documents), batch_size):
batch = documents[i:i + batch_size]
# Generate embeddings for the batch
texts = [doc["content"] for doc in batch]
response = client.embeddings.create(
model="text-embedding-3-small",
input=texts
)
embeddings = [d.embedding for d in sorted(response.data, key=lambda x: x.index)]
# Bulk insert with execute_values
rows = [
(doc["title"], doc["content"], doc["category"], emb)
for doc, emb in zip(batch, embeddings)
]
psycopg2.extras.execute_values(
cur,
"""
INSERT INTO documents (title, content, category, embedding)
VALUES %s
""",
rows,
template="(%s, %s, %s, %s::vector)"
)
print(f"Inserted {min(i + batch_size, len(documents))}/{len(documents)}")
# Usage
docs = [
{"title": "Doc 1", "content": "About transformers...", "category": "ml"},
{"title": "Doc 2", "content": "About databases...", "category": "infra"},
# ... more documents
]
store_documents_batch(cur, docs)
conn.commit()
Distance Operators
pgvector provides three distance operators. Choosing the right one matters.
In SQL
-- Cosine distance (most common for text embeddings)
SELECT title, embedding <=> query_embedding AS distance
FROM documents
ORDER BY embedding <=> query_embedding
LIMIT 5;
-- L2 (Euclidean) distance
SELECT title, embedding <-> query_embedding AS distance
FROM documents
ORDER BY embedding <-> query_embedding
LIMIT 5;
-- Negative inner product
SELECT title, embedding <#> query_embedding AS distance
FROM documents
ORDER BY embedding <#> query_embedding
LIMIT 5;
Tip: OpenAI embeddings are normalized, so cosine distance (<=>) and negative inner product (<#>) produce equivalent rankings. Cosine distance is the safer default because it works correctly whether or not vectors are normalized.
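You can sanity-check that equivalence without a database. This sketch reimplements the math behind the two operators in plain Python and confirms that, on unit-length vectors, both produce the same ordering:

```python
import math

def cosine_distance(a, b):
    """1 - cosine similarity: the math behind pgvector's <=> operator."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return 1 - dot / (na * nb)

def neg_inner_product(a, b):
    """The math behind pgvector's <#> operator."""
    return -sum(x * y for x, y in zip(a, b))

def normalize(v):
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

# Toy query and documents, all normalized to unit length
query = normalize([0.3, 0.4, 0.5])
docs = [normalize(v) for v in ([1.0, 0.0, 0.2], [0.2, 0.5, 0.4], [0.0, 1.0, 1.0])]

by_cosine = sorted(range(len(docs)), key=lambda i: cosine_distance(query, docs[i]))
by_ip = sorted(range(len(docs)), key=lambda i: neg_inner_product(query, docs[i]))
assert by_cosine == by_ip  # identical rankings on unit vectors
```

For unit vectors the norms are 1, so cosine distance is `1 - dot` while negative inner product is `-dot`: both are monotone in the dot product, hence the identical ordering.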
Querying Nearest Neighbors
This is the core use case: "given a query, find the most similar documents."
# script_id: day_022_pgvector/pgvector_crud_operations
def search_documents(
cur,
query: str,
category: str | None = None,
limit: int = 5
) -> list[dict]:
"""Find documents most similar to a query string."""
query_embedding = get_embedding(query)
if category:
cur.execute(
"""
SELECT id, title, content, embedding <=> %s::vector AS distance
FROM documents
WHERE category = %s
ORDER BY embedding <=> %s::vector
LIMIT %s
""",
(query_embedding, category, query_embedding, limit)
)
else:
cur.execute(
"""
SELECT id, title, content, embedding <=> %s::vector AS distance
FROM documents
ORDER BY embedding <=> %s::vector
LIMIT %s
""",
(query_embedding, query_embedding, limit)
)
results = []
for row in cur.fetchall():
results.append({
"id": row[0],
"title": row[1],
"content": row[2],
"distance": row[3]
})
return results
# Usage
results = search_documents(cur, "How do neural networks learn?")
for r in results:
print(f"{r['title']} (distance: {r['distance']:.4f})")
The pattern is always the same: embed the query, ORDER BY distance, LIMIT N.
Indexing: IVFFlat vs HNSW
Without an index, pgvector performs an exact (brute-force) scan of every row. That is fine for thousands of rows, but slows down at scale. pgvector offers two approximate nearest-neighbor (ANN) index types.
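A quick way to see which plan you are getting is PostgreSQL's EXPLAIN. The helper below (the function name is ours) returns True when the planner chooses an index scan for a nearest-neighbor query; before you create an index, expect a sequential scan instead:

```python
def uses_index_scan(cur, query_embedding, limit=5):
    """Return True if the planner picks an index scan for this query.

    Runs plain EXPLAIN (no ANALYZE, so the query isn't executed)
    and looks for "Index Scan" in the plan text.
    """
    cur.execute(
        """
        EXPLAIN
        SELECT id FROM documents
        ORDER BY embedding <=> %s::vector
        LIMIT %s
        """,
        (query_embedding, query_embedding and limit),
    )
    plan = "\n".join(row[0] for row in cur.fetchall())
    return "Index Scan" in plan
```

Note that even with an ANN index present, the planner may still choose a sequential scan for tiny tables, where brute force is genuinely cheaper.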
IVFFlat
Partitions vectors into lists (clusters) and searches only a subset at query time.
-- Create IVFFlat index (requires data in the table first)
-- lists = number of clusters; rule of thumb: rows / 1000 for up to
-- ~1M rows, sqrt(rows) beyond that
CREATE INDEX ON documents
USING ivfflat (embedding vector_cosine_ops)
WITH (lists = 100);
-- At query time, control how many lists to search (higher = more accurate, slower)
SET ivfflat.probes = 10;
HNSW
Builds a multi-layer graph for fast traversal. Generally the better choice.
-- Create HNSW index (can be created on an empty table)
CREATE INDEX ON documents
USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 64);
-- At query time, control search breadth (higher = more accurate, slower)
SET hnsw.ef_search = 40;
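From Python, you can scope ef_search to a single transaction using PostgreSQL's set_config with is_local = true, so the setting does not leak to other queries sharing the connection. A sketch (the helper name is ours):

```python
def search_with_ef(cur, query_embedding, ef_search=40, limit=5):
    """Nearest-neighbor query with a per-transaction ef_search.

    set_config(..., true) scopes the setting to the current
    transaction; after COMMIT or ROLLBACK the default is restored.
    """
    cur.execute(
        "SELECT set_config('hnsw.ef_search', %s, true)",
        (str(ef_search),),  # set_config takes the value as text
    )
    cur.execute(
        """
        SELECT id, title, embedding <=> %s::vector AS distance
        FROM documents
        ORDER BY embedding <=> %s::vector
        LIMIT %s
        """,
        (query_embedding, query_embedding, limit),
    )
    return cur.fetchall()
```

This makes it easy to trade recall for latency per request -- for example, a higher ef_search for an offline evaluation job and the default for interactive queries.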
Which Should You Pick?
| Factor | IVFFlat | HNSW |
|---|---|---|
| Build speed | Faster | Slower |
| Query speed | Good | Better |
| Recall accuracy | Good (tune probes) | Better |
| Memory usage | Lower | Higher |
| Works on empty table | No (needs training data) | Yes |
| Insert performance | Fast | Slower (updates graph) |
Default recommendation: Use HNSW unless you have a very large, mostly-static dataset and need to minimize memory usage.
Hybrid Search: Vectors + Full-Text
One of pgvector's biggest advantages is combining semantic similarity with PostgreSQL's built-in full-text search (tsvector). This gives you the best of both worlds: keyword precision and semantic understanding.
-- Add a tsvector column for full-text search
ALTER TABLE documents ADD COLUMN tsv tsvector
GENERATED ALWAYS AS (to_tsvector('english', title || ' ' || content)) STORED;
CREATE INDEX ON documents USING gin(tsv);
# script_id: day_022_pgvector/pgvector_crud_operations
def hybrid_search(
cur,
query: str,
limit: int = 5,
semantic_weight: float = 0.7
) -> list[dict]:
"""
Combine semantic search (pgvector) with full-text search (tsvector).
Scores are normalized and blended using the given weight.
"""
query_embedding = get_embedding(query)
cur.execute(
"""
WITH semantic AS (
SELECT id, title, content,
1 - (embedding <=> %s::vector) AS semantic_score
FROM documents
ORDER BY embedding <=> %s::vector
LIMIT %s
),
fulltext AS (
SELECT id,
ts_rank(tsv, plainto_tsquery('english', %s)) AS text_score
FROM documents
WHERE tsv @@ plainto_tsquery('english', %s)
)
SELECT s.id, s.title, s.content,
(%s * s.semantic_score +
%s * COALESCE(f.text_score, 0)) AS combined_score
FROM semantic s
LEFT JOIN fulltext f ON s.id = f.id
ORDER BY combined_score DESC
LIMIT %s
""",
(
query_embedding, query_embedding, limit * 2,
query, query,
semantic_weight, 1 - semantic_weight,
limit
)
)
results = []
for row in cur.fetchall():
results.append({
"id": row[0],
"title": row[1],
"content": row[2],
"score": row[3]
})
return results
# Usage
results = hybrid_search(cur, "transformer attention mechanism")
for r in results:
print(f"{r['title']} (score: {r['score']:.4f})")
Hybrid search helps when:
- A user searches for a specific term (e.g., an error code) that semantic search alone might miss.
- You want semantic understanding but also need to boost exact keyword matches.
When to Consider Alternatives
pgvector is excellent when you already use PostgreSQL. But it is not the only option.
ChromaDB is a lightweight, in-process vector database that is great for prototyping and small-scale experiments:
# script_id: day_022_pgvector/chromadb_alternative
import chromadb
client = chromadb.Client()
collection = client.create_collection("docs")
# Add documents (ChromaDB can auto-embed with built-in models)
collection.add(
ids=["1", "2"],
documents=["About neural networks...", "About databases..."]
)
# Query
results = collection.query(query_texts=["How do networks learn?"], n_results=2)
print(results["documents"])
Use ChromaDB when you need a quick prototype or are working in a notebook. Move to pgvector when you need transactions, existing relational data, or production durability.
Summary
Quick Reference
# script_id: day_022_pgvector/quick_reference
import psycopg2
from openai import OpenAI
client = OpenAI()
conn = psycopg2.connect(host="localhost", dbname="myapp", user="postgres", password="secret")
cur = conn.cursor()
# --- Setup ---
cur.execute("CREATE EXTENSION IF NOT EXISTS vector;")
cur.execute("""
CREATE TABLE IF NOT EXISTS documents (
id SERIAL PRIMARY KEY,
content TEXT NOT NULL,
embedding vector(1536)
);
""")
# --- Store ---
text = "Some document text"
emb = client.embeddings.create(model="text-embedding-3-small", input=text).data[0].embedding
cur.execute("INSERT INTO documents (content, embedding) VALUES (%s, %s::vector)", (text, emb))
conn.commit()
# --- Search ---
query_emb = client.embeddings.create(model="text-embedding-3-small", input="my query").data[0].embedding
cur.execute("""
SELECT id, content, embedding <=> %s::vector AS distance
FROM documents
ORDER BY embedding <=> %s::vector
LIMIT 5
""", (query_emb, query_emb))
results = cur.fetchall()
# --- Index ---
cur.execute("""
CREATE INDEX ON documents USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 64);
""")
conn.commit()
Exercises
- Build a Knowledge Base: Create a PostgreSQL table with pgvector, load 50+ text passages from a topic of your choice, and build a Python function that takes a natural-language question and returns the top 3 most relevant passages with their cosine distances.
- Index Benchmark: Insert 10,000 randomly generated 1536-dimension vectors. Measure query latency (a) with no index, (b) with IVFFlat, and (c) with HNSW. Plot the results and note the recall-vs-speed tradeoff as you vary probes and ef_search.
- Hybrid Search Pipeline: Extend your knowledge base with a tsvector column and implement the hybrid search function from this tutorial. Compare results for queries that contain specific technical terms versus vague conceptual questions. When does hybrid search outperform pure semantic search?
What's Next?
Now that you can store embeddings, let's learn to index, query, and search them effectively!