pgvector adds vector similarity search natively to PostgreSQL, enabling hybrid SQL and vector queries. These exercises cover adding vector columns, the differences between ivfflat and hnsw indexes, the three distance operators (<->, <=>, <#>), recommended sizing of the ivfflat lists parameter, and what the query planner requires before it will use a vector index.
1 / 5
How do you add a vector column to an existing PostgreSQL table using the pgvector extension?
After enabling the extension with CREATE EXTENSION vector, you add a vector column using the vector(dimensions) type: ALTER TABLE items ADD COLUMN embedding vector(1536). The dimension count must match your embedding model's output (e.g., 1536 for text-embedding-ada-002, 3072 for text-embedding-3-large).
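As a minimal sketch of those steps, assuming a hypothetical demo_items table and a 3-dimensional column so the values stay readable (a real column would use the embedding model's dimension, such as 1536):

```sql
-- Enable pgvector once per database (the extension must already be installed on the server).
CREATE EXTENSION IF NOT EXISTS vector;

-- Hypothetical demo table; only the embedding column added below is pgvector-specific.
CREATE TABLE IF NOT EXISTS demo_items (
    id      bigserial PRIMARY KEY,
    content text
);

-- Add a fixed-dimension vector column to the existing table.
ALTER TABLE demo_items ADD COLUMN IF NOT EXISTS embedding vector(3);

-- Vectors are written as bracketed text literals.
INSERT INTO demo_items (content, embedding)
VALUES ('first',  '[1, 2, 3]'),
       ('second', '[4, 5, 6]');

-- A literal with the wrong number of dimensions, e.g. '[1, 2]', is rejected for this column.
```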
2 / 5
What is the difference between ivfflat and hnsw index types in pgvector?
ivfflat partitions the vectors into clusters (the lists parameter) and, at query time, searches only the few clusters nearest the query vector (the probes parameter). It builds quickly and uses less memory, but recall depends on the probe count. hnsw builds a hierarchical graph that gives better recall and faster search at query time, at the cost of significantly more memory and a slower index build.
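The two index types might be created and tuned as follows. This is a sketch, assuming an items table whose embedding column is queried by cosine distance; the index names are hypothetical and the parameter values are illustrative (lists = 100, m = 16, ef_construction = 64, and ef_search = 40 are the pgvector defaults). In practice you would pick one index type rather than build both.

```sql
-- ivfflat: cluster the vectors into `lists` partitions. Build it after the table
-- has data, because the cluster centers are computed from the existing rows.
CREATE INDEX items_embedding_ivfflat
    ON items USING ivfflat (embedding vector_cosine_ops)
    WITH (lists = 100);

-- Query-time knob: how many clusters to search (default 1).
-- Higher values improve recall and slow the query down.
SET ivfflat.probes = 10;

-- hnsw: multilayer graph with no training step, so it can be created on an empty table.
CREATE INDEX items_embedding_hnsw
    ON items USING hnsw (embedding vector_cosine_ops)
    WITH (m = 16, ef_construction = 64);

-- Query-time knob: size of the candidate list kept during the graph search (default 40).
SET hnsw.ef_search = 40;
```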
3 / 5
A developer writes SELECT * FROM items ORDER BY embedding <=> $1 LIMIT 5. What does the <=> operator compute?
In pgvector, <=> is the cosine distance operator. For L2 distance use <->, and for negative inner product use <#>. The ORDER BY embedding <=> $1 LIMIT 5 pattern performs a K-nearest-neighbor search returning the 5 most semantically similar vectors to the query embedding $1.
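To see what each operator computes, it can help to apply all three to a pair of small literal vectors. The vectors below are made up for illustration, and the values in the comments are the mathematical results (the displayed output may differ by floating-point rounding).

```sql
-- Requires the extension: CREATE EXTENSION vector;
SELECT '[3, 4]'::vector <-> '[3, 0]'::vector AS l2_distance,        -- sqrt(0^2 + 4^2) = 4
       '[3, 4]'::vector <=> '[3, 0]'::vector AS cosine_distance,    -- 1 - 9 / (5 * 3) = 0.4
       '[3, 4]'::vector <#> '[3, 0]'::vector AS neg_inner_product;  -- -(3*3 + 4*0) = -9

-- Converting the distances back to similarities.
SELECT 1 - ('[3, 4]'::vector <=> '[3, 0]'::vector) AS cosine_similarity,  -- 0.6
       ('[3, 4]'::vector <#> '[3, 0]'::vector) * -1 AS inner_product;     -- 9
```

<#> returns the negated inner product because PostgreSQL index scans on operators only run in ascending order, so "smaller means closer" has to hold for every operator.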
4 / 5
When creating an ivfflat index in pgvector, what is the recommended number of lists for a table with 1 million rows?
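The pgvector documentation suggests starting with lists = rows / 1000 for tables of up to 1 million rows and lists = sqrt(rows) for larger tables. For 1 million rows, both rules give 1000 lists. More lists means finer clustering, so each probe scans fewer vectors, but you need a higher probes setting to keep recall up (the documentation suggests starting at probes = sqrt(lists)), and the index is slower to build and uses more memory.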
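The sizing rule of thumb from the pgvector documentation (rows / 1000 up to 1 million rows, sqrt(rows) beyond that, and probes starting near sqrt(lists)) can be worked through in plain SQL. The row counts below are hypothetical and the query does not need the extension; treat the results as starting points to tune, not fixed answers.

```sql
WITH sizes(row_count) AS (
    VALUES (50000), (1000000), (25000000)
),
suggested AS (
    SELECT row_count,
           CASE
               WHEN row_count <= 1000000 THEN greatest(row_count / 1000, 1)
               ELSE ceil(sqrt(row_count::double precision))::integer
           END AS lists
    FROM sizes
)
SELECT row_count,
       lists,
       round(sqrt(lists::double precision))::integer AS starting_probes
FROM suggested;

--    50000 rows -> lists =   50, starting_probes =  7
--  1000000 rows -> lists = 1000, starting_probes = 32
-- 25000000 rows -> lists = 5000, starting_probes = 71
```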
5 / 5
What must you do before running a pgvector similarity search to take advantage of an hnsw index effectively?
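PostgreSQL's query planner uses vector indexes (both hnsw and ivfflat) when the query has the form ORDER BY embedding_column <=> query_vector LIMIT k, ordering in ascending order by a distance operator that matches the index's operator class (for example, <=> with vector_cosine_ops). Without a LIMIT, the planner may choose a sequential scan, since retrieving all rows in order doesn't benefit from an approximate nearest neighbor index.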
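One way to check which shape the planner accepts is to compare EXPLAIN output for a few variants. This sketch assumes an items table with an embedding vector(3) column and an existing hnsw or ivfflat index built with vector_cosine_ops; the query vector is made up. On very small tables the planner may still prefer a sequential scan even for the first query.

```sql
-- Eligible for an index scan: ORDER BY a bare distance operator, ascending, with LIMIT.
EXPLAIN
SELECT id FROM items
ORDER BY embedding <=> '[0.1, 0.2, 0.3]'
LIMIT 5;

-- No LIMIT: the planner will typically fall back to a sequential scan plus sort.
EXPLAIN
SELECT id FROM items
ORDER BY embedding <=> '[0.1, 0.2, 0.3]';

-- Ordering by an expression over the operator (here, similarity descending)
-- does not match the index, even though it ranks rows the same way.
EXPLAIN
SELECT id FROM items
ORDER BY 1 - (embedding <=> '[0.1, 0.2, 0.3]') DESC
LIMIT 5;
```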