PostgreSQL Vocabulary: 30 Terms Every Developer Should Know
Learn essential PostgreSQL vocabulary — MVCC, WAL, VACUUM, index types, EXPLAIN ANALYZE, CTEs, window functions, replication, and 20+ more terms explained for developers.
PostgreSQL is one of the most powerful and widely deployed relational databases in the world. Knowing its vocabulary is essential for writing efficient queries, understanding performance issues, and having productive conversations with your DBA or backend colleagues. This guide covers 30 terms that every developer working with Postgres should know.
How Postgres Manages Data
MVCC (Multi-Version Concurrency Control)
MVCC is how Postgres handles concurrent reads and writes without locking. Instead of blocking readers when a row is being written, Postgres keeps multiple versions of each row. Readers see a consistent snapshot without waiting for writers.
“Postgres uses MVCC, so readers never block writers and writers never block readers."
"The reason VACUUM is important is that MVCC leaves old row versions behind — they accumulate if not cleaned up.”
WAL (Write-Ahead Log)
The WAL is a log of all changes made to the database, written before the actual data pages are modified. It is the foundation of crash recovery and replication. If Postgres crashes, it replays the WAL to restore a consistent state.
“Streaming replication works by shipping WAL segments from the primary to the replica."
"The WAL archive is also how we do point-in-time recovery.”
VACUUM and AUTOVACUUM
VACUUM is the process that reclaims space occupied by dead row versions (created by updates and deletes under MVCC). Without it, tables bloat and performance degrades. Autovacuum is the background daemon that runs VACUUM automatically.
“The table has significant bloat — VACUUM ANALYZE hasn’t run in a while. Check the autovacuum settings."
"Run VACUUM ANALYZE manually after a large bulk delete to reclaim space and update statistics immediately.”
Transaction
A transaction is a unit of work that is either fully committed or fully rolled back. Postgres is ACID-compliant — transactions are Atomic, Consistent, Isolated, and Durable.
“Wrap the batch insert in a transaction — if any row fails, we want to roll back the entire batch.”
Indexes
B-tree Index
The B-tree index is the default index type. It supports equality and range queries (=, <, >, BETWEEN, LIKE 'prefix%'). It is the right choice for most columns.
“The query is doing a sequential scan on a million-row table — add a B-tree index on the
created_atcolumn.”
GiST Index
GiST (Generalised Search Tree) supports complex data types and custom operators. It is used for full-text search, geometric data, IP ranges (inet), and more.
“We’re using a GiST index for the IP range lookups — B-tree doesn’t support range containment queries.”
GIN Index
GIN (Generalised Inverted Index) is optimised for indexing composite values where each item can appear in many rows — arrays, jsonb, and full-text search vectors. GIN is faster to query but slower to update than GiST.
“Add a GIN index on the
tagsarray column so we can efficiently query rows containing a specific tag.”
BRIN Index
BRIN (Block Range Index) stores a summary (min/max values) for each range of physical disk blocks. It is extremely small and suitable for large tables where data is naturally correlated with the physical order (e.g., time-series data written sequentially).
“For the audit log table — billions of rows ordered by timestamp — a BRIN index is much more space-efficient than B-tree.”
Query Performance
EXPLAIN and EXPLAIN ANALYZE
EXPLAIN shows the query plan — how Postgres intends to execute a query. EXPLAIN ANALYZE actually runs the query and shows real execution times and row counts alongside the estimates.
“The query is slow — run
EXPLAIN ANALYZEand look for large discrepancies between estimated and actual row counts."
"The plan shows a sequential scan on a large table. We need an index.”
Query Planner
The query planner (also called the optimiser) chooses the best execution plan for a query based on table statistics, index availability, and cost estimates.
“The planner is choosing a sequential scan even though an index exists — the statistics might be stale. Run ANALYZE.”
Sequential Scan (Seq Scan)
A sequential scan reads every row in the table. It is efficient for large result sets or small tables, but expensive when you need just a few rows from a large table.
“The EXPLAIN output shows a Seq Scan — that’s why the query takes 10 seconds. We need an index.”
Index Scan vs. Bitmap Index Scan
- Index Scan — follows the index to fetch rows one at a time. Fast for small result sets.
- Bitmap Index Scan — collects all matching row locations from the index, then fetches the actual rows in heap order. Efficient for medium-sized result sets or multiple index conditions combined with AND/OR.
Statistics
Postgres’s statistics are data about the distribution of values in each column. The query planner uses them to estimate how many rows a condition will match. They are updated by ANALYZE or AUTOVACUUM.
“The planner estimated 100 rows but got 50,000 — the statistics are stale. Run ANALYZE on the table.”
Advanced SQL Features
CTE (Common Table Expression / WITH)
A CTE defines a named subquery at the top of a statement using WITH. It makes complex queries more readable. In Postgres 12+, CTEs are inlined by the planner by default (unless you add MATERIALIZED).
“Use a CTE to break the query into logical steps — it’s much easier to read than nested subqueries.”
Window Function
A window function performs a calculation across a set of rows related to the current row, without collapsing them into a single group. Common window functions: ROW_NUMBER(), RANK(), LAG(), LEAD(), SUM() OVER(PARTITION BY ...).
“Use
ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY created_at DESC)to get the latest record per user."
"Window functions are one of Postgres’s most powerful features — learn them and you’ll write far fewer subqueries.”
jsonb
jsonb is Postgres’s binary JSON column type. It stores JSON in a parsed, binary format — faster to query than json (text). You can index jsonb fields with GIN indexes and query them with the -> and ->> operators.
“We store the product metadata as
jsonbso we don’t need to add a column for every possible attribute."
"Add a GIN index on themetadatajsonb column so you can efficiently search within the JSON.”
Operations and Administration
pg_stat_activity
pg_stat_activity is a system view that shows currently running queries, their state (active, idle, idle in transaction), and how long they have been running.
“Run
SELECT * FROM pg_stat_activity WHERE state = 'active'to see what’s currently running."
"There are sessions stuck inidle in transaction— they might be holding locks. Kill them if they’ve been there too long.”
Connection Pooling
Postgres has an overhead per connection (typically 5–10 MB of memory). Connection pooling reuses connections across many application threads. PgBouncer is the standard tool.
“We’re hitting the connection limit under load. Set up PgBouncer in transaction mode to pool connections."
"With PgBouncer, we went from 500 direct connections to 50 — the database is much happier.”
Replication
- Streaming replication — the primary streams WAL changes to replicas in real time. Used for read replicas and standby failover.
- Logical replication — replicates data at the SQL level, allowing selective table replication and cross-version replication.
“We have two streaming replicas for read scaling and one as a hot standby for failover."
"We used logical replication to migrate to a new Postgres version with zero downtime.”
psql
psql is the interactive command-line client for PostgreSQL. It is the standard tool for connecting to databases, running queries, and inspecting schemas.
“Connect with
psql -h host -U user -d databaseand run\dtto list all tables."
"Use\xin psql to switch to expanded output mode — much easier to read wide rows.”
Extension
A Postgres extension adds functionality to the database. Notable extensions: pg_stat_statements (query statistics), PostGIS (geospatial), uuid-ossp (UUID generation), pgcrypto (encryption).
“Install
pg_stat_statements— it’s essential for identifying slow queries."
"We use the PostGIS extension for geospatial queries — finding all users within 10 km of a location.”
Tablespace
A tablespace is a location on disk where Postgres stores data files. You can use tablespaces to place specific tables or indexes on faster or larger storage.
“Move the large archive table to the tablespace on cheaper slow storage — it’s rarely queried.”
Advisory Lock
An advisory lock is an application-level locking mechanism provided by Postgres. Unlike row locks, advisory locks are not tied to any specific data — your application defines their meaning.
“We use a Postgres advisory lock to prevent two workers from processing the same job simultaneously.”
How to Use This in Conversation
In performance investigation:
“Run
EXPLAIN ANALYZEon the slow query and share the output. I want to see if the planner is choosing the right index.”
In code review:
“This query doesn’t use the index on
user_idbecause of theLOWER()function call. Create a functional index instead.”
In architecture discussion:
“We need read scaling — let’s add a streaming replica and route read-heavy queries to it.”
When explaining a Postgres behaviour:
“MVCC means reading doesn’t block writing, but it creates dead tuples. AUTOVACUUM cleans them up — if it can’t keep up, you’ll see table bloat.”
PostgreSQL rewards developers who invest in understanding it deeply. This vocabulary is your foundation for writing better queries, diagnosing performance issues, and contributing meaningfully to database design discussions.