Data Modelling Vocabulary: ER Diagrams, Normalisation, and Schema Design
Master data modelling English vocabulary: entities, relationships, normalisation, foreign keys, indexes, and dbt model types including staging, mart, and snapshot layers.
Data engineers, data analysts, and backend developers who work with databases need precise vocabulary to discuss schema design, model structure, and data relationships. Whether you are drawing an entity-relationship diagram, reviewing a dbt project, or discussing normalisation trade-offs with a colleague, the right vocabulary makes the conversation faster and more accurate.
Entity-Relationship (ER) Diagram Vocabulary
Entity — a real-world object or concept represented as a table in a relational database. Entities are typically nouns: Customer, Order, Product, Invoice. “The schema has three core entities: Customer, Order, and OrderLine.”
Attribute — a property or characteristic of an entity, corresponding to a column in a table. “The Customer entity has attributes: customer_id, email, created_at, and country_code.”
Relationship — a connection between two entities. Relationships are described by their cardinality. “The relationship between Customer and Order is one-to-many: one customer can place many orders, but each order belongs to exactly one customer.”
Cardinality — describes the numerical nature of a relationship:
- One-to-one (1:1) — each record in Table A corresponds to exactly one record in Table B.
- One-to-many (1:N) — each record in Table A can correspond to many records in Table B.
- Many-to-many (M:N) — records in both tables can correspond to multiple records in the other, requiring a junction table.
Junction table / associative table — a table used to implement a many-to-many relationship. “The OrderProduct junction table stores the relationship between orders and products, along with the quantity of each product in a given order.”
Primary key — a column or combination of columns that uniquely identifies each row in a table. “We use a UUID as the primary key for the User table to avoid exposing sequential IDs in URLs.”
Foreign key — a column in one table that references the primary key in another table, enforcing referential integrity. “The order_id column in the OrderLine table is a foreign key referencing the id column in the Order table.”
Referential integrity — the guarantee that a foreign key value always corresponds to an existing primary key value. “Dropping that record without a cascade rule would violate referential integrity.”
Normalisation Vocabulary
Normalisation is the process of structuring a database to reduce redundancy and improve data integrity.
First Normal Form (1NF) — each column contains atomic (indivisible) values, and each row is unique. “The table was not in 1NF because the tags column stored comma-separated values; we split it into a separate Tags table.”
Second Normal Form (2NF) — the table is in 1NF and every non-key attribute is fully dependent on the entire primary key (relevant for composite keys). “Moving the product name out of the OrderLine table and into the Product table brought the schema to 2NF.”
Third Normal Form (3NF) — the table is in 2NF and no non-key attribute is transitively dependent on the primary key. “Separating the country-currency mapping into its own reference table eliminates the transitive dependency.”
Denormalisation — intentionally introducing redundancy into a schema to improve read performance. Common in analytical databases and data warehouses. “We denormalised the customer region into the events table to avoid expensive joins at query time.”
Schema Design Vocabulary
Index — a database structure that improves the speed of data retrieval at the cost of additional storage and write overhead. “We added a composite index on (user_id, created_at) to support the most common query pattern on the events table.”
Composite index — an index that covers more than one column. Column order in a composite index matters. “The composite index on (tenant_id, status, created_at) serves the most frequent queries in our multi-tenant architecture.”
Constraint — a rule enforced at the database level: primary key, foreign key, unique, not null, check. “Adding a check constraint on the amount column ensures we never store a negative transaction value.”
Partitioning — dividing a large table into smaller, more manageable pieces based on a column value (range, list, or hash). “We partition the events table by month on created_at; queries for recent events only scan the most recent partition.”
Schema migration — a script that changes the structure of a database schema in a controlled, versioned way. “Every schema change goes through a migration file; we never modify the database schema manually.”
dbt Model Types
dbt (data build tool) is widely used in data engineering. Its layered modelling approach has its own vocabulary.
Staging model — a thin transformation layer that cleans, renames, and casts raw source data. One staging model per source table, minimal business logic. “The stg_orders model casts the order_total field from a string to a decimal and renames columns to match our naming convention.”
Intermediate model — a model that combines or further transforms staging models, but is not yet a business-facing output. “The intermediate model joins orders with returns to calculate net order value, which is then referenced by multiple mart models.”
Mart model — a wide, business-facing table designed for reporting and analysis. Marts are often denormalised for query performance. “The fct_orders mart model contains one row per order with all the dimensions an analyst needs for revenue reporting.”
Snapshot model — a dbt model that captures the state of a slowly changing dimension at a point in time, enabling historical analysis. “We created a snapshot of the users table to track how customer subscription tiers change over time.”
Seed — a CSV file loaded into dbt as a static reference table. “Country codes and their ISO names are managed as a dbt seed.”
Example Sentences in Context
-
“The relationship between the Campaign and ConversionEvent entities is one-to-many; a campaign can generate thousands of conversion events, but each event is attributed to exactly one campaign.”
-
“We deliberately denormalised the pricing tier into the invoices table because joining back to the pricing configuration at query time was adding 200 ms to every report query.”
-
“The composite index on (customer_id, order_date DESC) was chosen specifically to support the most common access pattern: fetching a customer’s most recent orders.”
-
“All schema changes are versioned as migration files in the repository; no DBA has direct write access to the production schema outside of a reviewed and approved migration.”
-
“Our dbt project follows a strict three-layer architecture: staging models contain only source-faithful transformations, intermediate models contain join logic, and mart models are the only layer that analysts query directly.”