Intermediate–Advanced 7 terms

Database Schema Design

ER modelling, normalisation forms (1NF–3NF), star and snowflake schemas, indexing vocabulary, constraint types, and schema design discussion language.

  • Normalisation /ˌnɔːməlaɪˈzeɪʃən/

    The process of organising a relational database schema to reduce data redundancy and improve data integrity by applying normal forms. Each normal form eliminates a specific type of anomaly.

    "The original table stored customer address in every order row — 10,000 orders for one customer means 10,000 copies of the address. Normalising to 3NF moved the address to a customers table referenced by customer_id. Now updating a customer’s address requires one row change instead of 10,000."
  • 1NF / 2NF / 3NF /fɜːst sɛkənd θɜːd nɔːməl fɔːm/

    The first three normal forms. 1NF: all values are atomic (no arrays or nested records). 2NF: 1NF + no partial dependency (every non-key column depends on the whole primary key). 3NF: 2NF + no transitive dependency (no non-key column depends on another non-key column).

    "The orders table violated 2NF: product_name depended only on product_id, not the full composite key (order_id, product_id). Normalising to 2NF moved product_name to a products table. It also violated 3NF: category_description depended on category_id which depended on product_id. Normalising to 3NF extracted categories."
  • Star Schema /stɑː ˈskiːmə/

    A data warehouse schema pattern with a central fact table (quantitative measurements) surrounded by dimension tables (descriptive attributes). Optimised for analytical queries: denormalised dimension tables allow fast joins without complex multi-level lookups.

    "Our sales data warehouse uses a star schema: fact_sales (sale_id, amount, date_key, product_key, store_key) joins to dim_date, dim_product, dim_store. BI tool queries are fast because each dimension join is direct. Analysts don’t need to know normalisation levels — the star schema is designed for readability and query speed."
  • Covering Index /ˈkʌvərɪŋ ˈɮndɛks/

    An index that includes all columns required by a query — the query engine can satisfy the request entirely from the index without accessing the main table (a table lookup). Significantly reduces I/O for high-frequency read queries.

    "The query SELECT user_id, email FROM users WHERE status = ‘active’ was doing a table lookup after the index scan. We created a covering index: CREATE INDEX idx_status_user_email ON users(status, user_id, email). Now the query is satisfied entirely from the index — no table access, 10x faster."
  • Foreign Key Constraint /ˈfɒrɪn kɪ kənˈstrɛɪnt/

    A constraint that enforces referential integrity between two tables: a column’s value in one table must match an existing value in the referenced table’s primary key. Prevents orphaned records.

    "Adding FOREIGN KEY (customer_id) REFERENCES customers(id) ON DELETE CASCADE ensures every order has a valid customer. When a customer is deleted, their orders are automatically deleted. Without the constraint, deleting a customer would leave orphaned orders — queries joining on customer_id would silently drop rows."
  • Slowly Changing Dimension (SCD) /ˈsləʊlɪ tʃeɪndʒɪŋ dɪˈmɛnʃən/

    A dimension table where attributes change slowly over time — not every day, but occasionally. SCD Type 1: overwrite old value (lose history). SCD Type 2: add new row with version dates (preserve full history). SCD Type 3: add a previous-value column.

    "Customer addresses are an SCD Type 2 dimension: when an address changes, we add a new row with a new effective_date and mark the old row with expired_date. The sales fact table joins on the address row that was current at the sale date. This lets us answer: ‘What was the customer’s address when they made this purchase in 2021?’"
  • Denormalisation /diˌ nɔːməlaɪˈzeɪʃən/

    Intentionally introducing redundancy into a schema for performance — duplicating data to avoid joins in critical read paths. Accepted trade-off in read-heavy systems where join cost exceeds the cost of managing duplicated data.

    "The product detail page query joined 8 tables and took 200ms. We denormalised: the product listing table now includes category_name, brand_name, and main_image_url as columns (duplicated from their normalised tables). Writes need to update two places, but the read query is now a single table scan — 4ms."