5 exercises on adjective–noun collocations from data engineering, databases, and storage architecture — the precise terms used by data engineers, DBAs, and backend engineers.
Key data collocations in this set
raw data — unprocessed, as-ingested; not "original" or "source"
immutable record — cannot be changed after creation; used in event sourcing
volatile memory — lost on restart (RAM); opposite: persistent/non-volatile
normalised schema — reduces redundancy via table relationships (1NF–3NF)
persistent storage — survives restarts; contrast with ephemeral storage
1 / 5
A data engineer explains an ETL pipeline:
"In stage one, we ingest ___ data directly from the API — no transformation, no cleaning, exactly as it arrives."
Which adjective describes data that has not been processed, cleaned, or transformed?
Raw data is the established term for data in its original, unprocessed form — as it was collected from a sensor, API, log file, or user action. It has not been validated, cleaned, aggregated, or transformed. Raw data is the starting point of any ETL (Extract, Transform, Load) or ELT pipeline.
Why the others are not canonical:
original data — descriptively accurate but not the fixed technical term used in data engineering
unprocessed data — also accurate but not the industry collocation
source data — used in ETL contexts to mean "data from the source system"; it is not a synonym for "raw", because source data may already be structured or cleaned
Common collocations:
ingest raw data
raw data pipeline
raw data lake
raw logs
process raw data into structured form
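To make the raw/structured distinction concrete, here is a minimal sketch of a two-stage pipeline. The payloads, field names, and the `transform` helper are hypothetical and invented for illustration; they are not taken from any particular API. The point is only that the raw records are stored exactly as received and that cleaning happens in a separate step.

```python
import json

# Hypothetical raw API payloads, kept exactly as they arrived (unparsed strings).
raw_records = [
    '{"user": " Alice ", "amount": "12.50"}',
    '{"user": "bob", "amount": "n/a"}',
    '{"user": "Carol", "amount": "7"}',
]


def transform(raw_line):
    """Parse and clean one raw record; return None if validation fails."""
    record = json.loads(raw_line)
    try:
        amount = float(record["amount"])
    except ValueError:
        return None  # invalid amount: rejected at the transform stage
    return {"user": record["user"].strip().lower(), "amount": amount}


# Stage one ingests raw data untouched; stage two derives structured rows.
structured = [row for row in map(transform, raw_records) if row is not None]
print(structured)
```

Keeping `raw_records` unchanged means the transform step can be re-run later with different cleaning rules, which is one common reason pipelines retain raw data.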
2 / 5
A database designer explains a schema choice:
"All records in the events table are ___ — once written, they can never be modified or deleted, only superseded by new records."
Which adjective describes data that cannot be changed after it is created?
Immutable record is the precise technical term. Immutability means an object or record cannot be changed after it is created. In data engineering and distributed systems, immutable data is a design choice that simplifies reasoning about state and enables append-only storage. It is fundamental to event sourcing and is a core idea in functional programming.
Why the others fail:
permanent record — means it persists indefinitely, not that it cannot be modified
fixed record — ambiguous; in engineering contexts "fixed" often means "repaired" (a fixed bug) or "constant in size" (a fixed-length record)
frozen record — informal; used in Python for frozen dataclasses but not a general data term
Common collocations:
immutable data
immutable infrastructure (servers are never modified in-place; replaced)
immutable object (programming)
append-only log (same concept in storage)
write-once, read-many (WORM)
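The idea of records that are "never modified, only superseded" can be sketched in a few lines. This is an illustrative example under stated assumptions: the `Event` class, the `append` and `current_balance` helpers, and the account data are all hypothetical. It uses a frozen dataclass from the Python standard library to make each record immutable.

```python
from dataclasses import dataclass, FrozenInstanceError


@dataclass(frozen=True)
class Event:
    """One immutable record: fields cannot be reassigned after creation."""
    event_id: int
    account: str
    balance: int


# The log is used append-only by convention. A plain list does not enforce
# this; a real event store would enforce it at the storage layer.
log = []


def append(event):
    log.append(event)


append(Event(1, "acc-1", 100))

# Attempting to modify an existing record raises FrozenInstanceError.
try:
    log[0].balance = 150
except FrozenInstanceError:
    mutation_blocked = True
else:
    mutation_blocked = False

# A correction is written as a new record that supersedes the old one.
append(Event(2, "acc-1", 150))


def current_balance(events, account):
    """Return the balance from the latest event for the given account."""
    latest = None
    for event in events:
        if event.account == account:
            latest = event
    return latest.balance if latest else None


print(mutation_blocked, current_balance(log, "acc-1"))
```

The first event is still in the log with its original balance, so the full history remains available for auditing.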
3 / 5
A storage architect specifies requirements:
"User session tokens must be stored in ___ memory — when the server restarts, all active sessions are lost by design."
Which adjective describes memory or storage that loses its contents when power is removed?
Volatile memory is the correct technical term. Volatile memory loses its contents when the power is removed or the system restarts. RAM is the classic example. In contrast, non-volatile storage (SSDs, HDDs, flash) retains data after power loss.
Why the others are not the established term:
temporary memory — not a technical classification; "temporary" describes duration, not the power-loss property
transient memory — used in some networking and messaging contexts (transient message = not persisted) but not the standard memory term
ephemeral memory — "ephemeral" is used in cloud contexts (ephemeral storage, ephemeral container) but is not the classical hardware term for this property
Common collocations:
volatile RAM
volatile cache
write to volatile memory
4 / 5
A database migration proposal explains a design strategy:
"We will move to a ___ schema — the data will be split into separate related tables to eliminate redundancy and ensure data integrity."
Which adjective describes a database schema designed to reduce data duplication through structured table relationships?
Normalised schema is the canonical database design term (US spelling: normalized). Normalisation is the process of organising a relational database to reduce data redundancy and improve data integrity. It involves decomposing tables into smaller, related tables, following the normal forms (1NF, 2NF, 3NF, BCNF).
Why the others fail:
organised schema — generic; not a technical term in database design
structured schema — "structured" refers to data format (structured vs. unstructured data), not normalisation
optimised schema — could mean many things (query performance, indexing); not the normalisation-specific term
The canonical pair:
normalised schema — reduces redundancy; good for writes; used in OLTP systems
denormalised schema — adds redundancy deliberately for read performance; used in data warehouses, OLAP, and analytics
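The contrast in this pair can be illustrated with a small sketch using Python's built-in `sqlite3` module. The `customers` and `orders` tables and their contents are hypothetical. In the normalised layout each customer's city is stored once, so one `UPDATE` changes it everywhere; the join at the end produces the wide, denormalised shape that an analytics table might store directly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Normalised layout: customer details live in one table, and orders
# refer to them by key instead of repeating name and city on every row.
conn.executescript("""
CREATE TABLE customers (
    customer_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL,
    city        TEXT NOT NULL
);
CREATE TABLE orders (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customers(customer_id),
    total       REAL NOT NULL
);
INSERT INTO customers VALUES (1, 'Alice', 'Leeds');
INSERT INTO orders VALUES (10, 1, 25.0), (11, 1, 40.0);
""")

# Because the city is stored once, a single-row update is enough.
conn.execute("UPDATE customers SET city = 'York' WHERE customer_id = 1")

# Joining the tables yields the denormalised shape: one wide row per order,
# with customer fields repeated on each row.
rows = conn.execute("""
    SELECT o.order_id, c.name, c.city, o.total
    FROM orders AS o
    JOIN customers AS c ON c.customer_id = o.customer_id
    ORDER BY o.order_id
""").fetchall()
conn.close()

print(rows)
```

If the wide rows were stored as a table instead of computed by a join, reads would avoid the join, but the city change would have to be applied to every order row.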
5 / 5
A cloud storage specification states:
"The audit log database uses ___ storage — all writes are committed to disk and survive server crashes, power failures, and restarts."
Which adjective describes storage that retains data beyond a single session or runtime?
Persistent storage is the canonical term for storage that retains data after a process ends, a container stops, or a server restarts. It contrasts with volatile memory and ephemeral storage. In Kubernetes, a PersistentVolume (PV) provides exactly this property.
The nuances:
persistent storage — data survives restarts and process termination ✅ — the standard term in cloud, containers, and database design
durable storage — also correct and widely used; "durable" specifically means data survives hardware failures (Amazon S3 is designed for 99.999999999% durability, often called "11 nines"). In practice, "durable" and "persistent" are often used interchangeably, but "persistent" is the more common label for the storage type.
reliable storage — generic; implies availability and correctness, not specifically the survive-restart property
stable storage — a theoretical concept from distributed systems (fault-tolerant storage); not an everyday operational term
Common collocations:
persistent volume (Kubernetes PV)
persistent storage layer
attach persistent storage
ephemeral vs persistent
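A small sketch can show the survive-restart property in action. It uses Python's built-in `sqlite3` module; the `write_then_reopen` helper, the `audit` table, and the file name are hypothetical, and closing the connection only stands in for a process restart. A file-backed database keeps its row after the connection is reopened, while an in-memory database (`":memory:"`) starts empty each time.

```python
import os
import sqlite3
import tempfile


def write_then_reopen(path):
    """Write one audit row, close the connection, reopen it, count the rows."""
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE IF NOT EXISTS audit (msg TEXT)")
    conn.execute("INSERT INTO audit VALUES ('user login')")
    conn.commit()
    conn.close()  # stands in for a process stop or restart

    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE IF NOT EXISTS audit (msg TEXT)")
    count = conn.execute("SELECT COUNT(*) FROM audit").fetchone()[0]
    conn.close()
    return count


# Persistent: the database is a file on disk, so the row is still there.
with tempfile.TemporaryDirectory() as tmp:
    persistent_count = write_then_reopen(os.path.join(tmp, "audit.db"))

# Volatile: each ":memory:" connection gets a fresh, empty database.
volatile_count = write_then_reopen(":memory:")

print(persistent_count, volatile_count)
```

The temporary directory is deleted at the end of the `with` block purely to keep the example tidy; a real audit log would live on a volume that outlasts the process.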