English for Data Quality Discussions: Talking About Validation and Trust

When a dashboard shows a number nobody believes, or a pipeline silently drops half its rows, the conversation that follows is a data quality discussion. Data engineers, analysts, and stakeholders need a shared, precise English to describe what is wrong, how bad it is, and whether to trust the numbers. This guide gives you the vocabulary and the diplomatic phrasing.

The dimensions of data quality

Data quality isn’t one thing — it has named dimensions. Using the right one makes you precise.

Dimension	Question it answers	Example problem
Completeness	Is all the data there?	“20% of rows have a null `country`.”
Accuracy	Is the data correct?	“The totals don’t match the source.”
Consistency	Does it agree across systems?	“CRM and warehouse disagree.”
Freshness / timeliness	Is it up to date?	“The feed is six hours stale.”
Uniqueness	Are there duplicates?	“Each order appears twice.”
Validity	Does it match the expected format?	“Emails without an @.”

“The issue isn’t accuracy — the numbers are right — it’s freshness. The dashboard is showing stale data from this morning.”

Naming the exact dimension turns a vague “the data is bad” into an actionable statement.
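
If you want to back a statement like “20% of rows have a null `country`” with a number, a few of the dimensions can be measured directly. The following is a minimal sketch, not a standard implementation: the sample rows, the field names, the helper functions, and the simple email pattern are all assumptions made for illustration.

```python
import re

# Hypothetical sample rows; field names are illustrative only.
rows = [
    {"order_id": 1, "country": "DE", "email": "a@example.com"},
    {"order_id": 2, "country": None, "email": "b@example.com"},
    {"order_id": 2, "country": "FR", "email": "b.example.com"},
    {"order_id": 3, "country": "FR", "email": "c@example.com"},
    {"order_id": 4, "country": "US", "email": "d@example.com"},
]

def completeness(rows, field):
    """Share of rows where the field is present (not None)."""
    return sum(r[field] is not None for r in rows) / len(rows)

def uniqueness(rows, key):
    """Number of distinct key values divided by the number of rows."""
    return len({r[key] for r in rows}) / len(rows)

def validity(rows, field, pattern):
    """Share of non-null values that match the expected format."""
    values = [r[field] for r in rows if r[field] is not None]
    return sum(bool(re.fullmatch(pattern, v)) for v in values) / len(values)

report = {
    "completeness(country)": completeness(rows, "country"),
    "uniqueness(order_id)": uniqueness(rows, "order_id"),
    # Deliberately loose pattern: something, an @, something.
    "validity(email)": validity(rows, "email", r"[^@\s]+@[^@\s]+"),
}

for name, score in report.items():
    print(f"{name}: {score:.0%}")
```

Each score maps to one dimension, so the sentence you say in the meeting can name the dimension and the percentage together.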

Core vocabulary

Term	Meaning
Anomaly	A value outside the expected range
Drift	A gradual change in the data’s distribution over time
Null / missing	Absent values
Duplicate	A repeated record
Outlier	An extreme value
Schema	The structure/shape of the data
Reconciliation	Checking two datasets match
Ground truth	The trusted reference source

“We spotted an anomaly — a 10x spike in signups — but reconciliation against the source shows it’s a duplicate issue, not real growth.”
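
To make “reconciliation against the source” concrete, here is a small sketch that compares per-key counts between a trusted source and a warehouse table. The ID lists and the `reconcile` helper are hypothetical; a real reconciliation would usually run as a query against both systems.

```python
from collections import Counter

# Hypothetical signup IDs; "source" stands in for the ground truth.
source_ids = ["u1", "u2", "u3"]
warehouse_ids = ["u1", "u1", "u2", "u2", "u3", "u3"]

def reconcile(source, target):
    """Compare per-key counts and report where the target disagrees."""
    src, tgt = Counter(source), Counter(target)
    return {
        # Keys the source has but the target lost (completeness problem).
        "missing": sorted(k for k in src if k not in tgt),
        # Keys the target has but the source never sent.
        "unexpected": sorted(k for k in tgt if k not in src),
        # Keys that appear more often in the target (uniqueness problem).
        "duplicated": sorted(k for k in tgt if k in src and tgt[k] > src[k]),
    }

result = reconcile(source_ids, warehouse_ids)
print(result)
```

In this made-up case every key lands in `duplicated` and nothing is missing, which is the evidence behind saying “it’s a duplicate issue, not real growth.”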

Describing how bad it is

Stakeholders need to know severity. Be specific and proportionate.

Vague	Precise
“The data is wrong.”	“About 3% of orders are missing a region. It skews the regional breakdown but not the total.”
“It’s broken.”	“The pipeline dropped one partition, so yesterday is incomplete.”
“Numbers look off.”	“Revenue is overstated by ~5% due to double-counted refunds.”

“To be clear on scope: this affects only the EU region table, roughly 3% of rows, and it doesn’t impact the global totals. Impact is limited to the regional drill-down.”

The phrases “scope,” “impact is limited to,” and “doesn’t impact” help stakeholders calibrate their worry.
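
A scoped statement is easier to make when the numbers behind it are computed rather than estimated. The sketch below uses invented order data (100 orders, three with a missing region) to show where “about 3% of rows” and “the total is unaffected” would come from; the field names and values are assumptions.

```python
from collections import defaultdict

# Hypothetical orders: 100 rows of 10.0 each, alternating regions.
orders = [{"region": "EU" if i % 2 else "US", "amount": 10.0} for i in range(100)]
for i in (3, 40, 77):
    orders[i]["region"] = None  # simulate rows that lost their region

# Scope: how many rows are affected?
affected = sum(o["region"] is None for o in orders)
share = affected / len(orders)

# Impact on the total: the sum does not depend on the region field.
total = sum(o["amount"] for o in orders)

# Impact on the breakdown: affected rows fall into an "unknown" bucket.
by_region = defaultdict(float)
for o in orders:
    by_region[o["region"] or "unknown"] += o["amount"]

print(f"About {share:.0%} of orders are missing a region.")
print(f"Global total: {total:.2f}")
print(f"Regional breakdown: {dict(by_region)}")
```

The total is the same with or without the region field, while the breakdown gains an “unknown” bucket. That is the difference between “limited to the regional drill-down” and “the data is wrong.”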

Talking about whether to trust the data

The real question in these meetings is often “can I use this number in my report?” Answer it directly.

“I’d hold off on quoting that figure until we reconcile.”
“The headline number is solid; it’s the breakdown I don’t trust yet.”
“Treat today’s data as provisional.”
“I’d caveat that chart — it’s based on a partial load.”
“This is safe to use; the anomaly was cosmetic.”

“The total is trustworthy. I’d caveat the regional split as provisional until tomorrow’s reload confirms it.”

“Provisional” (temporary, subject to change) and “caveat” (a warning or qualification, also used as a verb, as in “I’d caveat that chart”) are essential data-quality vocabulary.

Flagging bad data diplomatically

Often the bad data comes from another team’s pipeline. Raise it without blame.

Blunt	Diplomatic
“Your pipeline is broken.”	“I’m seeing something odd in the upstream feed. Could we check it together?”
“You gave us wrong data.”	“There seems to be a mismatch between the source and what we’re receiving.”
“This is your fault.”	“Looks like the schema changed upstream and we didn’t catch it.”

“Heads up — we’re seeing a mismatch between the upstream feed and the warehouse. It might be a schema change on your side that slipped through. Could we trace it together?”

Upstream (earlier in the data flow) and downstream (later) are core directional vocabulary. Problems “originate upstream” and “propagate downstream.”

Before and after: a full rewrite

Before (alarming, vague, blamey):

“the dashboard is totally wrong and the numbers are crazy, someone broke the data and we can’t trust anything. probably the upstream team.”

After (precise, calm, scoped):

“Quick flag on data quality: the EU revenue figure looks anomalous — about 10x normal. Reconciliation against the source shows it’s a duplicate issue, not real. Scope is limited to the EU regional table; the global total is unaffected and safe to use. The likely cause is a schema change upstream that broke our dedup step — I’ll confirm with the source team. Until we reload, please treat the EU breakdown as provisional.”

Common mistakes

Saying “the data are/is” inconsistently. “Data” can be singular or plural; pick one and be consistent. In tech, singular (“the data is stale”) is now common and accepted.
Confusing “accuracy” and “completeness.” Missing rows = completeness; wrong values = accuracy. Different fixes.
Using “anomaly” for any problem. An anomaly is specifically an unexpected value, not a pipeline failure.
Mixing up “upstream” and “downstream.” Upstream = source/earlier; downstream = consumers/later. Reversing these confuses everyone.
Saying “duplicated data” when you mean “duplicate records.” “Duplicates” (noun) is cleaner.

Mini-glossary

Data contract — agreed schema/quality between producer and consumer
SLA on freshness — a promise on how current data will be
Backfill — reprocessing historical data
Dedup (deduplication) — removing duplicates
Quarantine — isolating bad records instead of failing
Lineage — the path data took from source to dashboard
Sanity check — a quick plausibility test

“Let’s add a sanity check to the pipeline — if row counts drop more than 20% day-over-day, quarantine the load and alert, rather than silently publishing.”
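
The rule in that sentence could be sketched as follows. The function name, the return values, and the choice to quarantine when there is no baseline are assumptions for illustration; a real pipeline would hook this into its own alerting and load steps.

```python
def check_row_count(today, yesterday, max_drop=0.20):
    """Decide whether to publish or quarantine a load.

    Quarantines when the row count fell by more than `max_drop`
    compared with the previous day.
    """
    if yesterday <= 0:
        # No usable baseline: be cautious rather than publish blindly.
        return "quarantine"
    drop = (yesterday - today) / yesterday
    return "quarantine" if drop > max_drop else "publish"

# Hypothetical row counts for three daily loads.
for today, yesterday in [(1000, 1000), (800, 1000), (790, 1000)]:
    decision = check_row_count(today, yesterday)
    if decision == "quarantine":
        print(f"ALERT: rows fell from {yesterday} to {today}; load quarantined")
    else:
        print(f"OK: {today} rows published")
```

A drop of exactly 20% still publishes here because the wording is “more than 20%”. Whether the boundary should be inclusive is a decision to agree on with the data’s consumers.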

Key takeaways

Name the exact dimension: completeness, accuracy, consistency, freshness, uniqueness, validity.
Quantify scope and impact: “3% of rows, doesn’t affect totals.”
Tell stakeholders whether data is safe to use, provisional, or to be caveated.
Flag upstream issues with “mismatch” and “let’s trace it together,” not blame.

Data quality conversations are about trust. Speak precisely about what’s wrong and how much it matters, and people will trust both the data and you.
