English for Data Analysts: Vocabulary and Communication Patterns

Essential vocabulary for data analysts — KPIs, cohort analysis, A/B tests, statistical significance, data storytelling, and how to walk stakeholders through a dashboard.

Data analysts spend as much time communicating findings as they do querying databases. Whether you are presenting to a product manager, walking a stakeholder through a dashboard, or writing up A/B test results, you need precise language that builds trust and drives decisions. This guide covers the vocabulary you will use every day as a data analyst working in an English-speaking environment.


Roles: Who Does What?

Data Analyst vs Data Scientist vs Data Engineer

These three roles are often confused — and the confusion causes frustration in job interviews and team meetings.

  • A data analyst turns existing data into insights and reports. They write SQL, build dashboards, and answer business questions.
  • A data scientist builds predictive models and runs experiments using statistics and machine learning.
  • A data engineer builds and maintains the pipelines that collect, transform, and store data so analysts and scientists can use it.

“The data engineer built the pipeline. The data analyst built the report. The data scientist built the model that scores the users.”


Metrics and Dimensions

KPI (Key Performance Indicator)

A KPI is a metric that directly measures progress towards a business goal. Teams choose KPIs carefully — too many, and focus is lost.

“Our north star KPI is weekly active users. Revenue is important, but engagement drives everything else.”

Metric

A metric is any quantitative measurement — revenue, page views, conversion rate, churn. KPIs are a subset of metrics: the ones that matter most.

Dimension

A dimension is a categorical attribute used to segment or filter metrics. Country, device type, and user plan are common dimensions.

“Break the conversion metric down by the device dimension — I suspect mobile is dragging the average down.”
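
As a rough illustration of breaking a metric down by a dimension, the sketch below computes conversion rate per device type. The session data and all variable names are made up for the example.

```python
from collections import defaultdict

# Hypothetical sessions: (device, converted) pairs.
sessions = [
    ("desktop", True), ("desktop", True), ("desktop", False), ("desktop", False),
    ("mobile", True), ("mobile", False), ("mobile", False), ("mobile", False),
]

visits = defaultdict(int)
conversions = defaultdict(int)
for device, converted in sessions:
    visits[device] += 1
    conversions[device] += int(converted)

# The metric (conversion rate) segmented by the dimension (device).
conversion_by_device = {d: conversions[d] / visits[d] for d in visits}
# The same metric with no segmentation, for comparison.
overall_conversion = sum(conversions.values()) / sum(visits.values())

print(conversion_by_device)
print(overall_conversion)
```

In this invented data set the overall rate sits between the two segments, which is the kind of pattern the phrase "mobile is dragging the average down" describes.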

Aggregation

Aggregation means combining multiple values into a single summary value — SUM, COUNT, AVG, MIN, MAX. It is the foundation of almost every query and report.

“This aggregation counts unique users per day, so make sure it’s COUNT(DISTINCT user_id), not just COUNT(user_id).”
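
The difference between counting rows and counting unique users can be seen in a small sketch. It uses Python's built-in sqlite3 module with an in-memory database. The events table and its columns are hypothetical.

```python
import sqlite3

# Hypothetical events table: one row per event, so the same user
# can appear several times on the same day.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (event_date TEXT, user_id INTEGER)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [
        ("2024-01-01", 1),
        ("2024-01-01", 1),  # user 1 again on the same day
        ("2024-01-01", 2),
        ("2024-01-02", 1),
    ],
)

daily_counts = conn.execute(
    """
    SELECT event_date,
           COUNT(user_id)          AS events,
           COUNT(DISTINCT user_id) AS unique_users
    FROM events
    GROUP BY event_date
    ORDER BY event_date
    """
).fetchall()

for event_date, events, unique_users in daily_counts:
    print(event_date, events, unique_users)
```

On the first day the two aggregations disagree (three events, two unique users), which is exactly the mistake the quote above warns about.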


Analysis Techniques

Cohort Analysis

Cohort analysis groups users by a shared characteristic — usually when they signed up — and tracks their behaviour over time. It is essential for understanding retention.

“The January cohort has much better 30-day retention than the March cohort — something changed between those releases.”
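
A minimal sketch of a cohort retention calculation follows. It assumes one particular definition of 30-day retention (any activity 30 or more days after sign-up). Teams define retention in different ways, so treat the rule, the data, and the names as illustrative only.

```python
from collections import defaultdict
from datetime import date

# Hypothetical data: sign-up date per user, and (user_id, activity date) events.
signups = {
    1: date(2024, 1, 5),
    2: date(2024, 1, 20),
    3: date(2024, 3, 2),
    4: date(2024, 3, 15),
}
activity = [
    (1, date(2024, 2, 10)),
    (2, date(2024, 2, 25)),
    (3, date(2024, 3, 10)),
    (4, date(2024, 4, 20)),
]

# Assumed definition: a user is retained if active 30+ days after sign-up.
RETENTION_DAYS = 30
retained_users = {
    uid for uid, day in activity
    if (day - signups[uid]).days >= RETENTION_DAYS
}

# Group users into cohorts by sign-up month and count retained users per cohort.
cohort_size = defaultdict(int)
cohort_retained = defaultdict(int)
for uid, signup_date in signups.items():
    cohort = signup_date.strftime("%Y-%m")
    cohort_size[cohort] += 1
    if uid in retained_users:
        cohort_retained[cohort] += 1

retention = {c: cohort_retained[c] / cohort_size[c] for c in cohort_size}
print(retention)
```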

Funnel Analysis

Funnel analysis tracks how users move through a sequence of steps (the funnel) — from landing page to sign-up to first purchase, for example. It reveals where people drop off.

“There’s a 60% drop-off between step two and step three of the funnel — that’s our biggest problem right now.”
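
Drop-off is the share of users who reach one step but not the next. The sketch below computes it for a three-step funnel. The step names follow the example above, and the user counts are invented.

```python
# Hypothetical funnel: (step name, number of users who reached that step).
funnel = [
    ("landing page", 1000),
    ("sign-up", 500),
    ("first purchase", 200),
]

drop_offs = []
# Pair each step with the one that follows it.
for (step, users), (next_step, next_users) in zip(funnel, funnel[1:]):
    drop_off = 1 - next_users / users
    drop_offs.append((step, next_step, round(drop_off, 2)))

for step, next_step, drop_off in drop_offs:
    print(f"{step} -> {next_step}: {drop_off:.0%} drop-off")
```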

Pivot Table

A pivot table (or pivot) rearranges data by rotating rows into columns (or vice versa) to reveal patterns. In SQL this is typically done with conditional aggregation (CASE WHEN inside SUM or COUNT) or, in dialects that support it, a PIVOT clause.

“I built a pivot table showing revenue per product per region — it made the regional differences obvious at a glance.”
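
To make the idea concrete, here is a small sketch that pivots long-format rows (one row per sale) into a product-by-region grid. It uses plain Python rather than a spreadsheet or SQL, and the sales figures are invented.

```python
# Hypothetical long-format data: (product, region, revenue), one row per sale.
rows = [
    ("Widget", "EU", 100),
    ("Widget", "US", 150),
    ("Gadget", "EU", 80),
    ("Gadget", "US", 40),
    ("Widget", "EU", 20),
]

# The distinct regions become the columns of the pivot.
regions = sorted({region for _, region, _ in rows})

# pivot[product][region] holds summed revenue; missing cells default to 0.
pivot = {}
for product, region, revenue in rows:
    row = pivot.setdefault(product, {r: 0 for r in regions})
    row[region] += revenue

print("product", *regions)
for product, row in pivot.items():
    print(product, *(row[r] for r in regions))
```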


A/B Testing

A/B Test

An A/B test (also called a split test) randomly assigns users to two (or more) variants — the control (A) and the treatment (B) — and measures whether the treatment performs differently.

“We ran an A/B test on the checkout button colour. The green variant outperformed the grey control by 8%.”

Statistical Significance

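Statistical significance tells you whether the observed difference between variants is larger than random noise alone would plausibly produce. By convention, a result is called statistically significant when the p-value falls below a threshold chosen before the test, most often 0.05 (p < 0.05).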

“The uplift looks good, but it’s not statistically significant yet — we need to run the test for another week to collect enough data.”

p-value (in plain English)

The p-value is the probability of seeing a result at least as extreme as the one observed if there were actually no difference between variants. A low p-value (below your threshold, typically 0.05) means a difference this large would be unlikely if only chance were at work. It is not the probability that the result is due to chance.

“A p-value of 0.03 means that if there were really no difference, we’d see a gap this big only about 3% of the time, so we’re fairly confident the effect is real.”
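
One common way to get a p-value for an A/B test on conversion rates is a two-proportion z-test. The sketch below is one such approach under a normal approximation, not the only valid method. The function name and the visitor and conversion counts are made up.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for the difference in conversion rates."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate: the best estimate of conversion if A and B do not differ.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided tail probability under the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical results: control converts 200 of 2000, treatment 250 of 2000.
z, p_value = two_proportion_z_test(200, 2000, 250, 2000)
print(f"z = {z:.2f}, p = {p_value:.4f}")
print("significant at 0.05" if p_value < 0.05 else "not significant at 0.05")
```

The approximation works best with large samples. For small samples or very low conversion rates, an exact test may be a better choice.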

Confidence Interval

A confidence interval gives a range of plausible values for the true effect at a stated confidence level. “The conversion rate increased by 4% (95% CI: 1.5% to 6.5%)” means the best estimate of the uplift is 4%, and the data are consistent with a true uplift anywhere between 1.5% and 6.5%.

“The confidence interval is wide — the effect could be anywhere from +1% to +9%. We need more data to be precise.”
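
The link between sample size and interval width can be sketched with a simple 95% interval for the difference between two conversion rates. This uses a normal approximation (often called a Wald interval). The function name and the counts are illustrative assumptions.

```python
import math


def diff_confidence_interval(conv_a, n_a, conv_b, n_b, z=1.96):
    """Approximate CI for (rate B - rate A); z=1.96 gives roughly 95%."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Unpooled standard error of the difference between the two rates.
    se = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    diff = p_b - p_a
    return diff - z * se, diff + z * se


# Same observed rates (10% vs 12.5%), two hypothetical sample sizes.
small_low, small_high = diff_confidence_interval(20, 200, 25, 200)
large_low, large_high = diff_confidence_interval(200, 2000, 250, 2000)

print(f"n=200 per variant:  {small_low:+.3f} to {small_high:+.3f}")
print(f"n=2000 per variant: {large_low:+.3f} to {large_high:+.3f}")
```

With the smaller sample the interval is wide and includes zero. With ten times the data it narrows and excludes zero, which is what "we need more data to be precise" means in practice.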


Data Storytelling

Data Storytelling

Data storytelling is the practice of combining data, visualisations, and narrative to communicate insights in a compelling way. Numbers alone rarely change decisions — context and story do.

“Don’t just show the chart — tell the story. Why did churn spike in March? What happened? What should we do?”

Insight vs Observation

An observation is a raw fact: “Conversion dropped by 10% last week.” An insight explains why and what it means: “Conversion dropped because the payment provider had an outage on Thursday and Friday — we lost about 300 transactions.”

“We need insights, not just observations. Anyone can read the numbers — we need you to explain what they mean.”

Data Storytelling Phrases

Use these when presenting findings:

Phrase | When to use it
“The data shows a clear trend…” | Introducing a consistent pattern
“There’s an anomaly here worth investigating…” | Flagging a data point that stands out
“If we segment by…, we see that…” | Breaking down a metric by dimension
“The root cause appears to be…” | Giving a causal explanation
“Based on this, I’d recommend…” | Moving from insight to action

Dashboard Walk-Throughs

When presenting a dashboard to stakeholders, use this structure:

  1. Orient the audience — “This dashboard shows our weekly acquisition funnel for the last 90 days.”
  2. Highlight the headline metric — “The key number here is our week-on-week conversion rate — currently at 3.2%.”
  3. Point out the important trend — “You can see a dip in week 8 — that coincides with the pricing change.”
  4. Explain any filters or segments — “I’ve filtered to paid traffic only. Organic is on the second tab.”
  5. Invite questions — “Happy to drill down into any of these numbers.”

“Let me walk you through this dashboard. The top row shows acquisition — clicks, sign-ups, and activations. The middle row shows retention by cohort. Any questions before I go deeper?”


Common Data Analyst Phrases

Phrase | Meaning
“The numbers don’t add up” | There’s a discrepancy, so check the data source or logic
“We need to slice this differently” | Segment the data by a different dimension
“The signal is noisy” | There’s too much random variation to draw conclusions
“Let’s sanity-check the numbers” | Verify the query or calculation is correct
“We’re underpowered” | The sample size is too small to reliably detect an effect of the size we care about
“Correlation, not causation” | Two metrics move together, but that alone doesn’t show that one causes the other