English for Growth Engineering: Experimentation, Funnel Metrics, and A/B Testing Vocabulary

A/B test, holdout group, statistical significance, north star metric, AARRR — English vocabulary for growth engineers in experiment reviews and funnel analysis meetings.

Growth engineering sits at the intersection of product, data, and engineering — and it has a rich vocabulary of its own. In experiment review meetings, funnel analysis sessions, and sprint retrospectives, growth engineers use precise statistical and product language that can be unfamiliar even to experienced backend developers.

This post teaches the vocabulary you need to participate confidently in growth discussions in English, with the phrases and collocations growth teams actually use.


A/B Testing Fundamentals

An A/B test (also called a split test) is an experiment in which users are randomly divided into groups that receive different versions of a feature, UI element, or flow. The goal is to measure which version produces a better outcome.

“We ran an A/B test on the checkout button colour. The variant outperformed the control by 4.2% on conversion rate.”

A multivariate test extends the A/B model to test multiple variables simultaneously — for example, button colour and button text at the same time.

“A multivariate test lets us evaluate the interaction effects between the headline and the CTA copy. The downside is we need a much larger sample size.”
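To see why the sample size grows, here is a minimal sketch that counts the cells in a multivariate test and estimates the users needed per cell. It assumes the standard normal-approximation formula for comparing two conversion rates. The headline and CTA values, the 10% baseline, and the 12% target are hypothetical numbers chosen for illustration.

```python
from itertools import product
from math import ceil
from statistics import NormalDist


def users_per_cell(p_base, p_target, alpha=0.05, power=0.80):
    """Approximate users needed per cell to detect p_base -> p_target (two-sided)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    variance = p_base * (1 - p_base) + p_target * (1 - p_target)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p_target - p_base) ** 2)


# Hypothetical variables under test: 2 headlines x 3 CTA texts = 6 cells.
headlines = ["current", "benefit-led"]
cta_copy = ["Start free trial", "Get started", "Try it now"]
cells = list(product(headlines, cta_copy))

per_cell = users_per_cell(0.10, 0.12)
total_users = len(cells) * per_cell
print(f"{len(cells)} cells, about {per_cell} users each, about {total_users} in total")
```

A plain A/B test with the same baseline and target needs only two cells, so the total here is roughly three times larger. Smaller expected effects push the per-cell number up further.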

The control (or control group) is the group of users who see the existing, unchanged experience. The treatment group (also called the variant group) sees the new version.

“The control group sees the current onboarding flow. Treatment A sees the shortened version. Treatment B sees the video walkthrough.”
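One common way to split users into control and treatment groups is to hash the user ID, so that the same user always lands in the same group. The sketch below assumes that approach. The arm names, the experiment key `onboarding_v2`, and the function `assign_arm` are hypothetical, not taken from any particular experimentation platform.

```python
import hashlib

ARMS = ["control", "treatment_a", "treatment_b"]  # hypothetical arm names


def assign_arm(user_id: str, experiment: str, arms=ARMS) -> str:
    """Deterministically map a user to one arm of an experiment."""
    # Including the experiment name means different experiments split users differently.
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % len(arms)
    return arms[bucket]


print(assign_arm("user-42", "onboarding_v2"))
```

Because the assignment depends only on the inputs, a returning user sees the same experience on every visit without the team storing the assignment anywhere.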

A holdout group is a set of users who are deliberately excluded from a test — and sometimes from multiple experiments — to serve as a long-term baseline. They are used to measure the cumulative effect of all changes over time.

“We maintain a 5% holdout group across all experiments. Comparing the holdout to everyone else each quarter tells us whether our experimentation programme is actually moving the needle.”
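A holdout can be sketched as a fixed slice of users that is reserved before any experiment runs, plus a comparison of a key metric between that slice and everyone else. The salt `global-holdout`, the function names, and the example rates below are assumptions for illustration only.

```python
import hashlib

HOLDOUT_PERCENT = 5  # matches the 5% holdout in the example sentence


def in_holdout(user_id: str, salt: str = "global-holdout") -> bool:
    """Return True if the user belongs to the long-term holdout slice."""
    digest = hashlib.sha256(f"{salt}:{user_id}".encode("utf-8")).hexdigest()
    return int(digest, 16) % 100 < HOLDOUT_PERCENT


def cumulative_lift(holdout_rate: float, exposed_rate: float) -> float:
    """Relative difference between users exposed to shipped changes and the holdout."""
    return (exposed_rate - holdout_rate) / holdout_rate


# Hypothetical quarterly conversion rates: 10% in the holdout, 11% for exposed users.
print(f"Cumulative lift: {cumulative_lift(0.10, 0.11):.1%}")
```

Users for whom `in_holdout` returns True would be skipped by every experiment, which is what makes them a clean long-term baseline.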


Statistics in Plain English

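Statistical significance means the result of an experiment is unlikely to be due to random chance alone. The threshold is a significance level (alpha, often 0.05) chosen before the test; a result is called significant when its p-value falls below that level.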

“The result is statistically significant at the 95% confidence level. In other words, if there were no real difference, we would see a result this extreme less than 5% of the time.”

Be careful with this term. A common mistake (even among native speakers) is calling a result “significant” when you mean only that it is statistically significant. Statistical significance does not mean the effect is large or meaningful in practice.

A p-value is the probability that the observed result (or a more extreme one) would occur if there were no real difference between the groups, that is, if the null hypothesis were true. A low p-value (typically below 0.05) is evidence against the null hypothesis.

“The p-value is 0.03, so we can reject the null hypothesis. But the effect size is small — we still need to decide if it’s worth shipping.”
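For conversion-rate experiments, one common way to get a p-value is a two-sided two-proportion z-test. The sketch below assumes that test and a large enough sample for the normal approximation to hold. The counts are hypothetical, and real experimentation platforms may use different methods.

```python
from math import erfc, sqrt


def two_proportion_p_value(conv_a: int, n_a: int, conv_b: int, n_b: int) -> float:
    """Two-sided p-value for the difference between two conversion rates."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    # Under the null hypothesis both groups share one pooled conversion rate.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # erfc(|z| / sqrt(2)) equals 2 * (1 - Phi(|z|)) for the standard normal CDF Phi.
    return erfc(abs(z) / sqrt(2))


# Hypothetical counts: control converts 1000 of 10000, variant converts 1090 of 10000.
p_value = two_proportion_p_value(1000, 10000, 1090, 10000)
print(f"p-value: {p_value:.3f}")
```

With these counts the p-value lands below 0.05, so the team could say “we can reject the null hypothesis”, while the effect itself is still under one percentage point.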

A confidence interval is a range of plausible values for the true effect size. It is built by a procedure that, across many repeated experiments, would capture the true value a specified share of the time (usually 95%).

“The 95% confidence interval for the uplift is between +1.8% and +6.4%. We’re confident the effect is positive, but the magnitude is uncertain.”
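As a rough illustration, the interval for the difference between two conversion rates can be computed with the normal (Wald) approximation. This is a sketch under that assumption, with hypothetical counts. It reports the difference in absolute terms, so 0.009 means 0.9 percentage points.

```python
from math import sqrt
from statistics import NormalDist


def diff_confidence_interval(conv_a, n_a, conv_b, n_b, level=0.95):
    """Wald confidence interval for (rate_b - rate_a), in absolute terms."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    se = sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% interval
    diff = p_b - p_a
    return diff - z * se, diff + z * se


# Hypothetical counts with a large sample: the interval stays above zero.
low, high = diff_confidence_interval(1000, 10000, 1090, 10000)
print(f"Large sample: {low:+.4f} to {high:+.4f}")

# The same rates with a tenth of the users: the interval now includes zero.
low_small, high_small = diff_confidence_interval(100, 1000, 109, 1000)
print(f"Small sample: {low_small:+.4f} to {high_small:+.4f}")
```

The second case is the situation people describe in meetings as “the confidence interval includes zero”: the data is compatible with no effect at all.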


Funnel Analysis Vocabulary

Funnel analysis is the process of mapping and measuring a sequence of steps users take toward a goal (such as signing up or making a purchase) and identifying where users drop off.

“The funnel analysis shows that 60% of users who start the registration flow drop off at the email verification step. That’s our biggest leak.”
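A funnel like the one in the example can be sketched as an ordered list of steps with user counts, from which the drop-off at each step follows directly. The step names and counts below are hypothetical and chosen so that the email verification step loses 60% of users.

```python
# Ordered funnel steps with the number of users who reached each one (hypothetical).
funnel = [
    ("started_registration", 10000),
    ("entered_email", 8200),
    ("verified_email", 3280),
    ("completed_profile", 2950),
]


def step_drop_offs(steps):
    """Return (step_name, share of users lost since the previous step) pairs."""
    results = []
    for (_, prev_count), (name, count) in zip(steps, steps[1:]):
        results.append((name, 1 - count / prev_count))
    return results


drop_offs = step_drop_offs(funnel)
biggest_leak = max(drop_offs, key=lambda item: item[1])
for name, rate in drop_offs:
    print(f"{name}: {rate:.0%} drop-off")
print(f"Biggest leak: {biggest_leak[0]}")
```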

A conversion rate is the percentage of users who complete a desired action — from clicking a button to purchasing a subscription.

“Our free-to-paid conversion rate is 3.2%. The industry benchmark for this category is around 5%, so there’s room to improve.”
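When comparing conversion rates, it helps to be clear about whether a gap is stated in percentage points or as a relative change. The small sketch below uses the 3.2% and 5% figures from the example sentence. The function name `gap_to_benchmark` is made up for illustration.

```python
def gap_to_benchmark(rate: float, benchmark: float):
    """Return the gap in percentage points and as a relative increase over `rate`."""
    points = (benchmark - rate) * 100          # absolute gap, in percentage points
    relative = (benchmark - rate) / rate       # relative lift needed to close the gap
    return points, relative


points, relative = gap_to_benchmark(0.032, 0.05)
print(f"Gap: {points:.1f} percentage points, or a {relative:.0%} relative increase")
```

Saying “we are 1.8 points below benchmark” and “we need a 56% lift to reach benchmark” describe the same gap, so it is worth stating which one you mean.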

An activation metric is the specific event or milestone that indicates a new user has experienced the core value of a product for the first time.

“Our activation metric is ‘user creates their first project within 48 hours of signup.’ Until they hit that, they’re at very high churn risk.”
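An activation rule like this one can be expressed as a simple check on two timestamps. The sketch below assumes each user has a signup time and an optional first-project time. The field layout and the sample users are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Optional

ACTIVATION_WINDOW = timedelta(hours=48)  # from the example activation metric


def is_activated(signup_at: datetime, first_project_at: Optional[datetime]) -> bool:
    """True if the user created their first project within 48 hours of signup."""
    if first_project_at is None:
        return False
    return timedelta(0) <= first_project_at - signup_at <= ACTIVATION_WINDOW


signup = datetime(2024, 3, 1, 9, 0)
users = {
    "u1": (signup, datetime(2024, 3, 2, 15, 0)),  # first project after 30 hours
    "u2": (signup, datetime(2024, 3, 4, 9, 0)),   # first project after 72 hours
    "u3": (signup, None),                         # no project yet
}
activation_rate = sum(is_activated(s, p) for s, p in users.values()) / len(users)
print(f"Activation rate: {activation_rate:.0%}")
```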


North Star and AARRR

The north star metric (NSM) is the single metric that best captures the core value a product delivers to its users. It is the number the entire growth team organises around.

“Our north star metric is weekly active projects. Revenue is a lagging indicator — WAP tells us whether users are actually getting value.”
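How a north star metric is computed depends on the team's own definition. As one possible reading, the sketch below treats “weekly active projects” as the number of distinct projects with at least one activity event in a seven-day window. That definition, the event format, and the sample data are all assumptions.

```python
from datetime import date, timedelta


def weekly_active_projects(events, week_ending: date) -> int:
    """Count distinct projects with activity in the 7 days ending on `week_ending`.

    `events` is an iterable of (project_id, activity_date) pairs.
    """
    window_start = week_ending - timedelta(days=6)
    active = {pid for pid, day in events if window_start <= day <= week_ending}
    return len(active)


# Hypothetical activity log.
events = [
    ("p1", date(2024, 3, 4)),
    ("p1", date(2024, 3, 6)),   # same project, counted once
    ("p2", date(2024, 3, 10)),
    ("p3", date(2024, 2, 28)),  # outside the window
]
wap = weekly_active_projects(events, date(2024, 3, 10))
print(f"WAP: {wap}")
```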

The AARRR framework (also called pirate metrics, because “AARRR” sounds like a pirate) is a funnel model covering five stages: Acquisition (users discover the product), Activation (users have a good first experience), Retention (users return), Revenue (users pay), and Referral (users recommend the product).

“We mapped our funnel against AARRR and found that our retention stage is the weakest. We’re acquiring users well but losing them in the first week.”

Pronounce AARRR as the letters: “A-A-R-R-R” or just say “pirate metrics” — both are used.


Phrases for Experiment Reviews and Meetings

Use these in experiment readout meetings, sprint reviews, and data discussions:

  • “The experiment reached statistical significance on day 12. We’re recommending a ship decision.”
  • “The p-value is borderline — 0.07. I’d suggest running it for another week before we call it.”
  • “Treatment B moved conversion rate by 2.1 percentage points, but the confidence interval includes zero, so we can’t conclude it’s a real effect.”
  • “We’ve identified the activation event — users who connect their first data source within 24 hours are 3x more likely to be retained at 30 days.”
  • “This is our north star metric. Every experiment we run should have a hypothesis about how it moves WAP, even if we’re primarily measuring conversion.”
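A claim such as “3x more likely to be retained at 30 days” in the phrases above is a ratio of two retention rates. The sketch below shows the arithmetic with hypothetical cohort counts. It says nothing about causation: users who activate may differ from those who do not in other ways.

```python
def retention_ratio(retained_a: int, total_a: int, retained_b: int, total_b: int) -> float:
    """How many times more likely cohort A is to be retained than cohort B."""
    return (retained_a / total_a) / (retained_b / total_b)


# Hypothetical 30-day retention: 450 of 1000 activated users vs 150 of 1000 others.
ratio = retention_ratio(450, 1000, 150, 1000)
print(f"Activated users are {ratio:.1f}x more likely to be retained at 30 days")
```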

Key Collocations

Collocation                        Example
run an A/B test                    “We’re running an A/B test on the pricing page.”
reach statistical significance     “The test reached significance after 14 days.”
move the north star metric         “Will this experiment move the north star metric?”
analyse the funnel                 “Let’s analyse the funnel to find the biggest drop-off point.”
ship the variant                   “We’re shipping the variant to 100% of users next Tuesday.”
measure conversion rate            “We measure conversion rate at each step of the onboarding flow.”

Practice

Pick a product you use regularly (a news app, a to-do list tool, a payment app). Identify what you think its north star metric is. Then map its user flow against the AARRR framework — write one sentence for each stage. Finally, write a hypothesis for an A/B test you would run to improve the activation stage: “If we [change X], we expect [metric Y] to increase because [reason Z].” This is the standard hypothesis format used in growth teams — practise saying it until it feels fluent.