English for AI Engineers: Key Vocabulary

Essential English vocabulary for AI and ML engineers — embeddings, inference, fine-tuning, RAG, agents — with clear definitions and example sentences.

Artificial intelligence is one of the fastest-moving fields in technology — and its vocabulary is evolving just as quickly. For non-native English speakers working in AI and ML, knowing the right words is not just about understanding papers and documentation. It is about communicating credibly in design meetings, code reviews, and product discussions.

This guide covers the core vocabulary you need, with definitions, example sentences, and tips on how each term is used in context.


Foundations: How Models Work

Embedding

An embedding is a numerical representation of data (text, images, audio) in a high-dimensional vector space. Items with similar meaning are positioned close together in that space.

“We store document embeddings in a vector database so we can retrieve semantically similar chunks at query time.”

“The model maps each token to a 1,536-dimensional embedding before processing.”

Inference

Inference is the process of using a trained model to make predictions on new data. It is the opposite of training.

“Inference latency is a bottleneck — we’re seeing P99 response times above two seconds.”

“We run inference on GPU instances to meet our throughput requirements.”

Token

A token is the basic unit of text that a language model processes — roughly a word or word fragment.

“This prompt uses around 800 tokens, which keeps us well within the model’s context window.”


Training and Adaptation

Fine-tuning

Fine-tuning is the process of taking a pre-trained model and continuing to train it on a smaller, domain-specific dataset to adapt its behaviour.

“We fine-tuned the base model on our support ticket data to improve classification accuracy.”

“Fine-tuning requires significantly less compute than training from scratch.”

Pre-training

Pre-training is the initial phase where a model learns from a large general dataset before any task-specific adaptation.

“The model was pre-trained on a trillion tokens of internet text.”

RLHF (Reinforcement Learning from Human Feedback)

RLHF is a training technique where human raters score model outputs, and those scores guide further training to align the model with human preferences.

“RLHF is how most commercial chat models are aligned to follow instructions helpfully.”


Retrieval and Context

RAG (Retrieval-Augmented Generation)

RAG is an architecture where a language model’s responses are augmented by retrieving relevant documents from an external knowledge base at query time.

“We use a RAG pipeline so the model always answers based on up-to-date company documentation, not just its training data.”

“The retrieval step returns the top five document chunks, which we inject into the prompt context.”

Context Window

The context window is the maximum amount of text (measured in tokens) a model can process in a single call.

“The 128k context window lets us pass the entire codebase into the prompt for refactoring tasks.”

Chunking

Chunking is the process of splitting large documents into smaller pieces before embedding them for retrieval.

“We chunk documents at 512 tokens with a 50-token overlap to preserve context across boundaries.”


Agents and Orchestration

Agent

In AI, an agent is a system that uses a language model to reason, plan, and take actions (such as calling tools or APIs) to accomplish a goal over multiple steps.

“The agent autonomously calls the search API, reads the results, and synthesises an answer.”

“We built a coding agent that can read files, write patches, and run tests in a sandboxed environment.”

Tool Use / Function Calling

Tool use (also called function calling) is a capability that allows a language model to invoke external functions or APIs as part of generating a response.

“We expose a weather API as a tool, and the model decides when to call it based on the user’s question.”

Orchestration

Orchestration refers to coordinating multiple AI components, agents, or model calls in a defined workflow.

“We use LangGraph for orchestration — it manages the state machine that routes between the planner and executor agents.”


Evaluation and Quality

Hallucination

Hallucination is when a model generates plausible-sounding but factually incorrect information.

“Our evaluation pipeline flags hallucinations by cross-referencing model outputs against the source documents.”

Benchmark

A benchmark is a standardised test used to measure model performance on a specific task or capability.

“The model scores 87% on the MMLU benchmark, which covers multi-domain knowledge.”

Latency vs. Throughput

Latency is the time to complete a single request. Throughput is how many requests the system can handle per unit of time.

“We optimised for latency in the customer-facing product, but we batch background jobs to maximise throughput.”


Putting It Together: Useful Phrases

  • “The inference pipeline processes requests with a median latency of 340ms.”
  • “We’re evaluating whether fine-tuning or RAG gives better accuracy for this use case.”
  • “The agent uses tool calls to look up real-time data before generating the response.”
  • “Chunking strategy has a significant effect on retrieval quality — we tested three approaches.”

Fluency with this vocabulary will help you participate confidently in AI engineering discussions, write clear technical specifications, and communicate progress to stakeholders. The field evolves rapidly, but these core terms form a stable foundation.