Principal · 6 topic areas · 30+ exercises

AI Infrastructure Architect

AI Infrastructure Architects design the complete infrastructure stack that trains, fine-tunes, and serves large AI models in production. They specify GPU cluster network topology (InfiniBand vs RoCE), configure distributed training frameworks such as FSDP and DeepSpeed, design checkpoint strategies for multi-week training runs, manage KV cache allocation for inference serving, and optimise inference throughput using techniques such as continuous batching and speculative decoding. English is the working language for architecture proposals, compute budget justifications, and collaboration with research teams distributed across global locations.


Topics covered

GPU Cluster Design
Distributed Training Frameworks
Checkpoint Strategy
KV Cache Management
Inference Infrastructure
Training Cost Optimisation

Vocabulary spotlight

4 terms every AI Infrastructure Architect should know in English:

KV cache n.

The cache of key and value tensors stored for each token during transformer inference; it avoids recomputing the key and value projections of previously processed tokens at every decoding step, enabling efficient autoregressive generation

"Implementing paged KV cache management increased the maximum concurrent request capacity by 3× by eliminating memory fragmentation from variable-length sequence allocations."

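A rough sizing sketch can make the KV cache entry above more concrete. It assumes standard multi-head attention with 2-byte (fp16 or bf16) values; the model shape, request lengths, and block size are hypothetical and chosen only for illustration, not taken from any particular serving system.

```python
import math

def kv_cache_bytes_per_token(num_layers, num_kv_heads, head_dim, bytes_per_value=2):
    """Bytes of KV cache that one token occupies across all layers."""
    # Factor 2: each layer stores one key vector and one value vector per KV head.
    return 2 * num_layers * num_kv_heads * head_dim * bytes_per_value

def contiguous_tokens_reserved(seq_lens, max_len):
    """Token slots reserved when every request pre-allocates max_len slots."""
    return len(seq_lens) * max_len

def paged_tokens_reserved(seq_lens, block_size=16):
    """Token slots reserved when memory is handed out in fixed-size blocks."""
    return sum(math.ceil(n / block_size) * block_size for n in seq_lens)

# Hypothetical model shape and request lengths (assumptions for illustration).
per_token = kv_cache_bytes_per_token(num_layers=32, num_kv_heads=32, head_dim=128)
seq_lens = [100, 700, 50, 2000]
contiguous = contiguous_tokens_reserved(seq_lens, max_len=2048)
paged = paged_tokens_reserved(seq_lens)

print(f"KV cache per token: {per_token / 1024:.0f} KiB")
print(f"Contiguous: {contiguous} token slots, paged: {paged} token slots")
```

With these made-up numbers, block-based allocation reserves 2,880 token slots where per-request pre-allocation reserves 8,192, which is the kind of saving the example sentence describes.
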
continuous batching n.

An inference serving technique that dynamically inserts new requests into a running batch as earlier requests complete, maximising GPU utilisation compared to static batching

"Continuous batching increased GPU utilisation from 45% to 87% on the inference cluster, reducing p95 request latency from 4.2 seconds to 1.8 seconds."

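The difference between static and continuous batching can be sketched with a toy scheduler. This is a simplified model, not how any specific serving engine works: it assumes every running request produces one token per step, ignores prefill cost, and treats all requests as already queued. The request lengths and batch size are hypothetical.

```python
from collections import deque

def static_batching_steps(lengths, batch_size):
    """Each batch runs until its longest request finishes."""
    steps = 0
    for i in range(0, len(lengths), batch_size):
        steps += max(lengths[i:i + batch_size])
    return steps

def continuous_batching_steps(lengths, batch_size):
    """A queued request takes over a slot as soon as one frees up."""
    queue = deque(lengths)
    running = []  # remaining decode steps for each in-flight request
    steps = 0
    while queue or running:
        # Fill free slots before the next decode step.
        while queue and len(running) < batch_size:
            running.append(queue.popleft())
        steps += 1
        # Every running request emits one token; drop the finished ones.
        running = [r - 1 for r in running if r - 1 > 0]
    return steps

def utilisation(lengths, steps, batch_size):
    """Share of slot-steps that did useful work."""
    return sum(lengths) / (steps * batch_size)

# Hypothetical decode lengths (tokens to generate per request).
lengths = [8, 2, 2, 2, 8, 2, 2, 2]
static_steps = static_batching_steps(lengths, batch_size=4)
continuous_steps = continuous_batching_steps(lengths, batch_size=4)

print(static_steps, utilisation(lengths, static_steps, 4))
print(continuous_steps, utilisation(lengths, continuous_steps, 4))
```

In this toy case the short requests no longer wait for the long one in their batch, so the same work finishes in 10 steps instead of 16.
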
pipeline parallelism n.

A distributed training strategy that partitions model layers into stages across multiple GPUs and processes micro-batches in a pipelined fashion, so that each GPU holds only part of the model and inter-GPU communication is limited to activations passed between adjacent stages

"Combining pipeline parallelism across 8 nodes with tensor parallelism within each node allowed the 175B-parameter model to fit in the available GPU memory with acceptable bubble overhead."

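The "bubble overhead" in the example sentence can be estimated with a commonly used approximation. The sketch below assumes a simple non-interleaved pipeline schedule in which every stage takes the same time per micro-batch; the stage and micro-batch counts are hypothetical.

```python
def bubble_fraction(num_stages, num_microbatches):
    """Approximate share of time pipeline stages sit idle.

    Assumes a non-interleaved schedule with equal-cost stages:
    idle fraction = (p - 1) / (m + p - 1).
    """
    return (num_stages - 1) / (num_microbatches + num_stages - 1)

# Hypothetical setup: 8 pipeline stages, varying micro-batch counts.
for m in (8, 32, 64):
    print(f"{m:>3} micro-batches: {bubble_fraction(8, m):.1%} idle")
```

Under these assumptions, raising the number of micro-batches per step is the main lever for shrinking the bubble, at the cost of more activation memory in flight.
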
compute budget n.

The total allocation of GPU hours or floating-point operations assigned to a training or fine-tuning run, used to apply scaling laws and make decisions about model size and data volume

"Applying Chinchilla scaling laws to the available compute budget suggested training a 13B-parameter model on 260B tokens rather than a larger model on fewer tokens."

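The arithmetic behind the example sentence can be sketched as follows. It relies on two widely quoted rules of thumb, both assumptions here rather than exact laws: training compute is roughly 6 × parameters × tokens, and the Chinchilla results suggest about 20 training tokens per parameter.

```python
import math

def chinchilla_optimal(compute_flops, tokens_per_param=20.0):
    """Split a FLOP budget into a model size and a token count.

    Assumes C ~= 6 * N * D and D ~= tokens_per_param * N, so
    N = sqrt(C / (6 * tokens_per_param)).
    """
    params = math.sqrt(compute_flops / (6.0 * tokens_per_param))
    tokens = tokens_per_param * params
    return params, tokens

# Budget implied by the example: a 13B-parameter model on 260B tokens.
budget = 6 * 13e9 * 260e9
params, tokens = chinchilla_optimal(budget)

print(f"Budget: {budget:.3e} FLOPs")
print(f"Suggested size: {params / 1e9:.1f}B parameters on {tokens / 1e9:.0f}B tokens")
```

The 260B-token figure in the example is consistent with the 20-tokens-per-parameter heuristic applied to a 13B-parameter model.
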

📚 Vocabulary Reference

Key terms organised by category for AI Infrastructure Architects:

Training Infrastructure

GPU cluster, InfiniBand, RoCE, NVLink, FSDP, DeepSpeed, ZeRO, tensor parallelism, pipeline parallelism, gradient checkpointing

Inference Infrastructure

KV cache, continuous batching, paged attention, speculative decoding, quantisation, vLLM, TensorRT-LLM, throughput, latency SLA, request batching

Economics

compute budget, GPU utilisation, spot instance, preemption, scaling laws, Chinchilla, FLOPs, mixed precision, bf16, training run cost


Recommended exercises

Machine Learning Infrastructure Vocabulary (Vocabulary, 25 exercises)

ML Engineering Interview Questions (Interview, 5 exercises)

Real-world scenarios you'll practise

Writing a GPU cluster architecture proposal in English for a foundation model training run, justifying InfiniBand topology and storage tiering decisions to a technical leadership committee
Presenting a training cost optimisation strategy to a VP of Engineering, quantifying savings from mixed-precision training, spot instance usage, and gradient checkpointing
Collaborating with an ML research team to design a checkpoint strategy that recovers a 512-GPU training run within 30 minutes of a hardware failure
Documenting KV cache configuration guidelines in English so model serving teams can tune inference infrastructure without requiring principal-level support

Frequently Asked Questions

What English skills do AI Infrastructure Architects most need to improve?

AI Infrastructure Architects most commonly need to improve: technical vocabulary (the correct English terms for domain concepts), collocation accuracy (using the right verb for each action), written communication (bug reports, PR descriptions, technical docs), and spoken communication for standups, code reviews, and stakeholder meetings.

How long does the AI Infrastructure Architect learning path take?

The AI Infrastructure Architect learning path contains 20–40 hours of material if studied comprehensively. Most learners focus on the highest-priority modules first and return to the rest over time. Spending 30 minutes per day for 4–6 weeks produces noticeable improvement in workplace English.

What vocabulary should an AI Infrastructure Architect prioritise first?

Start with the vocabulary that appears most in your daily work — terms you read in documentation, use in commit messages, and hear in meetings. The AI Infrastructure Architect path begins with the most frequent vocabulary clusters before moving to advanced communication patterns.

Are there interview exercises for AI Infrastructure Architect roles?

Yes. The AI Infrastructure Architect path includes role-specific interview question modules with model answers and key phrases — the actual questions interviewers ask and the vocabulary needed to answer them fluently. There is also a dedicated Interview Practice hub for general interview skills.

Does this path include pronunciation help?

Yes. The path links to pronunciation exercises for the technical terms most commonly mispronounced in this domain. The Pronunciation hub includes drills for acronyms, silent letters, word stress, and minimal pairs — all in IT context.

What are the most common English mistakes AI Infrastructure Architects make?

The most common mistakes are incorrect collocations (using the wrong verb with a technical noun), false friends from the speaker's first language (L1), tense errors when narrating past incidents or walkthroughs, and an overly formal or overly casual register in written communication.

How do I improve my English for code reviews?

Learn the standard code review collocations: approve a PR, request changes, leave a nit, address feedback, block a merge, resolve a conversation. Use hedging language for suggestions: "This might be cleaner as…", "Have you considered…?". The Collocations section includes a dedicated Code Review set.

Can I use this path alongside my daily work?

Yes — the path is designed for working professionals. Each exercise set takes 10–15 minutes. The most effective approach is to study a vocabulary module before a meeting or task where you'll use that vocabulary, then practise immediately after. Context-linked practice produces much faster retention.

Is the content free?

Yes, completely free. No registration required, no payment, no time limit. All vocabulary modules, exercises, glossary entries, and learning path guides are open access.

How do I track my progress through this path?

Progress is tracked in your browser's local storage — completed exercise sets are marked with a checkmark when you return. No account is needed. You can bookmark specific modules and use the exercises overview to see which sets you've completed.