
AI Infrastructure Architect Interview Questions

5 exercises: for each common AI infrastructure interview question, choose the best-structured answer. The questions cover GPU cluster design, distributed training sharding, checkpointing, KV cache management, and LLM serving architecture.

Structure for AI infrastructure interview answers

Give bandwidth numbers with units: NVLink at 900 GB/s per GPU (H100 generation), NDR InfiniBand at 400 Gb/s per port (about 50 GB/s). Concrete specs show hands-on experience.
Separate intra-node from inter-node: NVLink/NVSwitch within a node, InfiniBand between nodes
Quantify memory: calculate the actual GB for the model size, separating weights, gradients, and optimizer state. Interviewers want to see that you can size a system.
Cover failure modes: atomic rename for checkpoints, a service mesh left in PERMISSIVE mTLS mode. Failure awareness signals seniority.
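
The "give bandwidth numbers" and "quantify memory" points above can be rehearsed as back-of-the-envelope arithmetic. The sketch below is only illustrative. It assumes mixed-precision training with Adam at roughly 16 bytes per parameter (2 for bf16 weights, 2 for bf16 gradients, 12 for fp32 master weights plus two Adam moments) and ignores activations. The function names are hypothetical.

```python
# Rough sizing helpers for interview-style estimates (decimal GB).
# Assumption: 16 bytes/param = 2 (bf16 weights) + 2 (bf16 grads)
# + 12 (fp32 master weights, Adam momentum, Adam variance).
# Activation memory is deliberately left out.

GB = 1e9


def weights_gb(params: float, bytes_per_param: int = 2) -> float:
    """Memory for the weights alone, e.g. for bf16 inference."""
    return params * bytes_per_param / GB


def training_state_gb(params: float, bytes_per_param: int = 16) -> float:
    """Weights + gradients + optimizer state, before any sharding."""
    return params * bytes_per_param / GB


def gbps_to_gigabytes_per_s(gbps: float) -> float:
    """Convert a link speed in gigabits/s to gigabytes/s."""
    return gbps / 8


print(weights_gb(100e9))             # 200.0  (GB of bf16 weights for 100B params)
print(training_state_gb(100e9))      # 1600.0 (GB of training state for 100B params)
print(gbps_to_gigabytes_per_s(400))  # 50.0   (GB/s for a 400 Gb/s NDR port)
```

Stating the units out loud matters here: NVLink figures are usually quoted in gigabytes per second and InfiniBand figures in gigabits per second, so the two numbers are not directly comparable until converted.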


1 / 5

The interviewer asks: "Design the network topology for a GPU cluster training a 100B parameter model — what interconnect technologies do you use and why?"
Which answer best covers GPU cluster networking?

2 / 5

The interviewer asks: "Compare PyTorch FSDP and DeepSpeed ZeRO (Stage 1, 2, 3) for training a 70B parameter model — what do they shard and what are the trade-offs?"
Which answer best covers distributed training sharding?
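
One way to make the sharding trade-off concrete is to estimate per-GPU model-state memory for each ZeRO stage. The sketch below follows the usual accounting: Stage 1 shards optimizer state, Stage 2 also shards gradients, and Stage 3 also shards parameters. FSDP's FULL_SHARD strategy corresponds roughly to Stage 3 and SHARD_GRAD_OP to Stage 2. The 2 + 2 + 12 bytes-per-parameter split is an assumption (bf16 weights and gradients, fp32 Adam state), the function name is hypothetical, and activations and communication buffers are not modelled.

```python
def zero_per_gpu_gb(params: float, n_gpus: int, stage: int,
                    opt_bytes_per_param: int = 12) -> float:
    """Approximate per-GPU model-state memory (decimal GB) for ZeRO stage 0-3."""
    if stage not in (0, 1, 2, 3):
        raise ValueError("stage must be 0, 1, 2, or 3")

    weights = 2 * params                     # bf16 parameters
    grads = 2 * params                       # bf16 gradients
    optimizer = opt_bytes_per_param * params  # fp32 master copy + Adam moments

    if stage >= 1:
        optimizer /= n_gpus  # Stage 1: shard optimizer state
    if stage >= 2:
        grads /= n_gpus      # Stage 2: also shard gradients
    if stage >= 3:
        weights /= n_gpus    # Stage 3 / FSDP FULL_SHARD: also shard parameters

    return (weights + grads + optimizer) / 1e9


if __name__ == "__main__":
    # Illustrative setup: 70B parameters across 64 GPUs.
    for stage in range(4):
        print(f"ZeRO stage {stage}: {zero_per_gpu_gb(70e9, 64, stage):.1f} GB per GPU")
```

The memory savings are not free: Stage 3 and FULL_SHARD have to all-gather parameters before each forward and backward pass, so the extra communication volume is the other half of the trade-off to mention.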

3 / 5

The interviewer asks: "What is your checkpointing strategy for training a 100B+ parameter model on a 1000-GPU cluster — what can go wrong and how do you mitigate it?"
Which answer best covers checkpoint engineering?
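
The atomic-rename pattern mentioned in the structure tips can be sketched in a few lines. This is a minimal single-file illustration on a local POSIX filesystem, not a full distributed checkpointing scheme. The helper name `atomic_write` is hypothetical, and rename semantics on parallel or object-backed storage may differ, so treat that as something to verify for the storage actually in use.

```python
import os
import tempfile


def atomic_write(path: str, payload: bytes) -> None:
    """Write payload so readers see either the old file or the complete new one."""
    directory = os.path.dirname(os.path.abspath(path))

    # The temp file must live in the same directory (same filesystem),
    # otherwise the final rename is not atomic.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(payload)
            f.flush()
            os.fsync(f.fileno())  # force the data to disk before renaming
        os.replace(tmp_path, path)  # atomic swap on POSIX filesystems
    except BaseException:
        # A crash or error mid-write leaves the previous checkpoint untouched.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise

    # Persist the directory entry as well, where the platform supports it.
    if hasattr(os, "O_DIRECTORY"):
        dir_fd = os.open(directory, os.O_RDONLY | os.O_DIRECTORY)
        try:
            os.fsync(dir_fd)
        finally:
            os.close(dir_fd)
```

In a multi-rank job the same idea usually applies at the directory level: every rank writes its shard into a staging directory, and only after all ranks finish is a completion marker or final rename published, so a job that dies mid-save never leaves a checkpoint that looks valid but is partial.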

4 / 5

The interviewer asks: "Explain KV cache management in LLM inference — what is the KV cache, why does it grow, and what eviction strategies exist?"
Which answer best covers LLM inference cache engineering?
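
Why the KV cache grows is easiest to show with the sizing formula: every generated or prompt token stores one key and one value vector per layer per KV head, so memory scales linearly with sequence length and batch size. The sketch below uses an illustrative configuration (80 layers, 8 KV heads with grouped-query attention, head dimension 128, fp16). These numbers are assumptions chosen for the example, not taken from any particular model card, and the function names are hypothetical.

```python
def kv_bytes_per_token(n_layers: int, n_kv_heads: int, head_dim: int,
                       bytes_per_elem: int = 2) -> int:
    """Bytes of KV cache added by one token: K and V for every layer and KV head."""
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem


def kv_cache_gb(n_layers: int, n_kv_heads: int, head_dim: int,
                seq_len: int, batch_size: int, bytes_per_elem: int = 2) -> float:
    """Total KV cache in decimal GB for a batch of equal-length sequences."""
    per_token = kv_bytes_per_token(n_layers, n_kv_heads, head_dim, bytes_per_elem)
    return per_token * seq_len * batch_size / 1e9


if __name__ == "__main__":
    # Illustrative config: 80 layers, 8 KV heads (GQA), head_dim 128, fp16.
    per_token = kv_bytes_per_token(80, 8, 128)
    print(f"{per_token} bytes per token")
    print(f"{kv_cache_gb(80, 8, 128, seq_len=4096, batch_size=32):.1f} GB "
          "for 32 sequences of 4096 tokens")
```

Because the total depends on the number of KV heads rather than query heads, grouped-query attention shrinks the cache directly. That same per-token figure is what paged allocation and eviction policies are managing block by block.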

5 / 5

The interviewer asks: "Explain disaggregated prefill and decode in LLM serving — why are they separated, and how does this affect your infrastructure design?"
Which answer best covers serving infrastructure architecture?