Modal is a serverless cloud platform built for GPU workloads and ML inference. These exercises test your understanding of Modal's core abstractions: functions, volumes, hardware selection, concurrency, and deployment.
1 / 5
At standup, a colleague asks what the @app.function decorator does in Modal. What is the correct answer?
@app.function is Modal's primary decorator for turning a Python function into one that can run remotely. When the function is invoked remotely (for example with .remote()), Modal serialises the arguments, schedules the call on the requested cloud hardware (CPU or GPU), runs it inside the configured container image, and returns the result to the caller. GPU type, memory, timeout, concurrency settings, and secrets are all specified directly in the decorator, which makes it the core building block of a Modal application.
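The serialise, run, return cycle described above can be imitated locally. The sketch below is a toy stand-in written only with the standard library; it is not Modal's implementation, and toy_remote, square, and append_one are hypothetical names chosen for illustration. It shows one practical consequence of the cycle: the function works on a copy of its arguments, so a mutation made on the "remote" side never reaches the caller's object.

```python
import pickle
from typing import Any, Callable


def toy_remote(fn: Callable[..., Any]) -> Callable[..., Any]:
    """Toy model of a remote call (an assumption for illustration, not Modal's code)."""

    def call(*args: Any, **kwargs: Any) -> Any:
        # 1. The caller serialises the arguments into bytes.
        payload = pickle.dumps((args, kwargs))
        # 2. The "worker" rebuilds its own copies from those bytes.
        worker_args, worker_kwargs = pickle.loads(payload)
        # 3. The function body runs against the copies.
        result = fn(*worker_args, **worker_kwargs)
        # 4. The result is serialised and sent back to the caller.
        return pickle.loads(pickle.dumps(result))

    return call


@toy_remote
def square(x: int) -> int:
    return x * x


@toy_remote
def append_one(items: list) -> list:
    items.append(1)  # mutates the worker's copy only
    return items


local_items = [0]
returned_items = append_one(local_items)

print(square(4))        # 16
print(local_items)      # [0]    (the caller's list is untouched)
print(returned_items)   # [0, 1] (changes come back only via the return value)
```

The same reasoning explains why arguments and return values of a remote function have to be serialisable, and why results should be returned explicitly rather than written into an argument.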
2 / 5
During a PR review, a teammate asks what Modal Volumes are used for. Which answer is correct?
Modal Volumes are persistent, network-attached filesystems that you mount into function containers using the volumes parameter. Data written to a volume persists across invocations, unlike the ephemeral container filesystem. A volume can be mounted by many concurrent function instances at once, although a write made in one container generally becomes visible to the others only after it has been committed and the volume reloaded. This makes volumes useful for caching model weights, storing datasets, or maintaining state across runs.
3 / 5
In a design review, the team debates GPU selection. A senior engineer asks when to choose H100 over A100 on Modal. What is correct?
The H100 (Hopper architecture) delivers roughly 2–3× the FP16/BF16 throughput of the A100, depending on the workload, along with higher memory bandwidth (HBM3). That makes it a strong choice for latency-critical LLM inference or large training runs. In Modal you specify the GPU type with gpu="H100"; the object form gpu=modal.gpu.H100() also appears in older code but is deprecated in recent SDK versions in favour of the string. The tradeoff is a higher cost per hour, so A100s or A10Gs remain cost-effective for workloads that are not throughput-bound.
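The cost tradeoff above comes down to simple arithmetic: the faster GPU is cheaper per job only when its speedup on your workload exceeds its price ratio. The sketch below works through that comparison. The hourly prices are hypothetical placeholders, not Modal's actual rates, and job_cost and break_even_speedup are illustrative helper names.

```python
# Hypothetical hourly prices in USD. These are placeholders for illustration,
# not Modal's published rates; substitute current pricing before relying on them.
A100_PRICE_PER_HOUR = 2.50
H100_PRICE_PER_HOUR = 4.00


def job_cost(price_per_hour: float, baseline_hours: float, speedup: float = 1.0) -> float:
    """Cost of a job that takes baseline_hours on the reference GPU."""
    return price_per_hour * baseline_hours / speedup


def break_even_speedup(fast_price: float, slow_price: float) -> float:
    """Speedup above which the more expensive GPU is cheaper per job."""
    return fast_price / slow_price


baseline_hours = 10.0  # assumed runtime of the job on an A100

a100_cost = job_cost(A100_PRICE_PER_HOUR, baseline_hours)
h100_cost_fast = job_cost(H100_PRICE_PER_HOUR, baseline_hours, speedup=2.0)
h100_cost_slow = job_cost(H100_PRICE_PER_HOUR, baseline_hours, speedup=1.2)

print(f"A100:                 ${a100_cost:.2f}")
print(f"H100 at 2.0x speedup: ${h100_cost_fast:.2f}")
print(f"H100 at 1.2x speedup: ${h100_cost_slow:.2f}")
print(f"Break-even speedup:   {break_even_speedup(H100_PRICE_PER_HOUR, A100_PRICE_PER_HOUR):.2f}x")
```

With these placeholder prices, a throughput-bound job that really does run twice as fast is cheaper on the H100, while a job that gains only 1.2× costs more there. Measuring the actual speedup on your own workload is the step this arithmetic cannot replace.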
4 / 5
An incident report shows a Modal web endpoint returning 529 errors under load. A senior engineer asks what concurrency_limit and allow_concurrent_inputs control. What is the correct answer?
concurrency_limit caps the total number of container instances Modal will provision for a function, which matters for cost control. allow_concurrent_inputs lets a single container process multiple requests in parallel (useful for async Python code or batched inference), so the same load needs fewer containers and triggers fewer cold starts. A 529 error means Modal hit the concurrency cap; increasing concurrency_limit or enabling allow_concurrent_inputs can relieve the pressure. Recent SDK releases rename these settings to max_containers and the @modal.concurrent decorator, respectively.
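How the two settings interact can be estimated with a back-of-the-envelope capacity model. The sketch below is an approximation, not Modal's scheduler: it assumes each concurrent slot serves one request at a time and that per-request latency does not degrade as slots fill up. The function names max_in_flight and containers_needed are illustrative, and the traffic numbers are made up.

```python
import math


def max_in_flight(concurrency_limit: int, allow_concurrent_inputs: int = 1) -> int:
    """Upper bound on requests being processed at once (idealised model)."""
    return concurrency_limit * allow_concurrent_inputs


def containers_needed(
    requests_per_second: float,
    seconds_per_request: float,
    allow_concurrent_inputs: int = 1,
) -> int:
    """Containers required to keep up, using Little's law (in-flight = rate * latency)."""
    in_flight = requests_per_second * seconds_per_request
    return math.ceil(in_flight / allow_concurrent_inputs)


# Made-up load: 40 requests/s, each taking 0.5 s, so 20 requests in flight on average.
rate, latency = 40.0, 0.5

# One input per container with a cap of 10 containers: demand exceeds capacity.
print(containers_needed(rate, latency))                             # 20
print(max_in_flight(concurrency_limit=10))                          # 10

# Four inputs per container: the same cap of 10 now leaves headroom.
print(containers_needed(rate, latency, allow_concurrent_inputs=4))  # 5
print(max_in_flight(10, allow_concurrent_inputs=4))                 # 40
```

In the first configuration the function needs roughly 20 containers but may use only 10, which is the kind of shortfall that shows up as errors under load. Raising either setting closes the gap; which one is appropriate depends on whether a single container can genuinely handle several inputs at once.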
5 / 5
During a code review, a senior engineer asks what modal deploy does differently from modal run. What is accurate?
modal deploy creates a persistent deployment: web endpoints stay live, scheduled functions keep running, and the app is accessible after your terminal closes. modal run is for ephemeral execution — it runs the specified function or entrypoint, streams logs to your terminal, and tears everything down on exit. Use modal deploy for production services and modal run for development and one-off jobs.