Replicate makes it easy to run and deploy machine learning models in the cloud. These exercises cover the core API concepts every developer needs to know: predictions, version pinning, hardware selection, and fine-tuned model deployment.
1 / 5
At standup, a colleague asks what a Prediction is in the Replicate API. What is the correct answer?
A Prediction is Replicate's term for a single model run. You create one by POSTing to /v1/predictions with a version ID and input object. Replicate queues the job, runs it asynchronously on GPU hardware, and either lets you poll the prediction status or calls your webhook URL when the output is ready. The prediction object contains the output, logs, and timing metadata.
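The request described above can be sketched with nothing but the Python standard library. This is a minimal illustration, not Replicate's official client: the helper names build_prediction_request and is_finished are hypothetical, and the token, version ID, and input values are placeholders you would supply yourself. The request is built but not sent, so the snippet runs without network access.

```python
import json
import urllib.request

API_BASE = "https://api.replicate.com/v1"

# Statuses after which a prediction will not change any further.
TERMINAL_STATUSES = {"succeeded", "failed", "canceled"}


def build_prediction_request(token, version, model_input, webhook=None):
    """Build (but do not send) the POST /v1/predictions request."""
    body = {"version": version, "input": model_input}
    if webhook is not None:
        body["webhook"] = webhook  # optional HTTPS URL for status callbacks
    return urllib.request.Request(
        f"{API_BASE}/predictions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def is_finished(prediction):
    """Return True once a polled prediction object has reached a terminal status."""
    return prediction.get("status") in TERMINAL_STATUSES
```

To actually run a prediction you would pass the request to urllib.request.urlopen, read the returned prediction's id, and then GET /v1/predictions/{id} at intervals until is_finished returns True (or rely on the webhook instead).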
2 / 5
During a PR review, a teammate asks why the code references a specific model version hash rather than just the model name. What is the correct explanation?
In Replicate, a model version is an immutable snapshot of a model's weights, code, and Cog configuration, identified by a 64-character SHA256 hash. Pinning to a specific version hash guarantees that every run uses exactly the same code and weights, so results are reproducible to the extent the model itself is deterministic (for example, when you also fix the random seed). Referencing only the model name implicitly uses the latest version, which can change without notice when the model author pushes an update.
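One way to enforce pinning in a codebase is to reject any model reference that lacks a full version hash. The sketch below assumes references are written in the owner/name:version form; parse_pinned_ref is a hypothetical helper for illustration, not part of any Replicate library.

```python
import re

# owner/name:version, where version is a 64-character lowercase hex string.
_PINNED_REF = re.compile(
    r"^(?P<owner>[^/:\s]+)/(?P<name>[^/:\s]+):(?P<version>[0-9a-f]{64})$"
)


def parse_pinned_ref(ref):
    """Split a pinned model reference into (owner, name, version).

    Raises ValueError when the reference has no full version hash,
    which is the case a reviewer would want to catch.
    """
    match = _PINNED_REF.match(ref)
    if match is None:
        raise ValueError(f"model reference is not pinned to a version hash: {ref!r}")
    return match.group("owner"), match.group("name"), match.group("version")
```

A check like this could run in CI or at application start-up, so that an unpinned reference fails fast instead of silently tracking the latest version.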
3 / 5
In a design review, the team discusses hardware options on Replicate. A junior engineer asks when you would choose an A100 over an A40. What is correct?
Replicate lets you specify hardware when deploying a model. The A100 uses high-bandwidth HBM2e memory (up to 80GB, roughly 2 TB/s), making it the better choice for workloads that are memory-bandwidth-bound, such as large LLM inference or training. The A40 has 48GB of GDDR6 at roughly 700 GB/s, which is lower bandwidth but cost-effective for models that fit in VRAM and do not need extreme throughput. Choosing the right hardware affects both cost and latency.
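A back-of-the-envelope calculation makes the trade-off concrete. The sketch below rests on several assumptions: the bandwidth figures are approximate NVIDIA datasheet values (A100 80GB PCIe and A40), weights are stored at 2 bytes per parameter, and single-stream LLM decoding is treated as purely bandwidth-bound, meaning every generated token reads all weights once. It ignores KV cache, activations, and batching, so treat the numbers as rough upper bounds rather than benchmarks.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gpu:
    name: str
    vram_gb: float
    bandwidth_gb_s: float  # approximate datasheet memory bandwidth


# Assumed spec-sheet values; check the current datasheets before relying on them.
A100_80GB = Gpu("A100 80GB", vram_gb=80, bandwidth_gb_s=1935)
A40 = Gpu("A40", vram_gb=48, bandwidth_gb_s=696)


def weights_gb(params_billion, bytes_per_param=2):
    """Size of the model weights in GB (1e9 params * bytes per param / 1e9)."""
    return params_billion * bytes_per_param


def fits(gpu, params_billion, bytes_per_param=2):
    """True if the weights alone fit in VRAM (ignores KV cache and activations)."""
    return weights_gb(params_billion, bytes_per_param) < gpu.vram_gb


def decode_tokens_per_s_upper_bound(gpu, params_billion, bytes_per_param=2):
    """Bandwidth-bound ceiling: one full read of the weights per generated token."""
    return gpu.bandwidth_gb_s / weights_gb(params_billion, bytes_per_param)
```

Under these assumptions a 13B-parameter fp16 model fits on either card, but the A100's ceiling is nearly three times higher; a 30B fp16 model (about 60GB of weights) fits only on the 80GB A100.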
4 / 5
An incident report shows missed prediction results because webhook deliveries failed. A senior engineer asks what the webhook field in a Replicate prediction controls. What is correct?
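The webhook field in a Replicate prediction accepts an HTTPS URL. Replicate sends HTTP POST requests to this URL as the prediction changes state, including intermediate output updates (if the model streams output) and a final request when the prediction reaches a terminal status: succeeded, failed, or canceled. Using webhooks avoids polling and is the recommended pattern for production deployments. Because a delivery can still fail, as in this incident, the receiver should tolerate duplicate deliveries and fall back to fetching the prediction by ID when no final webhook arrives.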
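The receiving side of this flow might look roughly like the following. It is a framework-agnostic sketch: handle_webhook and missing_results are hypothetical helpers, results stands in for whatever datastore you use, and the only payload fields assumed are the prediction's id, status, output, and error.

```python
TERMINAL_STATUSES = {"succeeded", "failed", "canceled"}


def handle_webhook(payload, results):
    """Store the final state of a prediction from a webhook payload.

    Returns True when a new final result was recorded. Intermediate
    updates and duplicate deliveries are ignored, so calling this twice
    with the same payload is harmless.
    """
    prediction_id = payload["id"]
    status = payload["status"]
    if status not in TERMINAL_STATUSES:
        return False  # intermediate update (e.g. "processing")
    if prediction_id in results:
        return False  # duplicate delivery of a final state
    results[prediction_id] = {
        "status": status,
        "output": payload.get("output"),
        "error": payload.get("error"),
    }
    return True


def missing_results(expected_ids, results):
    """Prediction IDs with no recorded final result; candidates for a polling fallback."""
    return [pid for pid in expected_ids if pid not in results]
```

A periodic job could call missing_results for predictions older than some timeout and fetch each one by ID, which covers the case where a webhook delivery never arrives.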
5 / 5
During a code review, a senior engineer asks what a fine-tuning deployment on Replicate produces. What is the correct answer?
Replicate's training API (trainings.create) runs a fine-tuning job and, upon completion, saves the result as a new model version, with its own unique hash, under the destination model you specify. You can then create a Deployment from this version (specifying min/max instances and hardware) and run predictions against it exactly like any other Replicate model. The fine-tuned weights are managed by Replicate, so you do not need to handle checkpoints yourself.
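The three steps (train, deploy, predict) map onto three HTTP endpoints. The sketch below only constructs the URLs and JSON bodies, under the assumption that the endpoint paths and field names match Replicate's HTTP API as documented at the time of writing; the function names are hypothetical, and the hardware value is a placeholder SKU string that must be replaced with one your account actually offers.

```python
API_BASE = "https://api.replicate.com/v1"


def training_request(base_owner, base_name, base_version, destination, training_input):
    """URL and body for starting a fine-tune of a trainable base model version.

    destination is the "owner/name" model that receives the new version.
    """
    url = f"{API_BASE}/models/{base_owner}/{base_name}/versions/{base_version}/trainings"
    return url, {"destination": destination, "input": training_input}


def deployment_request(name, model, version, hardware, min_instances, max_instances):
    """URL and body for creating a deployment from a (fine-tuned) version."""
    if not 0 <= min_instances <= max_instances:
        raise ValueError("expected 0 <= min_instances <= max_instances")
    body = {
        "name": name,
        "model": model,        # "owner/name" of the fine-tuned model
        "version": version,    # hash of the version produced by the training
        "hardware": hardware,  # placeholder SKU string, not validated here
        "min_instances": min_instances,
        "max_instances": max_instances,
    }
    return f"{API_BASE}/deployments", body


def deployment_prediction_url(owner, name):
    """URL for running predictions against an existing deployment."""
    return f"{API_BASE}/deployments/{owner}/{name}/predictions"
```

Each URL and body pair would be sent as an authenticated POST. Predictions against a deployment take only an input object, since the deployment already fixes the version and hardware.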