Vertex AI Model Garden is Google Cloud's catalog of foundation models, open models, and fine-tunable models. Master the vocabulary for endpoint types, open vs proprietary models, API patterns, and deployment configuration including accelerator selection and scale-to-zero.
1 / 5
An engineer deploys a Llama 3 70B model from Vertex AI Model Garden to a dedicated endpoint. Which resource type does Vertex AI provision?
Vertex AI Model Garden deployments use Vertex AI Prediction endpoints: managed serving infrastructure backed by GPU or TPU instances. You specify the machine type (e.g., g2-standard-48) and accelerator count; Vertex AI handles orchestration, scaling, and health checks.
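As a rough sketch of the configuration described above, the snippet below collects the deployment settings into keyword arguments. It assumes the google-cloud-aiplatform Python SDK, whose Model.deploy method accepts machine_type, accelerator_type, accelerator_count, min_replica_count, and max_replica_count. The helper name build_deploy_kwargs and the pairing of g2-standard-48 with four NVIDIA_L4 accelerators are illustrative assumptions, not values taken from the quiz.

```python
def build_deploy_kwargs(machine_type, accelerator_type, accelerator_count,
                        min_replica_count=1, max_replica_count=1):
    """Collect settings for a dedicated endpoint (hypothetical helper)."""
    if accelerator_count < 1:
        raise ValueError("accelerator_count must be at least 1")
    if min_replica_count < 0:
        raise ValueError("min_replica_count cannot be negative")
    if max_replica_count < max(min_replica_count, 1):
        raise ValueError("max_replica_count must be >= min_replica_count and >= 1")
    return {
        "machine_type": machine_type,
        "accelerator_type": accelerator_type,
        "accelerator_count": accelerator_count,
        "min_replica_count": min_replica_count,
        "max_replica_count": max_replica_count,
    }


# Assumed example values: a g2-standard-48 machine with four L4 GPUs.
deploy_kwargs = build_deploy_kwargs("g2-standard-48", "NVIDIA_L4", 4)
print(deploy_kwargs)

# With the SDK installed and a model already uploaded, these settings
# could be passed along as: model.deploy(**deploy_kwargs)
```

Keeping the settings in a plain dictionary makes them easy to validate and log before any billable resources are provisioned.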
2 / 5
What distinguishes a 'tuned model' deployment in Model Garden from a 'foundation model' one-click deployment?
When you deploy a tuned model, the endpoint references a fine-tuned artifact in the Vertex AI Model Registry (created via supervised fine-tuning or RLHF pipelines). Foundation model deployments use the original weights from Model Garden directly, with no intermediate Model Registry artifact.
3 / 5
A team uses Vertex AI Model Garden to access Gemini 1.5 Pro. Which API endpoint format should their client use?
Vertex AI uses a regional aiplatform endpoint with the host pattern {REGION}-aiplatform.googleapis.com. The resource path names the project, location, publisher (google), and model, followed by the action (generateContent): projects/{PROJECT}/locations/{REGION}/publishers/google/models/{MODEL}:generateContent. This differs from the direct Gemini API, which uses generativelanguage.googleapis.com.
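The endpoint pattern above can be assembled with a few lines of string formatting. This is a minimal sketch: the function name generate_content_url, the project ID my-project, the region us-central1, the model ID gemini-1.5-pro, and the v1 API version are all assumed for illustration.

```python
def generate_content_url(project, region, model, api_version="v1"):
    """Build the regional Vertex AI generateContent URL for a Google-published model."""
    host = f"{region}-aiplatform.googleapis.com"
    path = (
        f"projects/{project}/locations/{region}"
        f"/publishers/google/models/{model}:generateContent"
    )
    return f"https://{host}/{api_version}/{path}"


# Placeholder project, region, and model ID.
url = generate_content_url("my-project", "us-central1", "gemini-1.5-pro")
print(url)
```

Note that the region appears twice, once in the host name and once in the resource path, and the two normally need to match.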
4 / 5
Model Garden shows a model card with the label 'Open model'. What does this specifically mean in Vertex AI terminology?
Open models in Model Garden have weights that can be downloaded and are typically licensed under permissive or community licenses (e.g., Apache 2.0 or Meta's Llama license). This contrasts with proprietary models like Gemini, whose weights are not distributable. Open models can be fine-tuned and self-hosted.
5 / 5
An engineer sets min-replica-count=0 on a Model Garden endpoint to save cost. What is the trade-off?
Setting min-replica-count=0 enables scale-to-zero, eliminating idle costs. However, when a request arrives after the endpoint has scaled down, Vertex AI must provision GPU instances and load model weights. This cold start can take several minutes for large models, which makes scale-to-zero a poor fit for latency-sensitive applications.
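The cost side of this trade-off can be illustrated with simple arithmetic. The sketch below is a deliberately simplified model: the hourly rate of 10.0 and the 100 busy hours per month are hypothetical numbers, not real Vertex AI prices, and the model ignores time billed during cold starts and the idle window before scale-down.

```python
HOURS_PER_MONTH = 730  # approximate average number of hours in a month


def monthly_cost(hourly_rate, busy_hours, min_replicas):
    """Estimate monthly cost for a single-replica endpoint.

    With min_replicas >= 1 the replica is billed for every hour of the month.
    With min_replicas == 0 only the hours spent serving traffic are billed.
    """
    billed_hours = HOURS_PER_MONTH if min_replicas >= 1 else busy_hours
    return hourly_rate * billed_hours


# Hypothetical figures: 10.0 per hour, 100 hours of real traffic per month.
always_on = monthly_cost(10.0, 100, min_replicas=1)
scale_to_zero = monthly_cost(10.0, 100, min_replicas=0)
print(always_on, scale_to_zero)
```

Under these assumed numbers the always-on endpoint is billed for 730 hours and the scale-to-zero endpoint for 100, so the savings grow as traffic becomes sparser. The price of those savings is the cold start on the first request after each idle period.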