Master Gemini API patterns: model selection, multimodal content, function calling, and search grounding.
1 / 5
In a design review, the team debates the Gemini Flash vs Pro trade-off. What is the correct characterisation?
Gemini Flash is designed for low-latency, high-throughput, cost-efficient tasks where speed matters more than peak quality. Gemini Pro delivers higher reasoning quality and is suited for complex, multi-step tasks. Both support multimodal inputs; the choice is a cost/quality/latency trade-off, not a capability split.
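To make the trade-off concrete, here is a minimal routing sketch. Everything in it is an assumption for illustration: the model identifiers, the `Task` fields, and the latency threshold are hypothetical and not part of any Gemini SDK.

```python
from dataclasses import dataclass

# Hypothetical model identifiers; substitute the ones your project actually uses.
FLASH_MODEL = "gemini-flash"
PRO_MODEL = "gemini-pro"


@dataclass
class Task:
    needs_multistep_reasoning: bool  # complex, multi-step work favours Pro
    latency_budget_ms: int           # how long the caller is willing to wait


def choose_model(task: Task, tight_latency_ms: int = 1000) -> str:
    """Pick a tier on cost/quality/latency grounds; both tiers accept multimodal input."""
    if task.needs_multistep_reasoning and task.latency_budget_ms > tight_latency_ms:
        return PRO_MODEL
    # Default to the cheaper, faster tier when speed matters or the task is simple.
    return FLASH_MODEL
```

The point of the sketch is that the decision depends on the task's reasoning needs and latency budget, not on whether the input contains images or audio.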
2 / 5
During a PR review, a teammate asks about the structure of multimodal content parts in Gemini. What is correct?
In the Gemini API, a Part in a Content object can be text, inlineData (base64-encoded bytes with a mimeType), or fileData (referencing a URI from the Files API). Multiple parts are assembled into a Content with a role of user or model, enabling rich multimodal conversations.
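The part structure described above can be sketched as plain dictionaries in the camelCase shape of the REST JSON body. This is an illustrative sketch only: the helper names are hypothetical, the image bytes are a placeholder, and no request is sent.

```python
import base64


def text_part(text: str) -> dict:
    return {"text": text}


def inline_data_part(data: bytes, mime_type: str) -> dict:
    # Raw bytes are base64-encoded and paired with their MIME type.
    encoded = base64.b64encode(data).decode("ascii")
    return {"inlineData": {"mimeType": mime_type, "data": encoded}}


def file_data_part(file_uri: str, mime_type: str) -> dict:
    # References a file uploaded earlier (for example through the Files API).
    return {"fileData": {"mimeType": mime_type, "fileUri": file_uri}}


def content(role: str, parts: list) -> dict:
    if role not in ("user", "model"):
        raise ValueError(f"unexpected role: {role!r}")
    return {"role": role, "parts": parts}


fake_png = b"\x89PNG\r\n\x1a\n"  # placeholder bytes, not a real image
user_turn = content(
    "user",
    [text_part("Describe this image."), inline_data_part(fake_png, "image/png")],
)
request_body = {"contents": [user_turn]}
```

A multi-turn conversation would append further `content(...)` entries to `contents`, alternating the `user` and `model` roles.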
3 / 5
At standup, a developer explains function calling in Gemini. Which description is accurate?
Function calling in Gemini follows a three-step pattern: (1) declare Tool objects with FunctionDeclaration schemas; (2) the model returns a FunctionCall part when it wants to invoke a function; (3) your application executes the function and returns a FunctionResponse part with the result. The model then generates a final answer incorporating the response.
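The three steps can be walked through offline with REST-shaped dictionaries. In this sketch the `get_weather` function, its schema, and the simulated model output are all hypothetical; a real application would receive the `functionCall` part from the API instead of constructing it by hand.

```python
# Step 1: declare the tool. get_weather and its schema are hypothetical.
tools = [{
    "functionDeclarations": [{
        "name": "get_weather",
        "description": "Return the current temperature for a city.",
        "parameters": {
            "type": "OBJECT",
            "properties": {"city": {"type": "STRING"}},
            "required": ["city"],
        },
    }]
}]


def get_weather(city: str) -> dict:
    # Stub implementation; a real one would query a weather service.
    return {"city": city, "temperature_c": 21}


# Maps declared function names to the local callables that implement them.
REGISTRY = {"get_weather": get_weather}


def handle_function_call(part: dict) -> dict:
    """Execute the requested function and wrap its result as a functionResponse part."""
    call = part["functionCall"]
    result = REGISTRY[call["name"]](**call.get("args", {}))
    return {"functionResponse": {"name": call["name"], "response": result}}


# Step 2 (simulated): the model asks for a function instead of answering directly.
model_part = {"functionCall": {"name": "get_weather", "args": {"city": "Oslo"}}}

# Step 3: run the function locally and build the part to send back to the model.
response_part = handle_function_call(model_part)
```

Note that the model never executes anything itself: it only names a function and its arguments, and the application decides whether and how to run it.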
4 / 5
An incident involves grounding with Google Search returning information that seems outdated. Which statement correctly explains how grounding works?
Grounding with Google Search augments the model's knowledge by retrieving live web results and injecting them as context. The model then synthesises an answer from that context; it does not simply return a search result. To see which sources were used, inspect grounding_metadata.search_entry_point and grounding_metadata.grounding_chunks. Results reflect live Search, not a 24-hour cache.
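When an answer looks outdated, listing the sources the model drew on is a reasonable first check. The sketch below reads a hand-written sample shaped like the REST JSON response (camelCase `groundingMetadata` and `groundingChunks`, which correspond to the snake_case names above); the helper name and the sample data are hypothetical.

```python
def cited_sources(candidate: dict) -> list[tuple[str, str]]:
    """Return (title, uri) pairs for every web chunk in a candidate's grounding metadata."""
    metadata = candidate.get("groundingMetadata", {})
    sources = []
    for chunk in metadata.get("groundingChunks", []):
        web = chunk.get("web")
        if web:
            sources.append((web.get("title", ""), web.get("uri", "")))
    return sources


# Hand-written sample for illustration; not real API output.
sample_candidate = {
    "groundingMetadata": {
        "searchEntryPoint": {"renderedContent": "<div>search suggestions</div>"},
        "groundingChunks": [
            {"web": {"uri": "https://example.com/a", "title": "Example A"}},
            {"web": {"uri": "https://example.com/b", "title": "Example B"}},
        ],
    }
}

sources = cited_sources(sample_candidate)
```

A response with no grounding metadata yields an empty list, which may indicate that the model answered from its own knowledge rather than from Search.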
5 / 5
In a code review, a teammate suggests calling countTokens before sending a large request. What does this API actually do?
countTokens sends the full request structure (contents, system instruction, and tool declarations) to the API and returns total_tokens. This reflects what will be consumed when you call generateContent, so you can gate requests that would exceed the model's context limit before incurring latency or errors.
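The gating step might look like the sketch below. It assumes the count has already been fetched and uses a simulated response in the REST JSON shape (`totalTokens`); the limit, the helper name, and the output reservation are hypothetical values chosen for illustration.

```python
def fits_context(total_tokens: int, context_limit: int, reserved_for_output: int = 0) -> bool:
    """True if the request, plus any headroom kept for the reply, fits within the limit."""
    return total_tokens + reserved_for_output <= context_limit


# Hypothetical limit; check the documentation for the model you actually call.
HYPOTHETICAL_LIMIT = 1_000_000

# Simulated countTokens result, not real API output.
count_response = {"totalTokens": 1_200_000}

should_send = fits_context(count_response["totalTokens"], HYPOTHETICAL_LIMIT)
if not should_send:
    # Trim history, summarise, or split the request instead of calling generateContent.
    pass
```

Reserving some headroom for the reply is a design choice of this sketch, not something countTokens does for you.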