The OpenAI Realtime API enables low-latency voice and text conversations using persistent WebSocket connections. Understanding sessions, VAD, audio delta events, and the function calling loop is essential for building real-time voice AI applications.
1 / 5
What transport protocol does the OpenAI Realtime API use for bidirectional audio and text communication?
The OpenAI Realtime API uses WebSockets for persistent, low-latency bidirectional communication. Every message in either direction is a JSON event with a type field. The client sends audio chunks and text messages; the server streams back audio deltas, transcripts, and function call events. A persistent connection matters here because the API must handle continuous audio streams and respond in real time, without the overhead of a separate HTTP request and response for every exchange.
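As a rough sketch of what the client needs before opening the socket, the helper below builds the connection URL and headers and serializes an event as a JSON text frame. The endpoint, the OpenAI-Beta header, and the model name reflect the beta documentation as best understood and should be checked against the current reference; build_connection and encode_event are hypothetical helper names, and no network call is made.

```python
import json

# Assumed beta endpoint; verify against the current API reference.
REALTIME_URL = "wss://api.openai.com/v1/realtime"


def build_connection(model: str, api_key: str) -> tuple[str, dict[str, str]]:
    """Return the WebSocket URL and headers for a Realtime connection (hypothetical helper)."""
    url = f"{REALTIME_URL}?model={model}"
    headers = {
        "Authorization": f"Bearer {api_key}",
        # Assumption: required during the beta period; may not be needed later.
        "OpenAI-Beta": "realtime=v1",
    }
    return url, headers


def encode_event(event_type: str, **fields) -> str:
    """Serialize a client event as the JSON text frame sent over the socket."""
    return json.dumps({"type": event_type, **fields})


url, headers = build_connection("gpt-4o-realtime-preview", "sk-example")
frame = encode_event("response.create")
```

Any WebSocket client library that accepts custom headers can take the returned URL and headers; the sketch deliberately stops short of picking one.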
2 / 5
What is VAD (Voice Activity Detection) in the OpenAI Realtime API and what does it control?
The Realtime API's server-side VAD monitors the incoming audio stream and detects speech boundaries: it signals when speech starts (input_audio_buffer.speech_started) and when it stops (input_audio_buffer.speech_stopped). When VAD detects that the user has stopped speaking, it commits the buffered audio and triggers a response automatically. Developers can also disable VAD and control turn boundaries manually by sending input_audio_buffer.commit, followed by response.create to request a reply.
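The two modes can be sketched as event payloads. The first helper enables server VAD through session.update; the second disables it and produces the manual sequence of append, commit, and response.create. The numeric values are illustrative rather than recommended settings, and both function names are hypothetical.

```python
import base64


def server_vad_update(threshold: float = 0.5,
                      prefix_padding_ms: int = 300,
                      silence_duration_ms: int = 500) -> dict:
    """session.update payload that turns on server-side VAD (illustrative values)."""
    return {
        "type": "session.update",
        "session": {
            "turn_detection": {
                "type": "server_vad",
                "threshold": threshold,
                "prefix_padding_ms": prefix_padding_ms,
                "silence_duration_ms": silence_duration_ms,
            }
        },
    }


def manual_turn_events(pcm_chunks: list[bytes]) -> list[dict]:
    """Events for one user turn when VAD is disabled and the client marks the boundary."""
    events = [{"type": "session.update", "session": {"turn_detection": None}}]
    for chunk in pcm_chunks:
        events.append({
            "type": "input_audio_buffer.append",
            "audio": base64.b64encode(chunk).decode("ascii"),
        })
    events.append({"type": "input_audio_buffer.commit"})  # end of the user's turn
    events.append({"type": "response.create"})            # ask the model to reply
    return events


vad_event = server_vad_update()
manual_events = manual_turn_events([b"\x00\x01", b"\x02\x03"])
```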
3 / 5
A developer receives a response.audio.delta event from the Realtime API. What does this event contain?
response.audio.delta events carry base64-encoded audio chunks in their delta field (PCM16 unless another output format is configured). Each chunk is an incremental piece of the AI's audio response. The client must decode the chunks in the order they arrive and play them sequentially. Because playback can begin before the full response is generated, streaming reduces perceived latency.
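A minimal sketch of the client side of this stream: decode each delta, hand the chunk to playback, and keep the accumulated bytes. AudioAssembler is a hypothetical class, and actual audio output is left out so the example stays self-contained.

```python
import base64


class AudioAssembler:
    """Collects decoded audio from response.audio.delta events (hypothetical helper)."""

    def __init__(self) -> None:
        self._pcm = bytearray()
        self.done = False

    def handle(self, event: dict) -> bytes:
        """Return the newly decoded chunk, or b"" for events that carry no audio."""
        event_type = event.get("type")
        if event_type == "response.audio.delta":
            chunk = base64.b64decode(event["delta"])
            self._pcm.extend(chunk)  # keep arrival order
            return chunk             # a real client would queue this for playback
        if event_type == "response.audio.done":
            self.done = True
        return b""

    @property
    def pcm(self) -> bytes:
        return bytes(self._pcm)


assembler = AudioAssembler()
for payload in (b"\x01\x02", b"\x03\x04"):
    assembler.handle({
        "type": "response.audio.delta",
        "delta": base64.b64encode(payload).decode("ascii"),
    })
assembler.handle({"type": "response.audio.done"})
```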
4 / 5
What is the purpose of a session in the OpenAI Realtime API?
A Realtime API session maintains all state for a conversation: the audio input/output buffers, conversation item history (turns), tool definitions, and session configuration (voice, instructions, VAD settings). The session persists for the lifetime of the WebSocket connection. Developers send session.update events to modify configuration mid-conversation.
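For illustration, a session.update payload that changes instructions, voice, and tool definitions might be assembled like this. The get_weather tool and build_session_update are hypothetical; the field names follow the beta event schema as best understood and should be verified against the current reference.

```python
import json

# Hypothetical tool definition, used only for illustration.
WEATHER_TOOL = {
    "type": "function",
    "name": "get_weather",
    "description": "Look up the current weather for a city.",
    "parameters": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}


def build_session_update(instructions: str, voice: str, tools: list[dict]) -> dict:
    """session.update payload carrying new configuration for the live session."""
    return {
        "type": "session.update",
        "session": {
            "instructions": instructions,
            "voice": voice,
            "tools": tools,
            "tool_choice": "auto",
        },
    }


update = build_session_update("Answer briefly.", "alloy", [WEATHER_TOOL])
frame = json.dumps(update)  # text frame to send over the open WebSocket
```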
5 / 5
An engineer integrates the Realtime API with function calling. When the model decides to call a function, what event does the client receive?
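When the model generates a function call, the Realtime API emits response.output_item.done whose item has type: 'function_call' and contains name, call_id, and arguments (a JSON string). The client parses the arguments, executes the function, and sends a conversation.item.create event whose item has type function_call_output, the same call_id, and the result as output. Adding that item does not by itself produce a reply, so the client then sends response.create to have the model continue with the function result.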
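The loop can be sketched as a handler that maps one incoming server event to the client events to send back. get_weather, TOOLS, and handle_function_call are hypothetical names; a real client would also handle unknown tool names and malformed arguments.

```python
import json


def get_weather(city: str) -> dict:
    """Hypothetical local tool; a real one would call a weather service."""
    return {"city": city, "forecast": "sunny"}


TOOLS = {"get_weather": get_weather}


def handle_function_call(event: dict) -> list[dict]:
    """Return the client events to send in reply to a completed function_call item."""
    if event.get("type") != "response.output_item.done":
        return []
    item = event.get("item", {})
    if item.get("type") != "function_call":
        return []
    args = json.loads(item.get("arguments") or "{}")  # arguments arrive as a JSON string
    result = TOOLS[item["name"]](**args)
    return [
        {
            "type": "conversation.item.create",
            "item": {
                "type": "function_call_output",
                "call_id": item["call_id"],   # ties the output to the original call
                "output": json.dumps(result),
            },
        },
        {"type": "response.create"},          # ask the model to continue
    ]


server_event = {
    "type": "response.output_item.done",
    "item": {
        "type": "function_call",
        "name": "get_weather",
        "call_id": "call_123",
        "arguments": '{"city": "Oslo"}',
    },
}
replies = handle_function_call(server_event)
```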