OpenAI's Build Hour session introduces three new realtime audio models: GPT-Realtime-2, GPT-Realtime-Translate, and GPT-Realtime-Whisper. GPT-Realtime-2 ships with a 128K context window, parallel tool calling, preambles, controllable expressiveness, and multi-turn context retention. These are not incremental upgrades: together they change what low-latency voice agents can do in production.
The session runs two live demos: a voice-powered shopping search agent and a product analytics dashboard controlled entirely by speech. The Sierra customer spotlight at 18:36 is the most technically dense part of the video: their engineering team details VAD (voice activity detection) tuning, output redaction, custom evaluation harnesses, and tracing in a production customer-experience deployment. That section alone justifies watching.
The code is in the public OpenAI build-hours GitHub repo, and the playground is live at platform.openai.com/audio/realtime. The Q&A runs from 29:56 to 42:05 and covers edge cases the demos do not address. If you are building customer-facing voice agents, the Sierra guardrails architecture is the part to study.