OpenAI has introduced three new real-time audio models to its API: GPT-Realtime-2, GPT-Realtime-Translate, and GPT-Realtime-Whisper. The models are now available in the Realtime API and the Playground, and developers can use Codex to integrate them into existing applications.
The new tools expand voice capabilities from basic turn-based interactions to include real-time reasoning, multi-language translation, and live transcription.
New OpenAI real-time audio models: GPT-Realtime-2, Translate and Whisper
GPT-Realtime-2 is OpenAI’s first live speech model with reasoning capabilities comparable to GPT-5. It is designed to handle complex requests, call tools, and recover from interruptions during ongoing conversations. Key updates over GPT-Realtime-1.5 include adjustable reasoning effort with minimum, low, medium, high, and very high settings, with low as the default.
Its context window has been expanded from 32,000 to 128,000 tokens, supporting longer workflows. The model can call multiple tools in parallel while providing audible status updates, such as “checking your calendar” or “finding it now.” It also supports preambles, which let it say short phrases like “let me check that” before completing a request.
The model also has a better understanding of domain-specific vocabulary, including proper nouns and healthcare terminology, and it offers more controllable tone and delivery.
GPT-Realtime-Translate offers live translation from over 70 input languages to 13 output languages, keeping pace with the speaker. It is designed for use in cross-border customer support, live events, educational platforms, and tools for creators serving global audiences. Deutsche Telekom is testing the model for multilingual customer service, while Vimeo is experimenting with translating product education videos in real time as they play.
GPT-Realtime-Whisper is a streaming speech-to-text model designed for low-latency transcription. It transcribes audio as the speaker talks, making it suitable for applications such as live captioning, meeting notes that update during conversations, voice assistants that require continuous understanding, and post-call workflows in industries such as customer service, healthcare, and sales.
OpenAI Real-Time Audio API Pricing, Security, and Compliance
Pricing is as follows:
GPT-Realtime-2 costs $32 per million audio input tokens, $0.40 per million cached input tokens, and $64 per million audio output tokens.
GPT-Realtime-Translate costs $0.034 per minute.
GPT-Realtime-Whisper costs $0.017 per minute.
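To make these rates concrete, here is a rough cost sketch in Python. The prices are the ones quoted above; the token counts and call length in the example are hypothetical values chosen only for illustration, and the function and constant names are not part of any OpenAI SDK.

```python
# Prices quoted in this article, in US dollars.
AUDIO_INPUT_PER_MILLION = 32.00    # token-based model, audio input tokens
CACHED_INPUT_PER_MILLION = 0.40    # token-based model, cached input tokens
AUDIO_OUTPUT_PER_MILLION = 64.00   # token-based model, audio output tokens
TRANSLATION_PER_MINUTE = 0.034     # live translation model
TRANSCRIPTION_PER_MINUTE = 0.017   # live transcription model


def token_based_cost(input_tokens: int, cached_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one session on the token-priced model."""
    total = (
        input_tokens * AUDIO_INPUT_PER_MILLION
        + cached_tokens * CACHED_INPUT_PER_MILLION
        + output_tokens * AUDIO_OUTPUT_PER_MILLION
    )
    return total / 1_000_000


def per_minute_cost(minutes: float, rate_per_minute: float) -> float:
    """Estimate the cost of a session on one of the per-minute models."""
    return minutes * rate_per_minute


if __name__ == "__main__":
    # Hypothetical session: 50k fresh input, 200k cached input, 20k output tokens.
    session = token_based_cost(50_000, 200_000, 20_000)
    print(f"Token-priced session: ${session:.2f}")

    # Hypothetical 30-minute call on each per-minute model.
    print(f"Translation, 30 min: ${per_minute_cost(30, TRANSLATION_PER_MINUTE):.2f}")
    print(f"Transcription, 30 min: ${per_minute_cost(30, TRANSCRIPTION_PER_MINUTE):.2f}")
```

With these assumed numbers, the token-priced session comes to about $2.96, and the 30-minute calls to about $1.02 for translation and $0.51 for transcription. Actual bills depend on how many tokens a given conversation produces, which this sketch does not try to predict.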
The Realtime API includes active classifiers that can halt conversations that violate OpenAI's content policies. Developers can add their own guardrails on top using the Agents SDK. The API also supports EU data residency for EU-based applications and is covered by OpenAI's enterprise privacy commitments.
Under OpenAI’s usage policies, developers must inform users when they are interacting with AI, unless the context already makes that obvious.






