ENFR
8news

Tech • IA • Crypto

TodayBriefingVideosTop 24hCryptoArchivesFavoritesTopics

GPT-5 in voice: the real shock

9/10
AIRenaud DékodeMay 11, 2026 at 05:04 PM2:32
Audio player
0:00 / 0:00

TL;DR

OpenAI’s GPT Real Time 2 introduces low-latency, voice-based AI with reasoning and tool use, signaling a major shift toward conversational interfaces across digital services.

KEY POINTS

Launch of GPT Real Time 2

OpenAI has released GPT Real Time 2, a voice-first model designed for real-time, bidirectional conversations. It integrates reasoning capabilities and can call external tools during speech, enabling more dynamic and interactive exchanges than previous voice systems.

API-first deployment

The model is currently available via API, targeting developers building applications. Consumer-facing chat interfaces have not yet fully integrated this version, indicating a staged rollout focused on enterprise and product integration.

Advanced conversational abilities

GPT Real Time 2 can process queries, pause to “think,” retrieve external data, and respond seamlessly within a single conversation. This enables use cases such as automated customer support agents that access CRM systems and resolve issues in real time.

Companion model: Realtime Whisper

Alongside it, Realtime Whisper offers near-instant speech-to-text transcription across more than a dozen languages. It supports live subtitling and multilingual captioning with minimal latency, significantly improving accessibility and live communication workflows.

Companion model: Realtime Translate

Another specialized model, Realtime Translate, functions as a live interpreter. It overlays translated speech in real time without tool integration, enabling fluid multilingual conversations across languages such as Japanese or German.

Multimodal input capabilities

GPT Real Time 2 can also process visual input, allowing it to describe scenes or objects. This opens accessibility use cases, particularly for visually impaired users who can receive real-time audio descriptions of their surroundings.

Cost and scalability

The system is positioned as relatively low-cost, encouraging widespread adoption across industries. Its affordability and flexibility are expected to accelerate integration into apps, services, and connected devices.

Potential industry impact

The combination of voice interaction, reasoning, and tool use could reshape customer service, translation, accessibility, and digital interfaces broadly. Voice may become a primary interface for interacting with software and online services.

CONCLUSION

GPT Real Time 2 and its companion models mark a shift toward voice-driven, intelligent interfaces that combine reasoning, real-time processing, and multimodal input, with broad implications for how people interact with technology.

Full transcript

More from AI