
OpenAI Just Dropped the Biggest Voice AI Upgrade Yet

AI • AI Revolution • May 8, 2026 • 15:49

TL;DR

OpenAI has launched real-time voice AI models alongside a new networking protocol for its AI supercomputers, highlighting rapid product advances while the technology's impact on jobs remains uncertain.

KEY POINTS

New real-time voice models

OpenAI introduced three developer-facing models: GPT Realtime 2, GPT Realtime Translate, and GPT Realtime Whisper. The goal is to make voice assistants behave less like scripted phone menus and more like responsive agents that understand intent and complete tasks during live conversations. The models are designed to handle interruptions, corrections, and complex multi-step requests in real time.
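For a picture of the integration surface, here is a minimal sketch that assumes the new models are exposed through OpenAI's existing Realtime WebSocket API. The model id "gpt-realtime-2" and the session fields are assumptions based on this article, not confirmed API details.

```python
# Minimal sketch of opening a live voice session. The model id
# "gpt-realtime-2" is an assumption taken from this article.
import asyncio
import json
import os

import websockets  # pip install websockets (v13+ for additional_headers)


async def main() -> None:
    url = "wss://api.openai.com/v1/realtime?model=gpt-realtime-2"  # assumed id
    headers = {"Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}"}
    async with websockets.connect(url, additional_headers=headers) as ws:
        # Configure the session for spoken input and output.
        await ws.send(json.dumps({
            "type": "session.update",
            "session": {"modalities": ["audio", "text"]},
        }))
        # Ask for a response; audio arrives incrementally as server events.
        await ws.send(json.dumps({"type": "response.create"}))
        async for message in ws:
            print(json.loads(message)["type"])  # e.g. "response.done"


asyncio.run(main())
```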

Improved reasoning and multitasking

GPT Realtime 2 integrates near GPT-5–level reasoning into spoken interactions. It can call multiple tools simultaneously, track long conversational context, and respond while processing background actions. The context window has expanded to 128,000 tokens, enabling longer, more coherent exchanges across use cases such as customer support, tutoring, and medical workflows.
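To make the parallel tool use concrete, here is a sketch of registering two tools in a single session. The flattened tool schema matches OpenAI's current Realtime API; the tool names are hypothetical, and whether GPT Realtime 2 keeps exactly this shape is an assumption.

```python
# Register two tools at once so the model can call them in parallel during a
# live call. Both tool names are hypothetical examples.
session_update = {
    "type": "session.update",
    "session": {
        "tools": [
            {
                "type": "function",
                "name": "lookup_order",  # hypothetical tool
                "description": "Fetch an order's status by id.",
                "parameters": {
                    "type": "object",
                    "properties": {"order_id": {"type": "string"}},
                    "required": ["order_id"],
                },
            },
            {
                "type": "function",
                "name": "issue_refund",  # hypothetical tool
                "description": "Refund an order by id.",
                "parameters": {
                    "type": "object",
                    "properties": {"order_id": {"type": "string"}},
                    "required": ["order_id"],
                },
            },
        ],
        "tool_choice": "auto",
    },
}
```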

Performance gains and adaptability

Benchmarks show major improvements, with accuracy on Big Bench Audio rising to 96.6%, up from 81.4% for GPT Realtime 1.5. Developers can tune reasoning intensity across five levels, balancing speed and depth. The system also adapts tone, such as calm or upbeat, and handles specialized terminology more effectively.
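The announcement does not say how developers select a level, so the sketch below is a guess at the shape of that control; the field name "reasoning_effort" is borrowed from OpenAI's text-model APIs and is an assumption, as is the spelling of the top level.

```python
# The five levels named in the video, lowest to highest. "low" is the default
# because most voice turns need speed more than depth.
REASONING_LEVELS = ["minimal", "low", "medium", "high", "extra_high"]

# Hypothetical session update: "reasoning_effort" is an assumed field name,
# not a documented Realtime API parameter.
session_update = {
    "type": "session.update",
    "session": {"reasoning_effort": "high"},  # trade latency for accuracy
}
```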

Live translation and transcription

GPT Realtime Translate supports over 70 input languages and produces output in 13 languages, enabling near-instant multilingual conversations. Early testing includes Deutsche Telekom using it for customer support. Meanwhile, GPT Realtime Whisper provides live transcription, powering captions, summaries, and automated workflows during meetings and events.
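To make the transcription flow concrete, here is a rough sketch of live captioning over a Realtime-style WebSocket. The "input_audio_buffer.append" event exists in OpenAI's current Realtime API, but its use with GPT Realtime Whisper, and the transcript event family checked below, are assumptions.

```python
# Rough shape of live captioning: one task streams microphone audio up,
# another prints caption text as it arrives.
import asyncio
import base64
import json


async def send_audio(ws, audio_chunks):
    # audio_chunks is assumed to yield raw PCM16 bytes from a microphone.
    async for chunk in audio_chunks:
        await ws.send(json.dumps({
            "type": "input_audio_buffer.append",
            "audio": base64.b64encode(chunk).decode("ascii"),
        }))


async def print_captions(ws):
    async for message in ws:
        event = json.loads(message)
        if "transcript" in event.get("type", ""):  # assumed event family
            print(event.get("delta", ""), end="", flush=True)


async def run(ws, mic):
    # Stream up and caption down concurrently.
    await asyncio.gather(send_audio(ws, mic), print_captions(ws))
```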

Voice as an action interface

OpenAI outlined three emerging patterns: “voice-to-action,” where spoken commands trigger real-world tasks; “systems-to-voice,” where software communicates live updates; and “voice-to-voice,” enabling seamless multilingual communication. Pricing has been disclosed, with translation at $0.034 per minute and transcription at $0.017 per minute, signaling readiness for commercial deployment.
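Those per-minute prices make budgeting straightforward. A quick back-of-the-envelope, using invented call volumes:

```python
# Per-minute prices from the announcement; the call volumes are made-up
# inputs for illustration only.
TRANSLATE_PER_MIN = 0.034
TRANSCRIBE_PER_MIN = 0.017

calls_per_day = 1_000      # hypothetical support-center volume
minutes_per_call = 6

daily_minutes = calls_per_day * minutes_per_call
print(f"translation:   ${daily_minutes * TRANSLATE_PER_MIN:,.2f}/day")   # $204.00/day
print(f"transcription: ${daily_minutes * TRANSCRIBE_PER_MIN:,.2f}/day")  # $102.00/day
```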

Hidden infrastructure: MRC networking

Behind these models is a new networking protocol, Multi-Path Reliable Connection (MRC), designed for massive AI supercomputers. Developed with partners including Microsoft, Nvidia, AMD, and Intel, MRC optimizes how thousands of GPUs exchange data, reducing bottlenecks and preventing costly slowdowns during training.
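OpenAI has not published MRC's internals beyond this description, but the core idea can be pictured with a toy model: spray packets across every healthy path, and drop a failed path from the rotation immediately rather than waiting for the network to reconverge. A conceptual sketch only, not the actual protocol:

```python
# Toy model of the multipath idea: spread load across all healthy paths and
# stop using a path the moment it fails. In MRC this decision is made at the
# network card, in microseconds, while switches keep their fixed routes.
import random


class MultiPathSender:
    def __init__(self, num_paths: int):
        self.healthy = set(range(num_paths))

    def send(self) -> int:
        # Pick any currently healthy path, spreading load evenly over time.
        return random.choice(sorted(self.healthy))

    def mark_failed(self, path: int) -> None:
        # Remove the path from rotation; no global reconvergence needed.
        self.healthy.discard(path)


sender = MultiPathSender(num_paths=8)
sender.mark_failed(3)                       # one of eight links goes down
paths = {sender.send() for _ in range(1000)}
assert 3 not in paths                       # traffic avoids the failed path
```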

Efficiency and resilience at scale

MRC distributes data across multiple paths instead of relying on a single route, improving throughput and reliability. It can reroute around failures in microseconds and keep running through hardware disruptions. The design supports clusters of about 131,000 GPUs with only two switch tiers instead of three or four, cutting infrastructure costs while improving speed.
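The 131,000-GPU figure also passes a simple sanity check. In a two-tier leaf-spine fabric built from radix-R switches, half of each leaf's ports face GPUs and half face spines, capping the fabric at R*R/2 endpoints; the effective radix below is an assumption the announcement does not state.

```python
# Two-tier (leaf-spine) capacity with radix-R switches: R/2 ports per leaf
# face GPUs, and at most R leaves can each reach every spine once. An
# effective radix of 512 (plausible once 800G ports are split into smaller
# links, though OpenAI does not state it) matches the quoted scale.
R = 512
print(R * R // 2)  # 131072, i.e. the ~131,000 GPUs mentioned above
```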

AI job impact remains unclear

Despite rapid technological progress, evidence of widespread job displacement is mixed. Surveys from the National Bureau of Economic Research show nearly 90% of executives reporting no workforce impact so far, and labor data shows limited macroeconomic change through early 2026. However, projections vary widely, with some leaders warning of significant reductions in entry-level roles.

Early signs of workforce shifts

Research indicates a 13% decline in employment for early-career workers in AI-exposed roles, while experienced workers remain stable. At the same time, companies have cited AI in layoffs, though some executives acknowledge this may mask broader economic pressures rather than direct automation effects.

CONCLUSION

OpenAI’s latest releases highlight both the rapid evolution of real-time AI capabilities and the growing importance of underlying infrastructure, while the broader economic impact on jobs remains uncertain and uneven.

Full transcript

OpenAI just launched voice AI that can talk, translate, transcribe, and take action in real time. At the same time, it revealed the hidden supercomputer tech powering this race, while Sam Altman admitted some companies are blaming layoffs on AI even when that may not be the real reason.

Let's start with the voice update, because people will probably notice this first. OpenAI introduced three new real-time audio models for developers: GPT Realtime 2, GPT Realtime Translate, and GPT Realtime Whisper. The simple way to understand this is that OpenAI wants voice AI to stop feeling like a phone menu and start feeling like a real assistant that can understand what you mean and help you get something done.

For years, voice assistants had the same problem. They could respond quickly, and sometimes they sounded natural, but they were usually much weaker than the best text-based AI models. They could answer simple questions or give basic replies, then fall apart once the request became complicated. That is what OpenAI is trying to fix with GPT Realtime 2. This model is built for live spoken conversations, but with GPT-5-class reasoning behind it. In plain language, that means it should be much better at understanding complicated requests while the conversation is still happening. It can keep track of context, respond to corrections, call tools, and handle more than one action at the same time.

>> Hey, Penny, I'm in Atlanta, and I just got an email that my flight to New York tomorrow was cancelled. Can you find me the next available flight under $400?

>> I'm really sorry your flight got cancelled. I'm excited to help you find a solid option under $400 for tomorrow. The cheapest is about $245.

>> Hold on.

Now, a real voice agent becomes useful when it can check your account, look up a shipment, compare options, issue a refund, create a replacement order, and explain everything back while you are still talking. GPT Realtime 2 can call multiple tools in parallel. So instead of checking one thing, waiting, checking another thing, and then slowly building a response, it can work across several steps at once. While it does that, it can use short spoken phrases like "Let me check that" or "One moment." That sounds small, but in voice it matters. When a human pauses during a call, we understand they are checking something. When an AI goes silent, it feels broken.

The context window also jumps from 32,000 tokens to 128,000 tokens. That means it can follow much longer conversations, which matters for support calls, tutoring, meetings, interviews, medical workflows, and anything where the AI needs to remember earlier context. OpenAI is also giving developers control over reasoning intensity with five levels: minimal, low, medium, high, and extra high. The default is low, because most voice interactions need to be fast. For harder tasks, developers can turn the reasoning up and let the model spend more compute.

OpenAI also gave benchmark numbers on Big Bench Audio. GPT Realtime 2 at the high setting reached 96.6% accuracy, compared with 81.4% for GPT Realtime 1.5. On Audio MultiChallenge, which tests instruction following across multi-turn dialogue, the extra-high version reached a 48.5% average pass rate, compared with 34.7% before. For normal users, the bigger story is that voice AI is becoming more useful in messy, real situations. People interrupt, change their minds, use accents, proper names, and medical terms, and leave sentences half-finished.
OpenAI says the new model handles specialized terminology better, and its tone can be made calm, empathetic, or upbeat depending on the situation.

Now, that is only the first model. The second one is GPT Realtime Translate, made for live translation. OpenAI says it can understand more than 70 input languages and speak back in 13 output languages. The goal is not just to translate words one by one. The goal is to keep up with the speaker, preserve the meaning, handle regional accents, and understand context switches. What's really impressive is that the model can listen to me and translate while I'm speaking. That could be big for customer support, international sales, online education, live events, media, and creator platforms. Deutsche Telekom is already testing this for customer support, where two people could speak different languages and still hold a live conversation through AI.

Then there is GPT Realtime Whisper. This one is for streaming transcription, meaning it turns speech into text as it happens. The obvious uses are live captions for meetings, classrooms, broadcasts, conferences, and events. It can also help generate notes, summaries, action items, and follow-up workflows while the conversation is still happening.

OpenAI described three big patterns here. Voice-to-action means you say what you want and the AI uses tools to get it done. Systems-to-voice means software turns live context into spoken guidance: a travel app could tell you your delayed flight connection is still possible, send you toward the right gate, and confirm what is happening with your luggage. Voice-to-voice means AI helps people speak across language barriers in real time.

The pricing is already public too. GPT Realtime 2 costs $32 per million audio input tokens, with cached input at $0.40 per million tokens and $64 per million audio output tokens. GPT Realtime Translate costs $0.034 per minute, and GPT Realtime Whisper costs $0.017 per minute. All three are available through the Realtime API and can be tested in the playground. The API also supports EU data residency for EU-based apps and is covered by OpenAI's enterprise privacy commitments. And since real-time voice AI can obviously be misused, OpenAI says it has built guardrails against spam, fraud, and harmful uses. The system can halt conversations that violate harmful-content guidelines.

And this is where another bottleneck starts showing up, because as AI makes it easier to generate code, teams end up spending more time reviewing it. That's where CodeRabbit comes in. CodeRabbit is sponsoring today's video. And unlike tools like Copilot or Cursor that generate code, this one focuses on reviewing it. The moment you open a pull request, CodeRabbit acts as your AI co-pilot for code reviews. Beyond just flagging issues, it provides one-click fix suggestions and lets you define custom code-quality rules using ast-grep patterns, catching subtle issues that traditional static analysis tools might miss. Over time, it also adapts. The more you interact with it, the more its feedback aligns with your codebase and your team's style. One thing that actually stands out is that it does not just point out problems. For many comments, it gives one-click fix suggestions so you can apply changes immediately instead of rewriting everything manually, which makes a big difference when you are dealing with a lot of AI-generated code.
Also, CodeRabbit CLI brings instant code reviews directly to your terminal, seamlessly integrating with Claude Code, Cursor CLI, and other AI coding agents. While they generate code, CodeRabbit ensures it's production-ready, catching bugs, security issues, and AI hallucinations before they hit your codebase. CodeRabbit is already used across 3 million plus repositories and hundreds of thousands of open-source projects, reviewing around a million pull requests every week. If you want to check it out, the link is in the description. And now back to the video.

Now, while all of this sounds like a product story, there is a deeper infrastructure story underneath it. Every time OpenAI releases a more powerful model, there is a giant machine behind it. And the problem is no longer just buy more GPUs. At this level, the network connecting the GPUs becomes one of the biggest problems. That brings us to MRC, which stands for Multi-Path Reliable Connection. OpenAI introduced MRC as a new networking protocol for large-scale AI supercomputer training clusters. It was developed over two years with AMD, Broadcom, Intel, Microsoft, and Nvidia, and published through the Open Compute Project so the broader industry can use it.

The simple version is this. When OpenAI trains a frontier AI model, thousands and thousands of GPUs have to work together like one giant machine. They constantly send data back and forth, and one training step can involve millions of tiny transfers. If even one important transfer arrives late, everything can slow down. And when GPUs this expensive sit idle, even for a short time, that is real money being burned. OpenAI says more than 900 million people use ChatGPT every week. So keeping these models improving at that scale takes more than just powerful chips. The whole supercomputer needs to stay fast and stable even when links fail, switches have problems, or traffic builds up inside the system.

That is where MRC comes in. The easiest way to think about it is as a smarter traffic system for AI supercomputers. Instead of letting data get stuck on one crowded road, MRC can spread it across many different routes at the same time. It is built on networking technology called RoCE and RDMA, which lets machines move data directly between each other without forcing the CPU to handle every little step. It also uses SRv6 routing, which means each packet already carries instructions for the path it should take.

The first major benefit is smoother traffic. Traditional systems often send data down one main path, which can create bottlenecks. MRC spreads packets across hundreds of paths, so the load is shared more evenly. That helps the GPUs keep working instead of waiting around for delayed data. The second benefit is fast recovery when something breaks. In normal networks, failures can take seconds or even tens of seconds to settle. In AI training, that can be a serious problem. MRC can detect a bad link, switch, or path and move around it in microseconds. The clever part is that most of the decision-making happens at the network card level, not inside the switches. The switches mostly follow fixed routes instead of constantly recalculating everything. Before MRC, if a key connection between a GPU and a switch failed, the training job could crash. With MRC, the system can keep going. If one port fails on an eight-port network interface, capacity drops by one eighth.
MRC avoids that failed path, tells the rest of the system to stop using it, and brings it back when it recovers, often within about a minute. The third benefit is scale. MRC lets OpenAI split one huge 800-gigabit-per-second connection into several smaller links going to different switches. That sounds technical, but the result is simple: OpenAI can connect about 131,000 GPUs with only two layers of switches instead of needing three or four. That saves a lot of hardware. OpenAI says this design uses two-thirds of the optics and three-fifths of the switches compared with a traditional three-tier network. It also reduces delay, because data passes through fewer switches on the way to its destination.

And this is already being used, not just tested. MRC is running on OpenAI's largest NVIDIA GB200 supercomputers, including the Oracle Cloud Infrastructure site in Abilene, Texas, and Microsoft's Fairwater supercomputers in Atlanta and Wisconsin. It works across 400 and 800 gigabit RDMA network cards from Nvidia, AMD, and Broadcom, with switch support from NVIDIA Spectrum and Broadcom Tomahawk systems. OpenAI says MRC has already helped train frontier models for ChatGPT and Codex. During one recent training run, OpenAI even rebooted four major switches, and the training teams did not need to stop or coordinate around it. The system kept going. That is the real point here. Behind every smarter AI model, there is now a huge reliability battle happening deep inside the data center. So while the public sees voice models and chatbots, this is the part happening behind the curtain. The AI race is also becoming a networking race.

And that leads into the third topic: jobs. Sam Altman recently admitted that some companies are doing AI washing, blaming layoffs on AI even when those cuts would have happened anyway. At the same time, he says real AI-driven job displacement is happening too and will grow. This is why the AI jobs debate is so confusing. The data is messy. A National Bureau of Economic Research study published in February surveyed thousands of C-suite executives across the United States, the United Kingdom, Germany, and Australia. Nearly 90% said AI had no impact on workplace employment over the past three years since ChatGPT launched in late 2022. The Yale Budget Lab also looked at Bureau of Labor Statistics data and found no major evidence yet of AI changing the mix of occupations or the length of unemployment for jobs highly exposed to AI through March 2026. Martha Gimbel from Yale said that at this exact moment there do not seem to be major macroeconomic effects.

But then you have the other side of the story. Anthropic CEO Dario Amodei has warned that AI could wipe out 50% of entry-level office jobs. Snap CEO Evan Spiegel announced in April that the company would lay off about 1,000 people, around 16% of its workforce, and cited AI. The World Economic Forum's 2025 Future of Jobs Report said around 40% of employers expect to reduce staff because of AI in the future. Some economists think the impact may be starting to show up slowly. Apollo Global Management chief economist Torsten Slok compared the moment to the old computer era, when technology was everywhere but did not immediately show up in productivity data. He said AI is everywhere except in the incoming macroeconomic data. Stanford's Erik Brynjolfsson has a different angle. He pointed to a possible shift where job growth and GDP growth start to separate. He noted revised job gains of 181,000 while fourth-quarter GDP was tracking at 3.7%.
His own analysis showed a 2.7% year-over-year productivity jump last year, which he linked partly to AI starting to show real benefits. He also published research showing a 13% relative decline in employment for early-career workers in jobs highly exposed to AI, while more experienced workers stayed stable or even grew. So the honest answer is messy. Some companies really may be using AI as a cover for layoffs caused by weak margins, cautious consumers, geopolitical pressure, or pressure to justify massive AI spending. But at the same time, AI is becoming capable enough to replace or shrink certain kinds of work, especially entry-level digital tasks.

Also, if you want more content around science, space, and advanced tech, we've launched a separate channel for that. Links are in the description. Go check it out. Anyway, that's it for this one. Let me know what you think about OpenAI's new voice models, MRC, and the AI jobs debate. Thanks for watching, and I'll catch you in the next one.
