
NVIDIA has unveiled an integrated stack for “physical AI,” combining a world model, agent-focused CPU, and humanoid robot platform to accelerate real-world autonomous systems.
NVIDIA introduced Cosmos 3, a multimodal foundation model designed to simulate and predict real-world physical interactions. It integrates vision, reasoning, world generation, and action prediction, enabling machines to anticipate outcomes rather than merely describe scenes. The system is trained on massive multimodal datasets, reportedly reaching 20 trillion tokens, including video, images, audio, text, and action trajectories.
Cosmos 3 targets a major bottleneck in robotics: slow and costly training cycles. By generating realistic simulations of motion, force, and interaction, it can reduce training and evaluation timelines from months to days. This approach minimizes hardware damage and safety risks by shifting learning into synthetic environments before real-world deployment.
The development reflects a broader industry pivot toward world models that capture physical causality. Unlike traditional AI trained on internet text, physical AI requires understanding of movement, collisions, and spatial dynamics. Major players including Google DeepMind, OpenAI, Tesla, and others are pursuing similar approaches, signaling an emerging platform competition.
NVIDIA is building an ecosystem around Cosmos through the Cosmos Coalition, with partners such as Agile Robots, Black Forest Labs, Runway, and others. The initiative aims to standardize tools and accelerate adoption of simulation-driven AI development across robotics and autonomous systems.
NVIDIA also launched Vera, described as the first CPU tailored for agentic AI workloads. Unlike GPUs optimized for model training, Vera focuses on coordinating tasks such as tool use, data movement, and workflow execution. The company claims performance gains of up to 1.8× over traditional x86 processors in agent-based workloads.
Agentic systems differ from chatbots by executing multi-step tasks, including running code, querying databases, and interacting with external tools. This shift increases demand on CPUs within data centers, positioning Vera as a key component in what NVIDIA describes as future “AI factories.”
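The multi-step behavior described above can be illustrated with a minimal sketch of an agent loop. This is not NVIDIA's or any vendor's implementation: the function `run_agent`, the tool names `add` and `query_db`, and the simulated database timeout are all hypothetical stand-ins, chosen only to show why such workloads are dominated by sequential coordination (calling tools, handling failures, retrying) rather than by one large model computation.

```python
from typing import Callable, Dict, List, Tuple

Tool = Callable[[str], str]
Step = Tuple[str, str]            # (tool name, argument)
LogEntry = Tuple[str, str, str]   # (tool name, status, detail)


def run_agent(plan: List[Step], tools: Dict[str, Tool],
              max_retries: int = 2) -> List[LogEntry]:
    """Run planned steps in order, retrying a step whose tool call fails."""
    log: List[LogEntry] = []
    for tool_name, arg in plan:
        for _attempt in range(max_retries + 1):
            try:
                result = tools[tool_name](arg)
            except Exception as exc:
                # Record the failure and try the same step again.
                log.append((tool_name, "error", str(exc)))
                continue
            log.append((tool_name, "ok", result))
            break
        else:
            # Every attempt failed: abandon the rest of the workflow.
            break
    return log


def make_flaky_query() -> Tool:
    """Hypothetical database tool that times out on its first call only."""
    calls = {"n": 0}

    def query(sql: str) -> str:
        calls["n"] += 1
        if calls["n"] == 1:
            raise TimeoutError("database timeout")
        return f"rows for: {sql}"

    return query


def build_tools() -> Dict[str, Tool]:
    """Illustrative tool registry; real agents would wrap real services."""
    return {
        # Stand-in for "running code": sums integers separated by '+'.
        "add": lambda expr: str(sum(int(x) for x in expr.split("+"))),
        "query_db": make_flaky_query(),
    }
```

Calling `run_agent([("add", "2 + 3"), ("query_db", "SELECT 1")], build_tools())` returns a log in which the first step succeeds, the database step fails once, and the retry succeeds. Almost all of that work is branching, bookkeeping, and waiting on tools, which is the kind of load the article says falls on the CPU.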
Companies including Anthropic, OpenAI, xAI, Oracle Cloud, and CoreWeave are expected to adopt Vera-based systems. NVIDIA estimates the market opportunity at up to $200 billion, with hardware partners such as Dell, HPE, Lenovo, and Asus preparing large-scale deployments.
To complete the stack, NVIDIA introduced the Isaac GR00T reference humanoid robot. Built on a Unitree chassis, the robot stands nearly 6 feet tall, weighs about 150 pounds, and features 75 degrees of freedom across body and hands, including five-finger tactile hands designed for complex manipulation tasks.
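The 75-degree-of-freedom total can be checked against the breakdown given in the transcript portion of this document, which lists 31 degrees of freedom for the body and 22 for the hands. The arithmetic below assumes the 22 figure applies to each of the two hands, since that is the only reading that reproduces the stated total:

```latex
\[
\underbrace{31}_{\text{body}} \;+\; 2 \times \underbrace{22}_{\text{per hand}} \;=\; 31 + 44 \;=\; 75
\]
```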
The robot includes stereo vision with a 140° field of view, wrist-mounted cameras, and onboard AI powered by the Jetson AGX Thor system delivering 270 teraflops. This enables onboard perception, planning, and control without relying entirely on remote computation.
NVIDIA’s strategy aims to unify hardware and software through platforms like Jetson, Omniverse, Isaac, and Cosmos. By offering a full-stack reference system, the company seeks to become the default infrastructure layer for robotics, similar to its role in AI computing.
Parallel developments highlight more immediate real-world deployment. Foundation Future Industries has tested humanoid robots in Ukraine for hazardous logistics tasks, such as transporting supplies in dangerous zones. The company has secured a $24 million Pentagon contract and is exploring broader defense applications.
Despite rapid progress, significant barriers persist, including battery life, durability, dexterity, and reliability in unpredictable environments. Experts note a substantial gap between controlled demonstrations and real-world deployment, particularly in high-risk scenarios like combat.
NVIDIA is positioning itself as the foundational platform for physical AI, but as capabilities expand into real-world and military contexts, the technological race is increasingly tied to complex safety, ethical, and geopolitical stakes.
The big bang of AI just happened, and NVIDIA is turning it into something much bigger than a model release: a full operating layer for robots that can actually see, think, plan, and move through the real world. Here's how it breaks down. Cosmos 3 is the spark, a world model designed to simulate physical environments, predict what comes next, and train robots way faster than you ever could in the real world. Then you've got Vera, this new CPU built specifically for AI agents that can run tools, execute tasks, and manage entire workflows. And then NVIDIA ties the whole stack together with a humanoid reference robot: Unitree hardware, Sharpa five-finger hands, Jetson Thor compute, the Isaac GR00T platform, the works. And the timing makes this way more serious because humanoid robots are already being field tested in actual war zones by another company.
Let's start with Cosmos 3 because this is where you can see NVIDIA's bet on where AI goes next. For years, most AI progress has been stuck behind screens. But the real world is way harder. A robot needs to understand space, movement, force, timing, friction, how objects interact, and what's going to happen 1 second after it does something. That's what NVIDIA is going after with this model.
Cosmos 3 is what they're calling an open world foundation model for physical AI. It's built on a mixture-of-transformers architecture. And here's the key part. It combines four things in one system: vision, reasoning, world generation, and action prediction. In plain English, it can understand what it's seeing, generate or simulate physical scenes, and help figure out what should happen next.
This matters because robots and self-driving cars can't just learn from regular internet data. A chatbot can read half the internet and pick up language patterns. But a robot needs something way harder. It needs examples of motion, action sequences, real-world cause and effect.
It needs to know what happens when a hand reaches for a cup, when a box tips over, when a wheel loses traction, when someone steps into your path, when two objects collide. NVIDIA says Cosmos 3 was trained on one of the largest multimodal physical AI datasets ever: billions of samples across text, images, video, sound, and action trajectories. Axios reported the training data hit 20 trillion tokens of multimodal data, including real and synthetic video, images, ambient audio, text, and action sequences from both humans and robots. The point is crystal clear. This isn't your standard language model. It's trained around the actual structure of the physical world.
The big claim: Cosmos 3 can cut physical AI training and evaluation cycles from months down to days. That's massive for robotics because robot training is painfully slow. You can't just let a humanoid fail a million times in a warehouse or on the street. It breaks hardware, burns time, creates safety nightmares. So companies lean on simulation, synthetic data, and controlled environments, teaching robots before they ever touch the real world. Cosmos 3 is NVIDIA's shot at making that process way faster and way more general.
The model can work as a vision language model, a world model, and a video foundation model. It can simulate physical environments and predict future states. A regular AI model can describe a scene. A physical AI model needs to understand what's about to happen in that scene and what action makes sense inside it.
That's why Jensen Huang called it the big bang of physical AI. He said breakthroughs in multimodal reasoning, language, vision, and world models are making this shift real. And the phrasing matters. NVIDIA is not pitching Cosmos 3 as just another model release. They're selling it as the foundation layer for robots, autonomous vehicles, and vision AI systems that can perceive, reason, plan, and act.
NVIDIA also launched the Cosmos Coalition, pulling in companies like Agile Robots, Black Forest Labs, Generalist, LTX, Runway, and Skild AI. That's important because world models are quickly becoming their own platform war. OpenAI, Google DeepMind, Tesla, robotics startups, video model companies, simulation labs: they're all circling the same idea. The next wave of AI needs a model of reality, not just a model of language. And once NVIDIA locks down the world model layer, the next question is obvious. Where does all this actually run?
That brings us to Vera. NVIDIA is calling Vera the first CPU built for AI agents. Sounds simple, but it's a huge shift in how NVIDIA's framing the future of computing. For years, the GPU was the hero of the AI boom. Massive models needed massive parallel compute, and NVIDIA became the company everyone had to deal with. Now, the focus is shifting toward agentic AI.
And agentic AI works differently than a normal chatbot. An AI agent doesn't just spit out one answer. It can plan a task, call tools, run code, check files, query databases, use APIs, test outputs, retry failed steps, and keep grinding through a workflow. That creates a different kind of load inside data centers. The CPU becomes way more important because agents are constantly coordinating tasks, moving data, managing tool calls, running logic, and connecting to everything around the model.
NVIDIA says Vera is a high-performance, energy-efficient CPU designed for workloads like agentic AI, reinforcement learning, and data processing. It powers standalone Vera servers, NVIDIA Vera Rubin systems, and Vera BlueField-4 STX AI storage platforms. They're also claiming it can finish diverse agent workloads up to 1.8 times faster than traditional x86 processors. That number matters because it shows how NVIDIA sees the future. If AI agents become the workers of the internet, speed doesn't just mean spitting out tokens faster.
It means finishing tasks faster: compiling code, running tests, searching data, processing files, executing sandboxed workflows, and moving to the next step with way less delay.
The adoption list is serious. NVIDIA says Anthropic, OpenAI, and xAI are planning to adopt Vera along with ByteDance, CoreWeave, and Oracle Cloud Infrastructure. Reuters reported Jensen Huang describing Vera as a possible $200 billion market with OpenAI, Anthropic, and SpaceX among the major early adopters. NVIDIA also named Dell, HPE, Lenovo, Supermicro, Asus, Foxconn, Gigabyte, QCT, Wistron, and others as companies building standalone Vera CPU systems at scale. So, this isn't just another part in a server rack. NVIDIA is basically saying the AI factory needs a new CPU because AI agents are about to become one of the main workloads of the future.
And the timing makes sense. The whole industry is moving toward agents. OpenAI is building agent tools. Anthropic's pushing Claude into coding and computer use workflows. Google's building deeper agent systems around Gemini. xAI's pushing Grok into coding and product workflows. The next battlefield isn't just which model answers better. It's which model can actually do more work. Vera is NVIDIA's way of making sure that when that shift happens, the compute layer still runs through them.
Then the story goes physical again. Because if Cosmos 3 gives AI a world model and Vera gives AI agents a compute layer, NVIDIA's new Isaac GR00T reference humanoid gives the whole thing a body. NVIDIA announced the Isaac GR00T reference humanoid robot as an open humanoid robot reference design for academic research. The idea is to give researchers a unified hardware and software platform instead of forcing every lab to cobble together its own robot, hands, sensors, compute, simulation, training, evaluation, and deployment stack from scratch.
The robot's built around a Unitree H2 humanoid chassis, stands nearly 6 feet tall, weighs around 150 lb, and has 31 degrees of freedom across the body. NVIDIA pairs that with dual Sharpa Wave tactile five-finger hands, adding 22 degrees of freedom each, bringing the full system to 75 degrees of freedom across body and hands. That detail matters because hands are one of the hardest parts of humanoid robotics. Walking gets tons of attention, and it should. Balance is brutally difficult. But real usefulness often comes down to manipulation. A humanoid has to grab objects, hold tools, open doors, lift items, press buttons, and work in spaces designed for human bodies. Five-finger tactile hands bring the platform way closer to that goal.
The sensing stack's also serious. The reference robot includes a head-mounted stereo camera with a wide field of view: 140° horizontal, 102° vertical. It's also got wrist cameras for close-range manipulation and an IMU for motion tracking. That gives the robot a wider view of the scene, plus way more detailed visual feedback near its hands.
The control and payload numbers make it feel less like a research toy. NVIDIA lists arm torque up to 120 N·m and leg torque up to 360 N·m. Rated arm payload is 7 kg, peak payload 15 kg. That means the platform's designed for real manipulation and lifting tests, not just walking around a lab.
Then comes the onboard compute. The robot uses NVIDIA Jetson AGX Thor T5000: a Blackwell GPU delivering 270 FP4 teraflops of AI performance, a 14-core Arm CPU, 128 GB of unified memory, and configurable power from 40 to 130 watts. That's what turns the robot into a physical AI platform instead of a remote-controlled shell.
NVIDIA says leading institutions including AI2, ETH Zurich, Stanford Robotics Center, and UC San Diego's Advanced Robotics and Controls Lab will use the reference design. Reuters also reported NVIDIA plans to work with US, European, and South Korean humanoid robot makers in addition to Unitree.
That matters because Unitree is based in China, and there are already concerns from some US lawmakers about using Unitree systems in federally funded research. NVIDIA is trying to position itself as the secure platform layer: software updates routed through NVIDIA chips, with protections like secure boot and confidential computing baked in.
So the bigger story isn't just that NVIDIA showed a humanoid robot. The bigger story is NVIDIA wants to standardize the robot development stack the same way it standardized the AI compute stack. If labs and companies build on Jetson Thor, Isaac GR00T, Cosmos, Omniverse, and NVIDIA's simulation and deployment tools, then NVIDIA becomes the operating layer for physical AI.
And that leads into the final part of the story, which feels way darker. While NVIDIA is building the official platform for physical AI, Foundation Future Industries is pushing humanoid robots into military and heavy industrial use. Their Phantom Mark1 has already been tested in Ukraine. Reports say two Phantom robots were sent there earlier this year for pilot testing focused on dangerous logistics tasks like supply pickup near hazardous areas.
That's a very different kind of humanoid story. Most robotics companies talk about warehouse work, manufacturing, home assistance, general purpose labor. Foundation's openly focused on dangerous environments, including conflict zones. Business Insider reported the company tested Phantom robots in Ukraine for logistics operations under real war zone conditions, with the idea of carrying supplies from outside to inside so soldiers don't have to expose themselves.
The company also has a way more aggressive long-term vision. Reports describe Phantom Mark1 as a defense-focused humanoid, and Foundation's leadership has discussed future combat roles, including humanoids eventually handling weapons that humans use.
At the same time, even the company admits there's a massive gap between a humanoid that can pull off a slow logistics demo and a humanoid that can operate reliably in a real firefight. That gap matters. Battery life's still a problem. Durability is still a problem. Water, dust, shock, terrain, manipulation, reliability, cost: all massive barriers. The hardest part might still be the hand, because using a weapon, grabbing equipment, opening a door, or handling supplies requires dexterity that works under pressure, not just in a demo.
Business Insider reported Foundation secured a $24 million Pentagon contract, and the company believes humanoids could carry out way more complex military missions within 5 to 10 years. That timeline's exactly why this story is unsettling. These systems aren't ready to replace soldiers today. They're also no longer purely science fiction. And once AI can move through the world, the stakes get way higher.
What do you think? Is physical AI the next real breakthrough, or are we moving too fast toward machines we barely understand? Drop your thoughts in the comments. Subscribe for more AI and robotics updates. Thanks for watching, and I'll catch you in the next one.