
NVIDIA has unveiled an integrated stack for "physical AI" that combines a world model, an agent-oriented CPU, and a humanoid robot platform to accelerate autonomous systems in the real world.
NVIDIA introduced Cosmos 3, a multimodal foundation model designed to simulate and predict real-world physical interactions. It integrates vision, reasoning, world generation, and action prediction, so machines can anticipate outcomes rather than merely describe scenes. The system is trained on very large multimodal datasets, reportedly about 20 trillion tokens, spanning video, images, audio, text, and action trajectories.
Cosmos 3 targets a major bottleneck in robotics: slow, expensive training cycles. By generating realistic simulations of motion, force, and interaction, it can, according to NVIDIA, cut training and evaluation time from months to days. The approach limits hardware damage and safety risks by moving learning into synthetic environments before real-world deployment.
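To make "anticipating outcomes in simulation" concrete, here is a minimal sketch of planning with a world model. It is not how Cosmos 3 works internally: `predict_next` is a hand-written 1-D point-mass model standing in for a learned model, and `State`, `rollout_cost`, and `plan` are hypothetical names invented for this illustration. The planner is a simple random-shooting search: it imagines many action sequences, scores each predicted outcome, and keeps the best one, all without touching real hardware.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class State:
    position: float  # metres along a 1-D track
    velocity: float  # metres per second


def predict_next(state: State, force: float, dt: float = 0.1) -> State:
    """Toy stand-in for a learned world model: a 1 kg point mass with drag."""
    friction = 0.5 * state.velocity      # simple viscous drag (assumed)
    acceleration = force - friction      # mass is 1 kg, so a = F
    velocity = state.velocity + acceleration * dt
    position = state.position + velocity * dt
    return State(position, velocity)


def rollout_cost(start: State, forces: list[float], target: float) -> float:
    """Imagine a whole action sequence and score where it ends up."""
    state = start
    for force in forces:
        state = predict_next(state, force)
    # Penalise missing the target and arriving with leftover speed.
    return abs(state.position - target) + 0.1 * abs(state.velocity)


def plan(start: State, target: float, horizon: int = 20,
         candidates: int = 200, seed: int = 0) -> list[float]:
    """Random-shooting planner: sample action sequences, keep the cheapest."""
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    best_forces: list[float] = []
    best_cost = float("inf")
    for _ in range(candidates):
        forces = [rng.uniform(-2.0, 2.0) for _ in range(horizon)]
        cost = rollout_cost(start, forces, target)
        if cost < best_cost:
            best_forces, best_cost = forces, cost
    return best_forces


if __name__ == "__main__":
    start = State(position=0.0, velocity=0.0)
    forces = plan(start, target=1.0)
    print(f"predicted cost of best plan: {rollout_cost(start, forces, 1.0):.3f}")
```

Every failed candidate here costs a few microseconds of compute rather than a damaged robot, which is the economic argument the paragraph above makes for simulation-first training.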
The launch reflects a broader industry shift toward world models that capture physical causality. Unlike conventional AI trained on internet text, physical AI requires an understanding of motion, collisions, and spatial dynamics. Major players such as Google DeepMind, OpenAI, Tesla, and others are pursuing similar approaches, signaling an emerging platform competition.
NVIDIA is building an ecosystem around Cosmos through the Cosmos Coalition, with partners including Agile Robots, Black Forest Labs, Runway, and others. The initiative aims to standardize tooling and speed up adoption of simulation-based AI development in robotics and autonomous systems.
NVIDIA also launched Vera, described as the first CPU designed for agentic AI workloads. Unlike GPUs, which are optimized for model training, Vera focuses on coordination tasks such as tool use, data movement, and workflow execution. The company claims performance gains of up to 1.8× over traditional x86 processors on these workloads.
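An "up to 1.8×" figure applies to the CPU-bound part of a workload, so the end-to-end gain depends on how much of an agent workflow actually runs on the CPU. The sketch below applies Amdahl's law to show that relationship. The CPU shares used in the loop are illustrative assumptions, not NVIDIA measurements, and `overall_speedup` is a name chosen for this example.

```python
def overall_speedup(cpu_fraction: float, cpu_speedup: float = 1.8) -> float:
    """Amdahl's law: only the CPU-bound share of the workflow gets faster."""
    if not 0.0 <= cpu_fraction <= 1.0:
        raise ValueError("cpu_fraction must be between 0 and 1")
    return 1.0 / ((1.0 - cpu_fraction) + cpu_fraction / cpu_speedup)


if __name__ == "__main__":
    # Assumed CPU shares of total workflow time, for illustration only.
    for share in (0.25, 0.5, 1.0):
        print(f"CPU share {share:.0%}: {overall_speedup(share):.2f}x end to end")
```

The full 1.8× only shows up when the workflow is entirely CPU-bound; a workflow that spends half its time waiting on GPU inference would see a noticeably smaller overall gain.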
Agentic systems differ from chatbots in that they carry out multi-step tasks, such as running code, querying databases, and interacting with external tools. That shift raises demand on CPUs in data centers and positions Vera as a key component of what NVIDIA calls the "AI factories" of the future.
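The multi-step pattern described above can be sketched as a small loop that runs tool calls in order and retries the ones that fail. This is a toy illustration, not any vendor's agent framework: the tools `run_code` and `query_db`, the `TOOLS` registry, and the helper `make_flaky` are all hypothetical, and a real agent would have a model choosing the steps rather than a fixed list.

```python
from typing import Callable

Tool = Callable[[str], str]


# Hypothetical tools: names and behaviour are invented for illustration.
def run_code(argument: str) -> str:
    return f"ran {argument}"


def query_db(argument: str) -> str:
    return f"rows for {argument}"


TOOLS: dict[str, Tool] = {"run_code": run_code, "query_db": query_db}


def make_flaky(tool: Tool, failures: int) -> Tool:
    """Wrap a tool so its first `failures` calls raise, to exercise retries."""
    remaining = {"count": failures}

    def wrapper(argument: str) -> str:
        if remaining["count"] > 0:
            remaining["count"] -= 1
            raise TimeoutError("simulated transient failure")
        return tool(argument)

    return wrapper


def run_workflow(steps: list[tuple[str, str]],
                 tools: dict[str, Tool] = TOOLS,
                 max_retries: int = 2) -> list[str]:
    """Execute (tool_name, argument) steps in order, retrying failed steps."""
    results = []
    for tool_name, argument in steps:
        tool = tools[tool_name]  # unknown tool names fail fast with KeyError
        for attempt in range(max_retries + 1):
            try:
                results.append(tool(argument))
                break  # step succeeded, move on to the next one
            except Exception as error:
                if attempt == max_retries:
                    raise RuntimeError(f"step {tool_name!r} failed") from error
    return results


if __name__ == "__main__":
    flaky_tools = {"run_code": run_code, "query_db": make_flaky(query_db, 2)}
    print(run_workflow([("run_code", "tests.py"), ("query_db", "orders")],
                       tools=flaky_tools))
```

Almost everything in this loop is branching, dispatch, and error handling rather than matrix math, which is why this kind of workload leans on the CPU.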
Companies such as Anthropic, OpenAI, xAI, Oracle Cloud, and CoreWeave are expected to adopt Vera-based systems. NVIDIA puts the market opportunity at up to $200 billion, with hardware partners such as Dell, HPE, Lenovo, and Asus preparing large-scale deployments.
To complete the stack, NVIDIA introduced the Isaac GR00T reference humanoid robot. Built on a Unitree chassis, the robot stands nearly 1.8 m (about 6 ft) tall, weighs about 150 lb (roughly 68 kg), and has 75 degrees of freedom, including advanced five-finger tactile hands for complex manipulation tasks.
The robot combines stereo vision with a 140° field of view, wrist-mounted cameras, and onboard AI powered by the Jetson AGX Thor system, which delivers up to 2,070 FP4 teraflops. That allows perception, planning, and control to run locally without relying entirely on remote compute.
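The headline hardware numbers can be cross-checked with a few lines of arithmetic. The sketch below uses only figures quoted in this document: 31 body degrees of freedom and 22 per hand (from the transcript), a weight of about 150 lb, and a height of nearly 6 ft. The function names are invented for this example, and the per-hand reading of "22 degrees of freedom" is an assumption, adopted because it is the reading that reproduces the stated total of 75.

```python
LB_TO_KG = 0.45359237  # exact definition of the avoirdupois pound
FT_TO_M = 0.3048       # exact definition of the international foot


def total_degrees_of_freedom(body: int = 31, per_hand: int = 22,
                             hands: int = 2) -> int:
    """Body joints plus the joints of each hand."""
    return body + hands * per_hand


def pounds_to_kg(pounds: float) -> float:
    return pounds * LB_TO_KG


def feet_to_metres(feet: float) -> float:
    return feet * FT_TO_M


if __name__ == "__main__":
    print("degrees of freedom:", total_degrees_of_freedom())
    print(f"weight: {pounds_to_kg(150):.1f} kg")
    print(f"height: {feet_to_metres(6):.2f} m")
```

Counting 22 degrees of freedom once would give 53, so the quoted total of 75 only works if each of the two hands contributes 22.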
NVIDIA's strategy is to unify hardware and software through platforms such as Jetson, Omniverse, Isaac, and Cosmos. By offering a complete reference system, the company aims to become the default infrastructure layer for robotics, much as it already is for AI compute.
Parallel developments point to more immediate deployments. Foundation Future Industries has tested humanoid robots in Ukraine on dangerous logistics tasks, such as carrying supplies through high-risk areas. The company has secured a $24 million Pentagon contract and is exploring broader defense applications.
Despite rapid progress, significant obstacles remain: battery life, durability, dexterity, and reliability in unpredictable environments. Experts point to a notable gap between controlled demonstrations and real deployments, especially in high-risk settings such as combat.
NVIDIA is positioning itself as the base platform for physical AI, but as these technologies spread into the real world and the military, the technology race brings complex questions of safety, ethics, and geopolitics.
The big bang of AI just happened, and NVIDIA is turning it into something much bigger than a model release: a full operating layer for robots that can actually see, think, plan, and move through the real world. Here's how it breaks down. Cosmos 3 is the spark, a world model designed to simulate physical environments, predict what comes next, and train robots way faster than you ever could in the real world. Then you've got Vera, this new CPU built specifically for AI agents that can run tools, execute tasks, and manage entire workflows. And then NVIDIA ties the whole stack together with a humanoid reference robot: Unitree hardware, Sharpa five-finger hands, Jetson Thor compute, the Isaac GR00T platform, the works. And the timing makes this way more serious, because humanoid robots are already being field tested in actual war zones by another company.
Let's start with Cosmos 3, because this is where you can see NVIDIA's bet on where AI goes next. For years, most AI progress has been stuck behind screens. But the real world is way harder. A robot needs to understand space, movement, force, timing, friction, how objects interact, and what's going to happen one second after it does something. That's what NVIDIA is going after with this model. Cosmos 3 is what they're calling an open world foundation model for physical AI. It's built on a mixture-of-transformers architecture. And here's the key part. It combines three things in one system: vision reasoning, world generation, and action prediction. In plain English, it can understand what it's seeing, generate or simulate physical scenes, and help figure out what should happen next.
This matters because robots and self-driving cars can't just learn from regular internet data. A chatbot can read half the internet and pick up language patterns. But a robot needs something way harder. It needs examples of motion, action sequences, real-world cause and effect.
It needs to know what happens when a hand reaches for a cup, when a box tips over, when a wheel loses traction, when someone steps into your path, when two objects collide. NVIDIA says Cosmos 3 was trained on one of the largest multimodal physical AI datasets ever: billions of samples across text, images, video, sound, and action trajectories. Axios reported the training data hit 20 trillion tokens of multimodal data, including real and synthetic video, images, ambient audio, text, and action sequences from both humans and robots. The point is crystal clear. This isn't your standard language model. It's trained around the actual structure of the physical world.
The big claim: Cosmos 3 can cut physical AI training and evaluation cycles from months down to days. That's massive for robotics because robot training is painfully slow. You can't just let a humanoid fail a million times in a warehouse or on the street. It breaks hardware, burns time, creates safety nightmares. So companies lean on simulation, synthetic data, and controlled environments, teaching robots before they ever touch the real world. Cosmos 3 is NVIDIA's shot at making that process way faster and way more general.
The model can work as a vision-language model, a world model, and a video foundation model. It can simulate physical environments and predict future states. A regular AI model can describe a scene. A physical AI model needs to understand what's about to happen in that scene and what action makes sense inside it. That's why Jensen Huang called it the big bang of physical AI. He said breakthroughs in multimodal reasoning, language, vision, and world models are making this shift real. And the phrasing matters. NVIDIA is not pitching Cosmos 3 as just another model release. They're selling it as the foundation layer for robots, autonomous vehicles, and vision AI systems that can perceive, reason, plan, and act.
NVIDIA also launched the Cosmos Coalition, pulling in companies like Agile Robots, Black Forest Labs, Generalist, LTX, Runway, and Skild AI. That's important because world models are quickly becoming their own platform war. OpenAI, Google DeepMind, Tesla, robotics startups, video model companies, simulation labs, they're all circling the same idea. The next wave of AI needs a model of reality, not just a model of language. And once NVIDIA locks down the world model layer, the next question is obvious. Where does all this actually run?
That brings us to Vera. NVIDIA is calling Vera the first CPU built for AI agents. Sounds simple, but it's a huge shift in how NVIDIA's framing the future of computing. For years, the GPU was the hero of the AI boom. Massive models needed massive parallel compute, and NVIDIA became the company everyone had to deal with. Now the focus is shifting toward agentic AI, and agentic AI works differently than a normal chatbot. An AI agent doesn't just spit out one answer. It can plan a task, call tools, run code, check files, query databases, use APIs, test outputs, retry failed steps, and keep grinding through a workflow. That creates a different kind of load inside data centers. The CPU becomes way more important because agents are constantly coordinating tasks, moving data, managing tool calls, running logic, and connecting to everything around the model.
NVIDIA says Vera is a high-performance, energy-efficient CPU designed for workloads like agentic AI, reinforcement learning, and data processing. It powers standalone Vera servers, NVIDIA Vera Rubin systems, and Vera BlueField-4 STX AI storage platforms. They're also claiming it can finish diverse agent workloads up to 1.8 times faster than traditional x86 processors. That number matters because it shows how NVIDIA sees the future. If AI agents become the workers of the internet, speed doesn't just mean spitting out tokens faster.
It means finishing tasks faster: compiling code, running tests, searching data, processing files, executing sandboxed workflows, and moving to the next step with way less delay.
The adoption list is serious. NVIDIA says Anthropic, OpenAI, and xAI are planning to adopt Vera, along with ByteDance, CoreWeave, and Oracle Cloud Infrastructure. Reuters reported Jensen Huang describing Vera as a possible $200 billion market, with OpenAI, Anthropic, and SpaceX among the major early adopters. NVIDIA also named Dell, HPE, Lenovo, Supermicro, Asus, Foxconn, Gigabyte, QCT, Wistron, and others as companies building standalone Vera CPU systems at scale. So this isn't just another part in a server rack. NVIDIA is basically saying the AI factory needs a new CPU because AI agents are about to become one of the main workloads of the future.
And the timing makes sense. The whole industry is moving toward agents. OpenAI is building agent tools. Anthropic's pushing Claude into coding and computer use workflows. Google's building deeper agent systems around Gemini. xAI's pushing Grok into coding and product workflows. The next battlefield isn't just which model answers better. It's which model can actually do more work. Vera is NVIDIA's way of making sure that when that shift happens, the compute layer still runs through them.
Then the story goes physical again. Because if Cosmos 3 gives AI a world model and Vera gives AI agents a compute layer, NVIDIA's new Isaac GR00T reference humanoid gives the whole thing a body. NVIDIA announced the Isaac GR00T reference humanoid robot as an open humanoid robot reference design for academic research. The idea is to give researchers a unified hardware and software platform instead of forcing every lab to cobble together its own robot, hands, sensors, compute, simulation, training, evaluation, and deployment stack from scratch.
The robot's built around a Unitree H2 humanoid chassis. It stands nearly 6 feet tall, weighs around 150 lb, and has 31 degrees of freedom across the body. NVIDIA pairs that with dual Sharpa Wave tactile five-finger hands, adding 22 degrees of freedom each and bringing the full system to 75 degrees of freedom across body and hands. That detail matters because hands are one of the hardest parts of humanoid robotics. Walking gets tons of attention, and it should. Balance is brutally difficult. But real usefulness often comes down to manipulation. A humanoid has to grab objects, hold tools, open doors, lift items, press buttons, and work in spaces designed for human bodies. Five-finger tactile hands bring the platform way closer to that goal.
The sensing stack's also serious. The reference robot includes a head-mounted stereo camera with a wide field of view, 140° horizontal and 102° vertical. It's also got wrist cameras for close-range manipulation and an IMU for motion tracking. That gives the robot a wider view of the scene, plus way more detailed visual feedback near its hands. The control and payload numbers make it feel less like a research toy. NVIDIA lists arm torque up to 120 newton-meters and leg torque up to 360 newton-meters. Rated arm payload is 7 kg, with a peak payload of 15 kg. That means the platform's designed for real manipulation and lifting tests, not just walking around a lab.
Then comes the onboard compute. The robot uses NVIDIA Jetson AGX Thor T5000, with a Blackwell GPU delivering 2,070 FP4 teraflops of AI performance, a 14-core Arm CPU, 128 gigs of unified memory, and configurable power from 40 to 130 watts. That's what turns the robot into a physical AI platform instead of a remote-controlled shell. NVIDIA says leading institutions including AI2, ETH Zurich, Stanford Robotics Center, and UC San Diego's Advanced Robotics and Controls Lab will use the reference design. Reuters also reported NVIDIA plans to work with US, European, and South Korean humanoid robot makers in addition to Unitree.
That matters because Unitree is based in China, and there are already concerns from some US lawmakers about using Unitree systems in federally funded research. NVIDIA is trying to position itself as the secure platform layer, with software updates routed through NVIDIA chips and protections like secure boot and confidential computing baked in. So the bigger story isn't just that NVIDIA showed a humanoid robot. The bigger story is NVIDIA wants to standardize the robot development stack the same way it standardized the AI compute stack. If labs and companies build on Jetson Thor, Isaac GR00T, Cosmos, Omniverse, and NVIDIA's simulation and deployment tools, then NVIDIA becomes the operating layer for physical AI.
And that leads into the final part of the story, which feels way darker. While NVIDIA is building the official platform for physical AI, Foundation Future Industries is pushing humanoid robots into military and heavy industrial use. Their Phantom Mark1 has already been tested in Ukraine. Reports say two Phantom robots were sent there earlier this year for pilot testing focused on dangerous logistics tasks like supply pickup near hazardous areas. That's a very different kind of humanoid story. Most robotics companies talk about warehouse work, manufacturing, home assistance, general-purpose labor. Foundation's openly focused on dangerous environments, including conflict zones. Business Insider reported the company tested Phantom robots in Ukraine for logistics operations under real war zone conditions, with the idea of carrying supplies from outside to inside so soldiers don't have to expose themselves.
The company also has a way more aggressive long-term vision. Reports describe Phantom Mark1 as a defense-focused humanoid, and Foundation's leadership has discussed future combat roles, including humanoids eventually handling weapons that humans use.
At the same time, even the company admits there's a massive gap between a humanoid that can pull off a slow logistics demo and a humanoid that can operate reliably in a real firefight. That gap matters. Battery life's still a problem. Durability is still a problem. Water, dust, shock, terrain, manipulation, reliability, cost: all massive barriers. The hardest part might still be the hand, because using a weapon, grabbing equipment, opening a door, handling supplies, that requires dexterity that works under pressure, not just in a demo.
Business Insider reported Foundation secured a $24 million Pentagon contract, and the company believes humanoids could carry out way more complex military missions within 5 to 10 years. That timeline's exactly why this story is unsettling. These systems aren't ready to replace soldiers today. They're also no longer purely science fiction. And once AI can move through the world, the stakes get way higher.
What do you think? Is physical AI the next real breakthrough, or are we moving too fast toward machines we barely understand? Drop your thoughts in the comments. Subscribe for more AI and robotics updates. Thanks for watching, and I'll catch you in the next one.