8news

Tech • AI • Crypto


Google's new Omni and Spark just changed AI forever

AI • AI Revolution • May 21, 2026, 00:31 • 18:12

INTRO

Google I/O 2026 highlighted the rapid scale-up of AI, new Gemini models, and a shift toward autonomous agents built into every product and into Google's infrastructure.

KEY POINTS

Explosive growth in AI usage

Google said it now processes more than 3.2 quadrillion tokens per month, up from 480 trillion a year earlier and 9.7 trillion two years ago. The Gemini app has passed 900 million monthly users, more than double year over year, while AI-powered search features reach billions of users (2.5 billion monthly users for AI Overviews and more than 1 billion for AI Mode). AI has moved from an experiment to everyday global infrastructure.
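The growth figures above reduce to simple ratios. The short Python sketch below uses only the numbers quoted in this article, plus the 400 million Gemini app users cited for the previous year in the keynote; the helper name `growth_factor` is illustrative and does not come from any Google material.

```python
# Growth ratios computed from the keynote figures quoted above (no other data assumed).
MONTHLY_TOKENS = {
    "two_years_ago": 9.7e12,  # 9.7 trillion tokens per month
    "one_year_ago": 480e12,   # 480 trillion tokens per month
    "now": 3.2e15,            # 3.2 quadrillion tokens per month
}
GEMINI_APP_MAU = {"one_year_ago": 400e6, "now": 900e6}  # monthly active users


def growth_factor(old: float, new: float) -> float:
    """Return how many times larger `new` is than `old` (illustrative helper)."""
    return new / old


yoy_tokens = growth_factor(MONTHLY_TOKENS["one_year_ago"], MONTHLY_TOKENS["now"])
two_year_tokens = growth_factor(MONTHLY_TOKENS["two_years_ago"], MONTHLY_TOKENS["now"])
app_growth = growth_factor(GEMINI_APP_MAU["one_year_ago"], GEMINI_APP_MAU["now"])

print(f"Tokens, year over year: {yoy_tokens:.2f}x")       # about 6.67x
print(f"Tokens, over two years: {two_year_tokens:.0f}x")  # about 330x
print(f"Gemini app monthly users: {app_growth:.2f}x")     # 2.25x
```

The year-over-year token ratio of about 6.7x is what the keynote rounds to "seven times", and the two-year jump is roughly 330-fold.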

Gemini 3.5 Flash takes on the top models

The new Gemini 3.5 Flash outperforms Google's previous flagship, Gemini 3.1 Pro, on several benchmarks, including 76.2% versus 70.3% on Terminal Bench 2.1 and 1,656 versus 1,314 Elo on GDPval-AA. It competes with systems such as GPT-5.5 and Claude Opus 4.7 while reaching about 280 tokens per second, roughly four times the output speed of those competitors. Google positions it as both highly capable and cost-efficient.
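To make the "four times faster" claim concrete, this small Python sketch turns the quoted throughput numbers into a speedup range and a streaming time. The rival figures (around 60 to 70 tokens per second for GPT-5.5 and Claude Opus 4.7) come from the keynote transcript; the 2,000-token response length is a hypothetical example, and the calculation deliberately ignores time to first token.

```python
# Output-speed figures as quoted in the keynote coverage (tokens per second).
FLASH_TPS = 280.0
RIVAL_TPS_RANGE = (60.0, 70.0)  # approximate GPT-5.5 / Claude Opus 4.7 figures

RESPONSE_TOKENS = 2_000  # hypothetical response length, for illustration only


def speedup(fast_tps: float, slow_tps: float) -> float:
    """Ratio of two output rates."""
    return fast_tps / slow_tps


def stream_seconds(tokens: int, tps: float) -> float:
    """Time to emit `tokens` at a steady rate (ignores time to first token)."""
    return tokens / tps


low = speedup(FLASH_TPS, max(RIVAL_TPS_RANGE))   # against the faster rival figure
high = speedup(FLASH_TPS, min(RIVAL_TPS_RANGE))  # against the slower rival figure

print(f"Speedup: {low:.1f}x to {high:.1f}x")
print(f"Flash: {stream_seconds(RESPONSE_TOKENS, FLASH_TPS):.1f} s")
for tps in RIVAL_TPS_RANGE:
    print(f"Rival at {tps:.0f} tok/s: {stream_seconds(RESPONSE_TOKENS, tps):.1f} s")
```

Against the quoted range, Flash comes out 4.0x to 4.7x faster, which is consistent with the "roughly four times" framing.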

Major cost reductions for businesses

Google says Flash delivers comparable capabilities at less than half the price of competing frontier models, and sometimes close to a third. In Google's example, companies processing about a trillion tokens a day that moved 80% of their workloads to 3.5 Flash would save more than a billion dollars a year, which shows how much efficiency matters as AI use becomes widespread.
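The billion-dollar example does not state per-token prices, so the Python sketch below works backward instead: it computes the blended price difference per million tokens that would be needed to save $1 billion a year. The 365-day year, the single blended price per token, and the function name are assumptions made for illustration, not Google's methodology.

```python
# Back-of-the-envelope check of the "over a billion dollars a year" example.
# Assumptions (not from Google): one blended price per token, 365 days per year.
TOKENS_PER_DAY = 1e12      # "about a trillion tokens a day"
SHARE_MOVED = 0.80         # 80% of workloads shifted to Gemini 3.5 Flash
TARGET_SAVINGS_USD = 1e9   # "over a billion dollars annually"
DAYS_PER_YEAR = 365


def implied_price_gap_per_million(
    tokens_per_day: float, share_moved: float, savings_usd: float, days: int = DAYS_PER_YEAR
) -> float:
    """Blended price gap (USD per million tokens) needed to reach `savings_usd` per year."""
    millions_moved_per_year = tokens_per_day * share_moved * days / 1e6
    return savings_usd / millions_moved_per_year


gap = implied_price_gap_per_million(TOKENS_PER_DAY, SHARE_MOVED, TARGET_SAVINGS_USD)
print(f"Shifted volume: {TOKENS_PER_DAY * SHARE_MOVED * DAYS_PER_YEAR:.3g} tokens/year")
print(f"Required price gap: ${gap:.2f} per million tokens")
```

Under these assumptions, the claim holds if Flash undercuts the alternatives by roughly $3.42 or more per million tokens on a blended basis; a larger gap pushes the savings above $1 billion.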

Introducing the Gemini Omni world model

Google DeepMind's Demis Hassabis described Gemini Omni as a pivotal step toward artificial general intelligence. It handles text, audio, images, and video in a single system, as both input and output. Unlike conventional generators, it models physical consistency, enabling realistic output such as accurate protein-folding animations and synchronized audiovisual scenes.

Advanced video editing and generation

Omni supports iterative, conversation-driven editing in which scenes preserve continuity, physics, and character consistency. Demos included transforming objects, altering environments, and generating structured multimedia sequences with coherent audio and visuals.

Expanded AI safety and watermarking

All content generated with Omni carries a SynthID watermark. SynthID has now been applied to more than 100 billion images and videos and 60,000 years of audio. Adoption by companies such as OpenAI, Nvidia, and ElevenLabs points toward an industry-wide transparency standard.

Next-generation TPU infrastructure

Google unveiled its eighth-generation TPUs, split for the first time into two specialized chips: TPU8T for training and TPU8 for inference. Training can now span more than a million TPUs across multiple sites, which Google says lets it train larger models in weeks rather than months. Both chips deliver up to 2x better performance per watt, and TPU8 brings significant latency gains.

Massive capital investment

Annual capital expenditure is expected to reach $180–190 billion, compared with $31 billion in 2022, which shows the scale of infrastructure needed to sustain AI growth.
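The capex comparison reduces to a ratio against the 2022 baseline. A minimal Python check using only the figures quoted above:

```python
# Capital expenditure figures as quoted (USD).
CAPEX_2022 = 31e9
CAPEX_GUIDANCE = (180e9, 190e9)  # expected range for this year

# Each end of the guidance range divided by the 2022 baseline.
ratios = [guidance / CAPEX_2022 for guidance in CAPEX_GUIDANCE]
print(f"This year's capex is {ratios[0]:.1f}x to {ratios[1]:.1f}x the 2022 level")
```

The resulting 5.8x to 6.1x range is what the keynote summarizes as "roughly six times".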

Rise of autonomous agent platforms

Antigravity 2.0 grows from a coding environment into a full platform, including a standalone desktop app, for building and orchestrating AI agents. Paired with Gemini 3.5, these agents can run complex workflows, automate development tasks, and operate across environments through an API and SDK with minimal setup.

Developer ecosystem improvements

Google AI Studio now supports full-stack app development, native Kotlin support, and one-click deployment to Cloud Run. Tools such as the Android Studio migration agent convert apps to native Kotlin in hours rather than weeks, while WebMCP aims to standardize how browser-based agents interact with tools exposed by websites.

Gemini Spark and personal agents

Gemini Spark introduces a persistent personal agent that runs 24/7 on dedicated virtual machines in Google Cloud to handle tasks such as scheduling, research, and communication. It integrates with Google services and, through MCP, with more than 30 third-party tools, illustrating the shift toward always-on digital assistants.

AI built into consumer products

New features include Docs Live for creating documents by voice, Ask YouTube for finding and jumping to the relevant part of videos, Daily Brief for personalized summaries, and Ask Maps for more natural conversations in Google Maps. Search is evolving into a dynamic, task-oriented interface with interactive output.

New creative and hardware initiatives

Tools such as Google Pix enable object-level image editing, while AI-powered glasses, developed with Warby Parker and Gentle Monster, offer real-time assistance, translation, and photo and video capture in a wearable device.

CONCLUSION

Google's announcements show a shift toward fast, cost-efficient models and autonomous agents built in everywhere, marking the move from passive tools to systems that actively carry out tasks.

Full transcript

All right, so Google just dropped a massive bomb at I/O 2026. And honestly, there's so much to unpack here that I barely know where to start. But let me try to walk you through everything, because this is genuinely one of the biggest AI announcements we've seen in a while. First off, let's talk numbers, because they're kind of insane. Two years ago, Google was processing about 9.7 trillion tokens per month across all their services. Last year at I/O, that jumped to 480 trillion. Now they're at over 3.2 quadrillion tokens per month. That's roughly a seven-times year-over-year increase, which is absolutely wild when you think about what that represents in terms of actual usage and real-world problems being solved. And speaking of usage, the Gemini app itself has just exploded. Last year, it had 400 million monthly active users. Now it's sitting at over 900 million, more than doubling in just 12 months. Daily requests have grown over seven times in that same period. AI Overviews in Search now has 2.5 billion monthly active users, and the new AI Mode in Search has already crossed 1 billion monthly active users in just a year. These aren't small hobby projects anymore. This is mainstream adoption happening right in front of us.
But let's get into the actual announcements. Starting with the models themselves, Google unveiled Gemini 3.5, and the first one out of the gate is Gemini 3.5 Flash. Now, what's really interesting here is that Flash is no longer just the budget option. This thing is punching way above its weight class. On the Terminal Bench 2.1 coding benchmark, it's scoring 76.2%, compared to Gemini 3.1 Pro's 70.3%. It's getting 1,656 Elo on GDPval-AA versus 3.1 Pro's 1,314, and 83.6% on MCP Atlas compared to 78.2%. On the CharXiv reasoning benchmark, it's hitting 84.2%. What's even more impressive is that it's competing with, and sometimes beating, the flagship models from OpenAI and Anthropic. We're talking about GPT-5.5 and Claude Opus 4.7 here. And here's the kicker.
It's doing this at four times the output speed of those other frontier models. Artificial Analysis puts it at close to 280 tokens per second, versus around 60 or 70 for GPT-5.5 and Opus 4.7. So you're getting frontier-level intelligence, but at speeds that used to be reserved for much smaller models. And the pricing, oh man, the pricing is where this gets really interesting. Sundar Pichai mentioned during the keynote that Flash delivers frontier-level capabilities at less than half the price of comparable frontier models, sometimes nearly a third of the price. He gave an example: if top companies processing about a trillion tokens a day shifted 80% of their workloads from other frontier models to 3.5 Flash, they'd save over a billion dollars annually. That's not pocket change. That's real money they can reinvest back into their business. Gemini 3.5 Pro is also in the works and should be launching next month. Google's already using it internally, and they're saying it's showing great improvements, so that's something to watch out for.
But then we get to Gemini Omni, and this is where things get really next-level. Google's calling this a world model, and Demis Hassabis from DeepMind described it as a pivotal step toward artificial general intelligence. The first model in this family is Gemini Omni Flash, and it's fundamentally different from typical text-to-video models. Unlike most video generation tools that just stitch things together, Omni is truly multimodal in both input and output. You can feed it text, audio, images, and video all at once, and it'll generate realistic, scientifically accurate content that actually makes sense. The example they showed with protein folding was pretty compelling: a smooth stop-motion sequence of amino acid chains twisting into alpha helices and beta sheets, with properly synced voiceover narration.
The interesting part is that Google I/O wasn't really just about Google. It was about a much bigger shift happening across AI.
Every major lab is racing toward the same thing: AI that doesn't just respond, but actually helps you get work done. Google has Gemini and Spark. OpenAI has its agent tools. Anthropic has Claude, Artifacts, Connectors, and some of the most practical workflows you can use right now. So if you're watching all of this and thinking, "Okay, but how do I actually learn to use AI like that in my own work?", this is exactly where today's sponsor comes in. The world's first Claudathon is happening this weekend from 10:00 a.m. to 7:00 p.m. Eastern. It's a deep dive into Claude, its real use cases, and more than 10 other AI tools, and they have only 1,000 free seats available for a limited time. Millions of people across the world have already attended this workshop, and it has a 4.9 out of 5 rating on Trustpilot. Inside the workshop, you'll learn how to do deep research with Claude, build your own artifacts and dashboards, create full presentations, set up Claude connectors like Indeed to automate your job search, master more than 10 AI tools, build custom GPTs and agents, and generate visuals and videos with AI. And if you sign up now, you also get 50 secret Claude codes, an AI prompt library, and a personalized AI toolkit builder for free. You'll be mentored by leaders from Microsoft, Google, Amazon, and Nvidia. So check the link in the description, scan the QR code on screen, and join the WhatsApp community before the free seats close.
All right, now back to Google I/O. What really sets Omni apart is that it's trained on all four data types simultaneously, so it actually understands the relationships between them. A marble rolling down a track follows gravity correctly. A harp string plucked by a leaf produces the right sound at the right time. The physics actually hold up, which is something a lot of generative video models struggle with. And the editing capabilities are pretty wild, too. You can make iterative changes through natural language conversation.
Every instruction builds on the last one. Your characters stay consistent, the physics remain coherent, and the scene remembers what came before. You can take a video you shot and just ask Omni to change what's happening, add new characters or objects, or transform specific elements, all without losing the thread of your original scene. They showed examples of someone turning a sculpture into bubbles, making a mirror ripple like liquid when touched, and even creating a rapid-fire alphabet video where each letter is represented by an unusual object, all with proper lower thirds and smooth music. The level of control and coherence is honestly impressive. Gemini Omni Flash is rolling out today to Google AI Plus, Pro, and Ultra subscribers in the Gemini app and Google Flow. It's also coming to YouTube Shorts and the YouTube Create app at no cost later this week, and developers will get API access in the coming weeks. Google's planning to expand it to support image and audio outputs down the line as well.
One thing they're being careful about is deepfakes and misuse. All videos created with Omni include Google's SynthID watermark, which is imperceptible but verifiable through the Gemini app, Gemini in Chrome, and Google Search. They're also being conservative with voice cloning, initially only letting you create videos with your own voice, using their avatars feature for editing existing videos to change audio and speech. They say they're still testing to figure out how to bring that capability responsibly. And speaking of SynthID, Google announced some major partnerships there. SynthID has now watermarked over 100 billion images and videos, along with 60,000 years of audio assets. They're expanding Content Credentials verification to Search and Chrome, and they got OpenAI, Kakao, and ElevenLabs to adopt SynthID as well. Nvidia signed on last year. So this is becoming a real cross-industry standard for AI transparency.
Then we get to the infrastructure side, and this is where Google's really flexing. They announced their eighth-generation TPUs, and for the first time they're taking a dual-chip approach with specialized architectures: there's TPU8T, optimized for training, and TPU8, optimized for inference. TPU8T has nearly three times the raw computing power of the previous generation. But what's really crazy is how they're doing training now. With JAX and Pathways, their training is no longer constrained to a single massive data center. They can seamlessly distribute training across multiple sites, scaling across more than a million TPUs globally. This gives them the ability to create the largest training cluster in the world, which means training larger, more capable models in weeks rather than months. TPU8 is all about speed. They've dramatically improved latency at every step because, as they learned from 27 years of working on Search, latency matters. Both chips are also more energy-efficient, delivering up to two times better performance per watt. All of this infrastructure investment is pretty staggering. In 2022, Google was spending $31 billion annually in capex. This year, they expect that number to be around $180 to $190 billion. That's roughly six times what they were spending just a few years ago.
Now, let's talk about Antigravity, because this is where a lot of the agentic magic is happening. Antigravity 2.0 is expanding beyond just being a coding environment. It's turning into a full platform to develop and manage autonomous AI agents. There's a new standalone desktop application that acts as a central home for agent interaction, where you can orchestrate agents for all kinds of tasks. And they've developed an even more optimized version of Flash for Antigravity that's not just four times faster, but 12 times faster than other frontier models. The amount of tokens Google is processing internally through their AI developer tools is pretty telling, too.
In March, they were processing half a trillion tokens a day. Now they're doing more than three trillion tokens a day, and they've been doubling every few weeks. That internal usage is creating a powerful feedback loop that's helping them improve the models. Google AI Studio is also getting some major upgrades. It now includes native Kotlin support for coding Android apps, Google Workspace integrations, one-click deploy to Cloud Run, and support for Firebase services. You can build and launch full-stack apps directly within AI Studio, and if you want to keep building, you can seamlessly export your complete project to Antigravity. They're also introducing managed agents in the Gemini API, which removes the friction of infrastructure setup: a single API call gives you a fully provisioned agent with a remote sandbox. And if you want even more control, the new Antigravity SDK lets you customize the agent and deploy it on your own infrastructure.
For Android developers specifically, there's a lot of new stuff. The stable Android CLI lets AI agents tap directly into Android Studio to handle tasks like downloading the Android SDK and running apps on Android devices. They open-sourced Android skills to help language models execute best practices for complex workflows like migrating to Jetpack Compose. There's also Android Bench, an LLM leaderboard for Android development tasks that now includes open-weight models like Gemma 4. They even previewed a migration agent in Android Studio that can migrate your app code to a native Kotlin Android app, regardless of whether your source is React Native, a web framework, or even iOS. The agent analyzes your code and does the heavy lifting, turning migrations that would have taken weeks into just hours.
On the web development side, Google's proposing WebMCP, an open web standard that allows developers to expose structured tools like JavaScript functions and HTML forms so browser-based AI agents can execute complex tasks with greater speed, reliability, and precision. The experimental WebMCP origin trial starts in Chrome 149, with support for Gemini in Chrome coming soon. They're also launching modern web guidance, which helps you build more performant, accessible, and secure web experiences by providing your coding agents with expert-vetted skills. It supports over 100 use cases and integrates directly with Baseline. You can install it with a single click in Antigravity or via the CLI. Chrome DevTools for agents is another big one. It brings Chrome DevTools capabilities to AI agents, helping you scale your workflow by verifying, debugging, and optimizing code in real time. Your agent can automate quality audits, emulate real-world user experiences, and hand over sessions with autoconnect, all without manual oversight. There's also a new HTML-in-Canvas API that's available in origin trial. It lets developers build immersive 3D experiences that remain fully searchable, accessible, and interactable by integrating real DOM elements directly into a canvas with WebGL and WebGPU.
But the consumer-facing stuff is probably what most people will care about. Gemini Spark is their new personal AI agent that runs 24/7 on dedicated virtual machines in Google Cloud. It's powered by Gemini 3.5 and the Antigravity harness, which allows it to perform long-horizon tasks in the background. It'll integrate with Google's own tools first, and then with over 30 third-party tools through MCP, including Adobe, Dropbox, and Uber. You can work with Spark through the Gemini app, email, or chat. On Android, there's a new UI space called Android Halo coming later this year, where you can view live updates and task progress.
Later this summer, Spark will operate directly within Chrome, acting as your agentic browser across the web. Gemini Spark is rolling out to trusted testers this week, and the beta is coming to Google AI Ultra subscribers in the US next week. It can do things like pull together relevant emails and docs to craft an update for your boss, manage your calendar, handle follow-ups, all that kind of stuff. They're also introducing information agents in Search, which are personalized AI agents you can set up to work in the background 24/7 to find what you need at the right moment and help you take action. These are rolling out this summer, starting with Google AI Pro and Ultra subscribers. Search is also getting agentic coding capabilities powered by Gemini 3.5 Flash and Antigravity. Search will build custom experiences for your individual questions, with dynamic layouts and interactive visuals. These generative UI capabilities will be available for everyone in Search this summer for free. For longer-running tasks, Search can build persistent custom dashboards or trackers that you can return to and make progress on, kind of like mini apps for your specific tasks. There's a new feature called Ask YouTube that entirely reimagines the experience. You can ask complex questions and it'll show you the videos that best match your interest, but more importantly, it jumps right to the part of the video most relevant to you. This is starting testing now and will roll out broadly in the US this summer. Docs Live is another cool one. Instead of typing out a precise prompt, you can just verbally brain-dump whatever's on your mind and let Gemini do the rest. You'll be able to create new docs and edit them directly, all with your voice. Docs Live is rolling out for subscribers this summer, and powerful voice capabilities will come to Gmail and Keep then, too. Ask Maps lets you have more natural conversations with Maps for complex questions.
Daily Brief gives you a personalized digest that synthesizes information from your inbox, calendar, and tasks to surface the most important things you need to be aware of, prioritizing them and suggesting next steps. Google Flow is getting a new agent that can plan and reason through complex tasks with your inputs. You can also vibe-code any creative tool right in Flow, like tools for designing video effects, hand-drawn animations, or layering text. Google Pix is their new AI image creation and editing tool built on the latest Nano Banana model. It treats every element as an individual object rather than part of a flat, static image, so you can create, swap, or perfect specific details to bring your exact vision to life. Pix is available to trusted testers now and will roll out later this summer to Google AI Pro and Ultra subscribers in Workspace. And then there's intelligent eyewear, which is pretty futuristic. Audio glasses are launching this fall in partnership with Gentle Monster and Warby Parker. You can ask Gemini about anything you see, get natural turn-by-turn directions, manage calls and send texts hands-free, snap photos and videos, get real-time translations, and tap into your apps just by using your voice. Display glasses that show information right in your field of view are coming later. Google also announced Gemini for Science, which brings together AI tools to help accelerate scientific research. It includes new experiments in Labs and science skills to connect agentic platforms like Antigravity to over 30 major life science databases and tools.
So yeah, that's Google I/O 2026. Google is clearly pushing hard into the agentic era, where AI can create, plan, and actually take action across your digital life. Now we just have to see how well all of this works outside the keynote demos. Let me know what you think in the comments. Subscribe for more AI and tech updates, hit the like button if you enjoyed the video, and thanks for watching. I'll catch you in the next one.
