
Google's New Gemini Skills Turn Chrome Into an AI Beast

AI • AI Revolution • April 15, 2026 • 13:10

Summary

INTRO

Google has unveiled a series of major artificial intelligence innovations: integrating AI into Chrome, improving DeepMind's robotics models, building tools to evaluate human skills, and expanding its Gemini Enterprise ecosystem for automated workflows.

Key points

  • Skills in Chrome with Gemini

    Launched on April 14, 2026, this feature turns repetitive prompts into reusable, multi-tab workflows inside the browser. Available on Mac, Windows, and Chrome OS (with the browser language set to English (US)), it provides a built-in prompt management system through a simple user interface. Users can trigger a task with one click, even across several web pages at once, for example to compare multiple products. Google is also adding a pre-built library of skills such as ingredient analysis, gift selection, and document summarization, exposing the concept of prompt management to everyone instead of limiting it to developers.

  • Security and user control

    Google has added manual confirmations for sensitive actions (e.g. sending emails, creating calendar events) so that automated workflows cannot perform irreversible actions without consent. Everything runs on the robust security model already built into Chrome, with automated red teaming and continuous updates.

  • Toward intelligent browser agents

    Skills lays the groundwork for persistent agents in the browser, with multi-tab execution and reusable workflows. Google is clearly preparing a more capable software agent, integrated into a future Gemini desktop environment, pointing to a full automation system beyond simple AI chat.

  • Gemini Enterprise: agent tab

    Google is testing an "agent tab" interface for Gemini Enterprise that looks like a multi-step workspace. It includes a dedicated chat, plus a side panel listing the goal, the agents involved, connected apps, files, and a crucial "require human review" option that keeps a human in the loop before final execution. The setup is comparable to Claude Co-work, but with a stronger focus on the safe execution of complex workflows.

  • Notebook LM with Canvas and connectors

    Google is adding a visual layer to Notebook LM called Canvas, which turns data into timelines, interactive pages, and lightweight apps. New connectors will also import data automatically from other services, starting with the Google ecosystem, making Notebook LM a central platform for research and data management. Automatic labeling by Gemini further improves source organization, making it easier to navigate large document collections.

  • DeepMind Gemini Robotics ER 1.6

    A major update to the embodied reasoning model for real-world robotics, this version improves spatial reasoning (pointing, counting, object relationships), which is crucial for avoiding misinterpretation errors. It also strengthens success detection by combining multiple camera views, suited to dynamic and occluded environments. The most striking new capability is reading analog and digital instruments (pressure meters, gauges) via "agentic vision", which zooms in and analyzes details to decode needles and units with record reliability.

  • Collaboration with Boston Dynamics

    The instrument-reading capability of Gemini Robotics ER 1.6 was developed in collaboration with Boston Dynamics and its Spot robot, which can explore a site, capture images, and feed them to the model for interpretation. This work raises the instrument-reading success rate to 86%, and up to 93% with agentic vision enabled, compared with only 23% for version 1.5.

  • Google Research Vantage: evaluating human skills

    This system is designed to measure complex skills such as creativity, collaboration, and critical thinking, which standardized tests have traditionally struggled to assess. An "executive LLM" orchestrates several AI agents in controlled conversations, steering the discussion to test specific skills such as conflict resolution or project management.

  • Tests and results

    Tested with 188 participants across 373 conversations, Vantage achieved very high evidence rates (up to 92.4% for project management, 85% for conflict resolution). Its scoring matches the reliability of human raters, with a correlation of up to 0.88 on creativity, which is exceptional for subjective judgments.

  • Simulation and interpretability of results

    Vantage uses Gemini to simulate users at different skill levels, recovering those levels more accurately than independent agents and producing clear skills maps that link performance to specific parts of the conversation. This opens the way to automatic, explainable assessments before costly human studies.

  • Higsfield: an AI workflow built on division of labor

    The recommended workflow pairs Claude for planning and writing with Cedance 2 for video production, separating creative thinking from technical execution. Cedance 2 delivers cinematic-quality audiovisual output with improved consistency, illustrating a per-model specialization approach to content creation.

  • Overall outlook

    Taken together, these developments (Skills in Chrome, Gemini Enterprise agents, Notebook LM, robotics advances, and Vantage) suggest that Google is moving toward an integrated, autonomous, multi-purpose AI system combining productivity, safe automation, and qualitative assessment of human talent. The strategy spans the cloud, the desktop, physical robotics, and social analysis, positioning Google as a leader for the next generation of AI tools.


This synthesis shows that Google is investing heavily in a coherent ecosystem aimed at turning AI from a simple tool into a true engine of intelligent workflows, across software, robotics, and the qualitative assessment of human skills.

Full transcript

Google just dropped one of its most interesting AI stretches in a long time. Chrome now has skills that turn prompts into reusable workflows. Gemini Enterprise is testing a new agent tab. Notebook LM is getting Canvas and connectors. DeepMind upgraded Gemini Robotics ER 1.6 for real world robotics. And Google Research unveiled Vantage to score things like teamwork, creativity, and critical thinking with LLMs. Quite a bit just happened, so let's talk about it. All right, so Google just rolled out something called Skills in Chrome built directly into Gemini. The idea is straightforward. Instead of typing the same prompt over and over again every time you open a new page, you can now save that prompt as a reusable workflow and trigger it with one action. This rollout started April 14th, 2026, and it's available on Mac, Windows, and Chrome OS, as long as your Chrome language is set to English (US). So, it's not global yet, but the direction is already clear. Now, if you've used Gemini in Chrome before, you've probably run into this exact problem. You open a page, you ask it to do something like analyze ingredients, compare specs, or summarize content, and then you go to another page and you have to type the same thing again over and over. That friction is exactly what skills removes. Instead of retyping, you save the prompt as a skill. Then later, you just type a slash or hit the plus button, select your skill, and it runs instantly on the current page. And here's where it gets more interesting. It doesn't just run on one page. It can run across multiple tabs at the same time. So now your browser basically becomes a retrieval system. You open five product pages, trigger a skill, and it compares everything in one go. That's something developers have been building manually with LLM pipelines for a while, and now Google just pushed it directly into the browser UI. From a systems perspective, this is basically prompt templating at the browser level. Instead of engineers managing prompt libraries in code, regular users now get a UI version of that idea. You can also edit skills, create new ones anytime. And Google is launching a built-in library of pre-made skills. Things like analyzing product ingredients, picking gifts based on constraints like budget and preferences, or scanning long documents for key info. So now you've got a curated prompt library inside Chrome itself. That's a big shift because tools like LangChain or prompt management systems used to sit behind the scenes. Now that entire concept is being exposed to everyday users. And of course there's the question of safety. Google added confirmation gates for high impact actions. If a skill tries to send an email or create a calendar event, it will ask for approval first. That's a direct response to one of the biggest challenges in agent systems: preventing automated workflows from triggering irreversible actions without user intent. Under the hood, this still runs on Chrome's existing security model with automated red teaming and auto updates. So, it fits into their broader browser infrastructure. Now, if you zoom out a bit, skills aren't just about convenience. This is basically the first real step toward browser level agents. Persistent workflows, multi-tab reasoning, reusable prompts. That's exactly what agent systems need. And honestly, that bigger idea of AI turning into real workflows instead of isolated prompts is exactly why tools like Higsfield are starting to matter more. They are sponsoring today's video.
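[Editor's note] To make the "prompt templating at the browser level" idea concrete, here is a minimal Python sketch of how a reusable skill with a confirmation gate for high-impact actions could be modeled. It is purely illustrative: the names (Skill, run_skill, gemini_generate) are hypothetical and do not reflect Chrome's or Gemini's actual APIs.

    from dataclasses import dataclass

    # Hypothetical stand-in for a Gemini call; not a real API.
    def gemini_generate(prompt: str) -> str:
        raise NotImplementedError("placeholder for an LLM call")

    @dataclass
    class Skill:
        """A reusable prompt template, the core idea behind Chrome's Skills."""
        name: str
        template: str              # e.g. "Compare the products on this page: {page_text}"
        high_impact: bool = False  # e.g. sends email / creates calendar events

    def confirm(action: str) -> bool:
        """Confirmation gate: high-impact actions need explicit user approval."""
        return input(f"Allow '{action}'? [y/N] ").strip().lower() == "y"

    def run_skill(skill: Skill, open_tabs: dict[str, str]) -> dict[str, str]:
        """Apply one saved skill across every open tab (multi-tab execution)."""
        if skill.high_impact and not confirm(skill.name):
            return {}
        return {
            url: gemini_generate(skill.template.format(page_text=text))
            for url, text in open_tabs.items()
        }

    # Usage: one saved prompt, triggered once, applied to several product pages.
    compare = Skill("compare-specs", "List the key specs on this page:\n{page_text}")
    # results = run_skill(compare, {"https://shop/a": "...", "https://shop/b": "..."})

The point of the sketch is only the shape of the idea: a saved template, a one-shot trigger, fan-out over tabs, and a gate before anything irreversible.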
And one setup that makes a lot of sense right now is pairing Claude with Cedance 2. Claude is great at the front end. You can use it to shape the concept, structure the script, refine the prompts, and figure out the creative direction before you generate anything. That already cuts down a lot of wasted time. Then, Cedance 2 takes that direction and turns it into the actual video. You bring the prompt, the structure, and the visual plan into Higsfield. And Cedance 2 handles the production side with stronger motion, better consistency, cinematic quality, and audio generated together with the visuals. That is why this workflow stands out. Instead of forcing one model to do everything, you use Claude for the thinking and Cedance 2 for the execution. Put together, it feels a lot cleaner and a lot more usable if you are actually trying to make content consistently. So go try Cedance 2 yourself. The link is in the description. And that leads directly into what Google is doing on the enterprise side because at the same time they're testing a new agent tab inside Gemini Enterprise. And this is where things shift from AI assistant to something that looks more like a full execution system. Inside this agent tab, you get two main entry points, new task and inbox. When you start a task, it opens a chat interface, but now there's an entire panel on the side with things like goal, agents, connected apps, files, and a toggle called require human review. So, this is no longer just a chat box. This is starting to look like a workspace for running multi-step workflows. The structure is very similar to systems like Claude Co-work. You define a goal, give the model access to tools and files, and let it execute a task across multiple steps. That require human review toggle is especially important. It suggests Google is preparing for agents that can take real actions potentially at a desktop level, not just inside a browser. And that's where things get interesting because this isn't just a feature. It looks like Google is building toward a full desktop agent environment. There's already speculation that this could tie into a future Gemini desktop app. Google is known to be working on an AI Studio desktop app and now you're seeing skills, projects, and agents all evolving at the same time. It feels like these are all pieces of a larger system. At the same time, Google is also pushing Notebook LM in a very different direction. They're testing something called Canvas inside Notebook LM. And this basically adds a visual layer on top of your data. Instead of just reading summaries, you could turn your sources into timelines, interactive pages, even lightweight apps or visualizers. So instead of just analyzing documents, you're now building structured experiences from them. There's also a new connectors feature being tested, which suggests Notebook LM will start pulling data from external services, most likely Google's own ecosystem first, but eventually more. That's a big shift because Notebook LM has mostly been limited to manually uploaded sources so far. With connectors, it starts becoming a central research layer across tools. They're also improving source organization with labeling features and even autolabeling using Gemini itself. That solves a real problem for users dealing with large data sets where navigation becomes harder than the analysis itself. Now, while all of this is happening on the software side, Google DeepMind is pushing something equally important on the robotics side.
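[Editor's note] The "require human review" toggle described above amounts to pausing an agent's planned steps for approval before they touch connected apps. Here is a hedged Python sketch of that pattern; AgentTask, plan_actions, and human_approves are invented names for illustration and are not part of Gemini Enterprise.

    from dataclasses import dataclass, field

    @dataclass
    class AgentTask:
        goal: str
        connected_apps: list[str] = field(default_factory=list)
        files: list[str] = field(default_factory=list)
        require_human_review: bool = True   # the toggle shown in the agent tab

    def plan_actions(task: AgentTask) -> list[str]:
        # Placeholder: a real system would ask the model to break the goal into steps.
        return [f"draft summary of {f}" for f in task.files] + ["send results by email"]

    def human_approves(action: str) -> bool:
        return input(f"Approve step '{action}'? [y/N] ").lower().startswith("y")

    def run(task: AgentTask) -> None:
        for action in plan_actions(task):
            if task.require_human_review and not human_approves(action):
                print(f"Skipped: {action}")
                continue
            print(f"Executing: {action}")   # real execution would call connected apps

    run(AgentTask(goal="Summarize Q1 docs", files=["q1_report.pdf"]))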
They just released Gemini Robotics ER 1.6, and this is a major upgrade to their embodied reasoning model. To understand this, you need to know how their system is structured. They use two models working together. Gemini Robotics 1.5 is the VLA model, vision, language, action. That one takes inputs and directly controls the robot's movements. Gemini Robotics ER is different. It doesn't control the robot. It acts as the reasoning layer. It understands the environment, plans tasks, and decides what should happen next. So, if the VLA model is the executive, Robotics ER is the strategist. And version 1.6 brings some major upgrades. First, spatial reasoning has improved significantly. That includes things like pointing, counting, and understanding object relationships. Pointing might sound basic, but it's actually foundational. It allows the model to identify exact pixel locations, map relationships between objects, define movement paths, and even enforce constraints like identifying objects small enough to fit into a container. In benchmarks, this made a huge difference. The model correctly identified objects like hammers, scissors, and tools while avoiding hallucinating objects that weren't there. That matters a lot in robotics. If a system hallucinates an object, the robot could literally try to grab something that doesn't exist. Then there's success detection, which is one of the hardest problems in robotics. It's not just about doing a task. It's about knowing when the task is actually finished. Modern robots often rely on multiple camera views, overhead cameras, wrist-mounted cameras, and they need to combine all of that into a single understanding of the environment. Gemini Robotics ER 1.6 improves this multi-view reasoning, allowing it to better handle occlusions and dynamic environments. So now the robot can decide whether to retry a task or move forward without human input. But the biggest new feature here is instrument reading. This is completely new. The model can now read analog gauges, pressure meters, sight glasses, and digital displays in real world environments. This was developed in collaboration with Boston Dynamics using their Spot robot. Spot can move around a facility, capture images of instruments, and then Gemini Robotics ER 1.6 interprets them. And this is not trivial. Reading a gauge requires understanding needle positions, tick marks, units, perspective distortion, and sometimes multiple needles representing different values. The model uses something called agentic vision to do this. It zooms into images, analyzes details, runs code to estimate proportions, and applies world knowledge to interpret the result. The performance jump is massive. Gemini Robotics ER 1.5 had a 23% success rate. Gemini 3.0 Flash reached 67%. Gemini Robotics ER 1.6 reaches 86%. And with agentic vision enabled, it hits 93%. That's not just an improvement, that's a completely different level of reliability. Now, at the same time, Google Research is working on something that looks totally different, yet it's still part of the same bigger picture. They introduced a system called Vantage, which is designed to measure human skills like collaboration, creativity, and critical thinking. And this is something that traditional tests have always struggled with. Standardized tests can measure knowledge. They can't measure how someone handles a disagreement, generates ideas under pressure, or evaluates arguments. Vantage tries to solve that using LLMs. The core idea is something called an executive LLM.
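[Editor's note] Once the needle angle has been estimated from the image, the last step of reading a gauge reduces to simple interpolation along the scale. The sketch below shows only that underlying arithmetic, under the assumption of a linear scale; it is not DeepMind's actual agentic-vision pipeline, which also has to handle zooming, perspective distortion, and multiple needles.

    def gauge_value(needle_deg: float,
                    min_deg: float, max_deg: float,
                    min_val: float, max_val: float) -> float:
        """Linearly interpolate a needle angle into a reading on the gauge scale.

        Assumes the scale is linear between its end marks; real gauges may need
        perspective correction and non-linear scales, handled upstream.
        """
        frac = (needle_deg - min_deg) / (max_deg - min_deg)
        return min_val + frac * (max_val - min_val)

    # Example: a 0-10 bar pressure gauge whose scale sweeps from 45 to 315 degrees.
    # A needle at 180 degrees sits exactly halfway, so the reading is 5.0 bar.
    print(gauge_value(180.0, 45.0, 315.0, 0.0, 10.0))  # -> 5.0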
Instead of running multiple independent AI agents, they use one model to control all AI participants in a conversation. That model has access to a scoring rubric and it actively steers the conversation to test specific skills. So if the system wants to evaluate conflict resolution, it might introduce disagreement through one of the AI personas and maintain that conflict until the human responds. This is very different from previous approaches. In experiments, they tested this with 188 participants, generating 373 conversations. Each participant worked through tasks like designing experiments or debating topics with AI teammates. They measured two main skills: conflict resolution and project management. The results showed that the executive LLM produced much higher evidence rates compared to independent agents. For project management, conversation level information rates reached 92.4%. For conflict resolution, it reached 85%. And when it comes to scoring accuracy, the AI matched human raters at a level comparable to human-to-human agreement with Cohen's Kappa values between 0.45 and 0.64. They also tested creativity scoring on real student work. In a data set of 180 submissions, the AI's scores had a Pearson correlation of 0.88 with human experts. That's extremely high for subjective tasks. Another interesting part is simulation. They used Gemini to simulate participants at different skill levels, then measured how accurately the system could recover those levels. The executive LLM showed significantly lower error compared to independent agents and the simulated patterns matched real human data. That means you can use LLMs to test and refine these systems before running expensive human studies. And finally, Vantage presents results as a skills map, showing competency levels and linking them to specific parts of the conversation. So, it's not just scoring, it's interpretable. Also, if you want more content around science, space, and advanced tech, we've launched a separate channel for that. Links in the description. Go check it out. Anyway, drop your thoughts below. Curious what stands out to you the most here. Thanks for watching and I'll catch you in the next one.
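[Editor's note] The two agreement figures quoted above come from standard statistics: Cohen's kappa for chance-corrected agreement between raters, and Pearson correlation for how closely AI scores track human scores. The Python sketch below shows how such agreement is typically computed; the ratings are made-up placeholder data, not Vantage's.

    from sklearn.metrics import cohen_kappa_score   # categorical rater agreement
    from scipy.stats import pearsonr                # linear correlation for scores

    # Placeholder ratings on a 1-5 rubric: one human rater and one AI rater.
    human_scores = [3, 4, 2, 5, 4, 3, 1, 4, 5, 2]
    ai_scores    = [3, 4, 3, 5, 4, 2, 1, 4, 5, 2]

    # Cohen's kappa: chance-corrected agreement between two raters
    # (the transcript quotes values between 0.45 and 0.64 for Vantage).
    kappa = cohen_kappa_score(human_scores, ai_scores)

    # Pearson r: how linearly the AI's scores track the human expert's
    # (0.88 is reported for the 180-submission creativity data set).
    r, p_value = pearsonr(human_scores, ai_scores)

    print(f"Cohen's kappa: {kappa:.2f}")
    print(f"Pearson r: {r:.2f} (p = {p_value:.3f})")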
