8news



Google’s New SIMULA Builds AI Without Limits

AI • AI Revolution • 22 April 2026 • 11:00

Summary

INTRO

The central challenge in artificial intelligence today is no longer model size or compute power, but the quality and availability of data. Faced with the scarcity of the specialized data needed for AI's next phase, Google and OpenAI are rolling out innovative solutions: controlled generation of synthetic data and advanced tools for understanding complex AI systems.

Key points

  • The data crisis in specialized AI

    The AI industry has long relied on massive data scraped from the internet: text, images, code, conversations. This approach produced capable general-purpose models such as GPT, Gemini, and Claude. The next frontier, however, is specialized intelligence (cybersecurity, law, medicine), where the data is either unavailable at the required scale, locked behind privacy constraints, or too expensive to collect.

  • Simula, Google's structured synthetic data generator

    Google has unveiled Simula, a novel system for generating datasets that are artificial yet structured and controlled. Unlike conventional methods, which produce random examples of often uneven quality, Simula first builds a detailed taxonomic map of the domain (for example: attack types, threat actors, and vulnerabilities in cybersecurity). This guarantees complete, diverse coverage and avoids the excessive repetition of similar examples known as mode collapse.

  • Advanced technique: metaprompts and complexity control

    Simula generates "metaprompts", instructions for producing the data, covering many variations and guaranteeing diversity. It also includes a parameter that adjusts the complexity of the examples, striking a balance between diversity and difficulty. This tunable complexity improves training: in some cases, model performance rose by 10% on math benchmarks thanks to more sophisticated data.

  • A dual critic for stronger quality control

    Final data validation relies on a dual-critic system: each example is evaluated simultaneously for validity and invalidity. This method corrects the tendency of AI models to confirm plausible but wrong answers, making the synthetic datasets more reliable.

  • Toward a paradigm shift centered on data design

    Google is turning dataset creation into a measurable, scalable engineering problem. Competitiveness may soon depend less on the quantity of data collected than on data quality and the ability to design optimal data, paving the way for AI trained mostly on artificially generated data.

  • OpenAI's Euphan: making sense of complex AI agents

    OpenAI has launched Euphan, a tool that lets developers analyze the behavior of AI agents capable of executing multiple, nested tasks. Euphan turns raw, often unreadable logs (JSON, tool calls, reasoning traces) into structured, interactive timelines, making AI systems easier to understand, debug, and control.

  • The growing importance of visibility in AI workflows

    As AI evolves toward autonomous multi-step agents, knowing exactly what they are doing becomes crucial. Euphan offers filters, direct data editing, and access to metadata, making huge volumes of interactions far more manageable than with traditional methods.

  • Hermes: the next step for ChatGPT, according to OpenAI

    In parallel, OpenAI is working on Hermes, a feature expected to let ChatGPT host persistent agents with their own roles, skills, and workflows. These agents would run continuously, responding to triggers or executing tasks without constant human intervention, turning ChatGPT from a reactive assistant into a proactive multi-agent platform.

  • Toward an "always-on", modular AI

    This shift to permanently active autonomous agents means users could delegate several tasks at once to different specialized agents, moving AI toward a more collaborative, distributed system that depends less on one-off human interactions.

  • Consequences for data collection and ownership

    The rise of high-performing synthetic data raises fundamental questions: will massive collection of real-world data remain necessary? Will reliance on sensitive or protected data lose its appeal? Controlled synthesis could reduce dependence on the real world, transforming how AI is trained and developed.

  • A turning point for the AI ecosystem

    These innovations mark a turning point where technical mastery of data design, and the ability to track AI agents' activity precisely, become the new levers of competitiveness. The future of AI may rest as much on this fine-grained engineering tooling as on the models themselves.

In short, faced with a shortage of specialized data, Google and OpenAI are ushering in a paradigm shift: designing data to order and deeply understanding the complex behavior of AI agents. This new model of AI, more controllable, modular, and autonomous, heralds an era in which data quality and data management matter more than quantity.

Full transcript

One of the biggest problems in AI right now is also one of the least talked about. It's not model size and it's not compute. It's the quality and availability of data. And the reason this matters is because AI might be running out of it. For years, the entire industry has been powered by scraping the internet. Text, images, code, conversations, everything gets fed into these models and that's how they learn. That worked extremely well for general-purpose AI. That's how we got things like GPT, Gemini, Claude. Though now we're hitting a wall, because the next phase of AI is not general knowledge. It's specialized intelligence: things like cybersecurity, legal reasoning, medical decision-making. And the data for that either doesn't exist at scale, or it's locked behind privacy regulations, or it's just extremely expensive to collect. So the question becomes: what happens when the internet is no longer enough? Google might have just answered that. They introduced something called Simula. And at first glance, it sounds simple. It's a system for generating synthetic data sets. Basically, artificial training data. Though the way it works is very different from what people usually imagine. Most synthetic data today is kind of messy. You take a model, give it a prompt, and ask it to generate examples. Maybe you tweak the prompt, run it again, filter the results, and hope you get something useful. The problem is that approach doesn't really scale, and more importantly, it's not controllable. You might get good examples, though you might also get repetitive ones, shallow ones, or even incorrect ones. And when you're training serious models, especially in fields like law or security, that's a big issue. So, what Simula does is flip the entire process. Instead of randomly generating data and hoping it works, it designs the data set first. And this is where it gets interesting. The system starts by breaking down a domain into what you can think of as a structured map, a taxonomy.
So if you're working on something like cybersecurity, it doesn't just say generate questions. It actually identifies all the key dimensions, attack types, threat actors, vulnerabilities, mitigation strategies, and then expands each of those into detailed subcategories. So now instead of guessing what data should look like, you have a full map of the entire space. And that matters because one of the biggest problems in AI data sets is something called mode collapse, where models keep generating similar types of examples over and over again, even if the topic is broad. Simula avoids that by sampling directly from this structured map. So when it generates data, it's not random. It's deliberately pulling from different parts of the domain, including the rare and unusual cases that usually get ignored. That's step one. Then it moves into something called metaprompts. Instead of directly generating the final data, it first creates instructions for generating that data. So imagine combining different elements from the taxonomy, like a specific threat type and a specific scenario, and turning that into a unique prompt. And it doesn't just create one version. It generates multiple variations of those prompts at the same time and then selects a subset to make sure there's real variation, not just slight rewording. So now you have coverage across the domain and variation within each part of it. Then comes the next layer, which is complexity. This part is actually pretty important. Simula can control how difficult or advanced each data point is. There's a parameter that basically says take a certain percentage of the data set and make it more complex, more nuanced, more confusing, more realistic. And what's interesting is that this is separated from everything else. So you can increase complexity without losing diversity. You're not trading one for the other. Though, here's where it gets even more practical. Because generating complex data is only useful if it's correct.
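To make the pipeline concrete, here is a toy sketch of the idea: build a taxonomy, sample combinations into metaprompts, flag a fraction of them for extra complexity, then gate examples through the dual-critic quality check described in the next step. Simula itself is not public, so every name, field, and the critic interface below is an assumption, not Google's actual API.

```python
import itertools
import random

# Toy taxonomy for a cybersecurity dataset (illustrative, not Simula's schema).
TAXONOMY = {
    "attack_type": ["phishing", "ransomware", "sql_injection", "zero_day"],
    "threat_actor": ["insider", "criminal_group", "state_sponsored"],
    "mitigation": ["patching", "network_segmentation", "user_training"],
}

def sample_metaprompts(n, complex_fraction=0.3, seed=0):
    """Sample distinct combinations from the taxonomy map and turn each one
    into a generation instruction (a 'metaprompt'). A fixed fraction is
    flagged for higher complexity, independently of coverage."""
    rng = random.Random(seed)
    combos = list(itertools.product(*TAXONOMY.values()))
    rng.shuffle(combos)  # deliberate spread across the whole map, not random drift
    prompts = []
    for i, combo in enumerate(combos[:n]):
        fields = dict(zip(TAXONOMY, combo))
        level = "advanced, multi-step" if i < n * complex_fraction else "standard"
        prompts.append(
            f"Write a {level} Q&A about a {fields['attack_type']} attack by a "
            f"{fields['threat_actor']} actor, mitigated via {fields['mitigation']}."
        )
    return prompts

def dual_critic(example, critic):
    """Quality gate: ask the critic both 'is this correct?' and 'is this
    incorrect?' and accept only when the two verdicts agree, which blunts
    the tendency to rubber-stamp plausible answers."""
    says_correct = critic(f"Is this example correct? {example}") == "yes"
    says_incorrect = critic(f"Is this example incorrect? {example}") == "yes"
    if says_correct and not says_incorrect:
        return "accept"
    if says_incorrect and not says_correct:
        return "reject"
    return "flag_for_review"  # contradictory verdicts: trust neither
```

In a real pipeline `critic` would be a call to a judge model; here it is just a callable so the gating logic stays visible.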
So the final step is quality control. And instead of just asking the model, is this correct? Simula uses something called a dual critic system. It asks two separate questions. One, is this correct? And two, is this incorrect? That sounds simple, though it actually solves a real problem with AI systems. They tend to agree with plausible answers, even when they're wrong. This setup forces the system to evaluate both sides which reduces that bias. So by the end of this pipeline you get data that is structured, diverse, adjustable in complexity and verified for quality. And the results are actually pretty solid. In some cases, models trained on this synthetic data performed better than models trained on traditional data sets. There was even a point where increasing the complexity of the data gave a noticeable boost in performance around 10% improvement in one of the math benchmarks. Though it's not always that simple. In domains where the teacher model wasn't strong enough, adding more complexity actually made things worse. Because if the system generating the data isn't reliable, then harder examples just amplify the errors. So there's a balance here. Though the bigger takeaway is not just performance, it's control. This is one of the first systems that treats data set creation like an engineering problem. Something you can design, measure, and scale instead of just collecting whatever data you can find. And that changes things because for a long time the advantage in AI was having access to more data, bigger data sets, more scraping, more resources. Though if systems like this keep improving, the advantage shifts from who has the most data to who can design the best data. And that's a very different game. It also raises some bigger questions. If AI can generate its own training data and that data can outperform real world data sets in certain cases, then what role does the real world play going forward? Do we still need massive data collection pipelines? 
Do companies still need to rely on copyrighted or sensitive data? Or do we start moving into a phase where most training data is synthetic by default? We're not fully there yet. Though, this is one of those steps that feels like it's pointing in that direction. And if that happens, then the entire foundation of how AI is built starts to shift, because the bottleneck is no longer data availability. It's data design. Though, while Google is trying to solve how AI learns, OpenAI is focusing on understanding what AI is actually doing. Because as these systems get more advanced, especially agent-style AI that can take actions, write code, call tools, and run multi-step tasks, things start to get messy very quickly. When something breaks, you don't get a simple error message. You get hundreds, sometimes thousands of lines of raw logs, JSON files, nested outputs, tool calls, reasoning steps, all mixed together. And trying to understand what actually happened inside that process becomes a serious challenge. So, OpenAI just released Euphan, and it's not a new model. It's a tool. Though, it solves a very real problem that a lot of developers are running into right now. Euphan is basically a browser-based system that takes those messy AI logs and turns them into something you can actually read. Instead of staring at raw JSON, you get a clean, structured timeline of what the AI did step by step, who said what, which tool was called, what the model was trying to do at each moment, and how the entire process unfolded from start to finish. And this matters more than it might seem at first, because AI is moving toward agents. Not just chatbots that respond to prompts, but systems that can plan, execute, and adjust their own behavior over time.
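As a rough idea of what turning raw JSON logs into a readable timeline involves, here is a generic sketch. Euphan's internals and log schema are not documented here, so the field names (`role`, `tool`, `content`) are assumptions chosen only to illustrate the transformation.

```python
import json

def to_timeline(raw_log_lines):
    """Collapse raw JSON agent logs (one event per line) into an ordered,
    human-readable timeline: one '[step] role: summary' entry per event.
    Field names are illustrative, not Euphan's actual schema."""
    timeline = []
    for i, line in enumerate(raw_log_lines):
        event = json.loads(line)
        role = event.get("role", "system")
        if "tool" in event:
            summary = f"called tool {event['tool']}"
        else:
            # Truncate long message bodies so the timeline stays scannable.
            summary = (event.get("content") or "")[:60]
        timeline.append(f"[{i}] {role}: {summary}")
    return timeline
```

Running it over two fake log lines shows the shape of the output: a tool call and a message each become one timeline entry.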
And once you get into that world, visibility becomes critical. You need to understand why the model made a decision, where it failed, what step caused the issue. Otherwise, debugging turns into guesswork. Euphan changes that. It lets developers load entire conversations or session logs directly into a browser and instantly explore them in a structured way. You can filter by roles, focus on specific steps, inspect metadata, even edit the data directly if needed. And it works with large data sets, too. So instead of manually digging through files, you can actually navigate the behavior of an AI system almost like you're replaying it, which is a big shift, because one of the biggest limitations in AI development right now is not just building the systems, it's understanding them. And tools like this are starting to close that gap. It also says something about where OpenAI is heading. This isn't a flashy consumer feature. It's infrastructure. It's tooling for developers who are building more complex systems on top of AI. And that aligns with a bigger trend. AI is becoming less about single prompts and more about workflows, multi-step processes, connected tools, long-running tasks. And if that's the future, then tools that make those systems visible, debuggable, and controllable become essential. What makes that even more interesting is that OpenAI does not seem to be stopping at tools for understanding agent behavior. It also looks like they're getting ready for a version of ChatGPT built around agents that keep running in the background. According to what's surfacing inside the product, OpenAI is testing a new feature code-named Hermes, and it points to something much bigger than a normal update. The idea seems to be turning ChatGPT into a place where users can create persistent agents with their own roles, skills, tasks, and workflows, then let them keep operating beyond a single chat session.
So instead of opening ChatGPT, asking for something, and getting one reply back, you could have agents working continuously on your behalf. That changes the model quite a bit. ChatGPT today is still mostly reactive. You ask, it responds. Though Hermes points to something closer to an always-on system where different agents can handle different jobs, respond to triggers, run on schedules, and stay connected to tools even when you're not actively in the app. And that's where this starts to feel bigger. If Euphan is about helping developers understand complex AI workflows, Hermes looks like the kind of product direction that makes those workflows much more common. Because now the goal is no longer just one assistant answering questions. It's multiple agents acting more like teammates, each with a function, each working in parallel. There's still no confirmed release date. Though if this launches anywhere close to what these signs suggest, ChatGPT could start shifting from a chatbot people visit when they need help into a platform where AI keeps working in the background, whether you're watching or not. Also, if you want more content around science, space, and advanced tech, we've launched a separate channel for that. Links in the description. Go check it out. A lot of this is still early, though the trajectory is starting to get pretty clear. Drop your thoughts below. Thanks for watching, and I'll catch you in the next one.
