8news

Memory and dreaming for self-learning agents

Anthropic • Claude • May 21, 2026, 16:49 • 21:28

INTRO

Anthropic has introduced new memory and “dreaming” systems designed to help AI agents learn from one task to the next, improving their performance over time and at scale.

KEY POINTS

Agents are growing in capability and task length

AI agents can now handle increasingly complex tasks over extended periods, and research from METR indicates that the length of tasks they can complete doubles roughly every seven months. This rapid progress exposes a major limitation: maintaining useful context over long or repeated tasks remains difficult. Without persistent learning, agents effectively start from scratch every time.
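
As a back-of-the-envelope illustration, a constant seven-month doubling period means the task horizon grows by a factor of 2^(t/7) after t months. The snippet below only does that arithmetic; it assumes the doubling rate stays constant, which is an extrapolation rather than a guarantee, and the function name is ours.

```python
def horizon_multiplier(months: float, doubling_months: float = 7.0) -> float:
    """Growth factor of the task horizon after `months`, assuming a fixed doubling period."""
    return 2 ** (months / doubling_months)


# Two doubling periods (14 months) mean a 4x longer task horizon.
print(horizon_multiplier(14))            # 4.0
# Two years at the same pace would be roughly a 10.8x increase.
print(round(horizon_multiplier(24), 1))  # 10.8
```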

Memory enables continuous learning

The new memory system lets agents retain and reuse knowledge from previous tasks, improving results over time. Instead of performing each task in isolation, agents learn from their mistakes, reuse effective strategies, and share lessons across environments, creating a compounding effect.

A file-based design aligned with model strengths

Memory is structured as a file system, leveraging models’ ability to navigate, read, and edit files with familiar tools such as bash and grep. This approach reduces friction and lets information be organized naturally, while leaving agents free to decide what to store and how.
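
To make the file-system idea concrete, here is a minimal sketch (Python 3.9+) of a memory store where every memory is a plain text file under one root directory, so it can be inspected with ordinary tools like `ls`, `cat`, or `grep`. The `FileMemory` class and its methods are hypothetical illustrations, not Anthropic’s memory API.

```python
import tempfile
from pathlib import Path


class FileMemory:
    """Hypothetical file-backed memory: each memory is a plain text file under one root."""

    def __init__(self, root: Path) -> None:
        self.root = root.resolve()
        self.root.mkdir(parents=True, exist_ok=True)

    def _path(self, relative: str) -> Path:
        # Resolve the target and refuse anything that would escape the memory root.
        path = (self.root / relative).resolve()
        if not path.is_relative_to(self.root):
            raise ValueError(f"path escapes memory root: {relative}")
        return path

    def write(self, relative: str, text: str) -> None:
        """Create or overwrite a memory file."""
        path = self._path(relative)
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(text, encoding="utf-8")

    def append(self, relative: str, line: str) -> None:
        """Add one line to a memory file, creating it if needed."""
        path = self._path(relative)
        path.parent.mkdir(parents=True, exist_ok=True)
        with path.open("a", encoding="utf-8") as handle:
            handle.write(line.rstrip("\n") + "\n")

    def read(self, relative: str) -> str:
        """Return a memory file's contents, or an empty string if it does not exist."""
        path = self._path(relative)
        return path.read_text(encoding="utf-8") if path.exists() else ""

    def files(self) -> list[str]:
        """List all memory files, relative to the root."""
        return sorted(p.relative_to(self.root).as_posix() for p in self.root.rglob("*") if p.is_file())


with tempfile.TemporaryDirectory() as tmp:
    memory = FileMemory(Path(tmp))
    # Task 1: the agent records a lesson it learned the hard way.
    memory.append("lessons/deploys.md", "- Run migrations before restarting the API service.")
    # Task 2: a fresh session starts by reading what earlier sessions left behind.
    print(memory.files())  # ['lessons/deploys.md']
    print(memory.read("lessons/deploys.md"), end="")
```

Keeping memories as ordinary files means the agent needs no special retrieval tool: the same navigation skills it uses on a code base apply to its own notes.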

Multi-agent collaboration through shared memory

The system supports memory shared across multiple agents, enabling collaboration within and across environments. Differentiated access levels (read-only organization-wide memory, read-write task-specific stores) create a hierarchy that scales.
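
The scope hierarchy can be sketched as stores mounted into each agent with either read-only or read-write access. This is a hypothetical model under our own names (`MemoryStore`, `Mount`, `AgentMemoryView`) and illustrative data; it only shows how a read-only organization store and a shared read-write team store behave, not how Anthropic implements scopes.

```python
from dataclasses import dataclass, field


@dataclass
class MemoryStore:
    """Hypothetical memory store: maps file paths to their contents."""
    name: str
    files: dict[str, str] = field(default_factory=dict)


@dataclass
class Mount:
    """A store attached to an agent session with a given access scope."""
    store: MemoryStore
    writable: bool


class AgentMemoryView:
    """Everything one agent can see: several stores, some mounted read-only."""

    def __init__(self, mounts: list[Mount]) -> None:
        self.mounts = {mount.store.name: mount for mount in mounts}

    def read(self, store: str, path: str) -> str:
        return self.mounts[store].store.files.get(path, "")

    def write(self, store: str, path: str, text: str) -> None:
        mount = self.mounts[store]
        if not mount.writable:
            raise PermissionError(f"{store} is mounted read-only")
        mount.store.files[path] = text


# Illustrative data: org-wide knowledge is shared read-only, team notes read-write.
org = MemoryStore("org", {"runbooks/db.md": "Failover needs on-call lead approval."})
team = MemoryStore("team-sre")
agent_a = AgentMemoryView([Mount(org, writable=False), Mount(team, writable=True)])
agent_b = AgentMemoryView([Mount(org, writable=False), Mount(team, writable=True)])

agent_a.write("team-sre", "incidents/open.md", "Fix for checkout latency in flight.")
print(agent_b.read("team-sre", "incidents/open.md"))  # Fix for checkout latency in flight.

try:
    agent_a.write("org", "runbooks/db.md", "overwritten")
except PermissionError as err:
    print(err)  # org is mounted read-only
```

Because both agents mount the same `team` object, a note written by one is immediately visible to the other, while the organization store stays protected from accidental edits.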

Enterprise controls and observability

Built-in safeguards include version control, audit logs, and attribution, making it possible to track how memory evolves and which agent made each change. A standalone API simplifies integration, with export and redaction features suited to enterprise requirements.
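
Version history, attribution, and the optimistic concurrency control mentioned in the talk fit together naturally: every write records its author and names the version it was based on, and a stale write is rejected instead of silently overwriting another agent’s change. The sketch below is a hypothetical single-file model using Python’s standard `difflib` for diffs; the class names and the `expected_version` precondition are illustrative, not Anthropic’s API.

```python
import difflib
from dataclasses import dataclass


@dataclass(frozen=True)
class Version:
    number: int
    text: str
    author: str  # attribution: which agent or session wrote this version


class ConflictError(Exception):
    """Raised when a write's precondition (the expected version) no longer holds."""


class VersionedFile:
    """Hypothetical memory file with full history (audit trail) and optimistic concurrency."""

    def __init__(self) -> None:
        self.history: list[Version] = [Version(0, "", "system")]

    @property
    def current(self) -> Version:
        return self.history[-1]

    def write(self, text: str, author: str, expected_version: int) -> Version:
        # Optimistic concurrency: no lock is held while the agent works; the write
        # only commits if nobody else has committed since the writer last read.
        if expected_version != self.current.number:
            raise ConflictError(f"expected v{expected_version}, found v{self.current.number}")
        version = Version(self.current.number + 1, text, author)
        self.history.append(version)
        return version

    def diff(self, old: int, new: int) -> str:
        """Unified diff between two versions, labeled with their authors."""
        a, b = self.history[old], self.history[new]
        return "".join(
            difflib.unified_diff(
                a.text.splitlines(keepends=True),
                b.text.splitlines(keepends=True),
                fromfile=f"v{a.number} ({a.author})",
                tofile=f"v{b.number} ({b.author})",
            )
        )


notes = VersionedFile()
seen = notes.current.number  # both sessions read v0 before working
notes.write("CPU alerts: check retry storm.\n", "session-a", expected_version=seen)
try:
    notes.write("CPU alerts: page the DB team.\n", "session-b", expected_version=seen)
except ConflictError as err:
    print("session-b must re-read and merge:", err)  # ... expected v0, found v1

print([(v.number, v.author) for v in notes.history])  # [(0, 'system'), (1, 'session-a')]
print(notes.diff(0, 1), end="")
```

The rejected writer is not blocked while it works; it simply has to re-read the latest version and retry, which keeps the common no-conflict case cheap.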

Measured gains in production

Early adopters report notable improvements. Rakuten saw a 97% reduction in first-pass errors among agents deployed in production, while Wise Docs reduced recurring issues in its document verification pipeline using cross-session memory.

Limits of locally optimized memory

As systems scale, problems emerge: duplicated knowledge, fragmented information, and redundant learning across agents, which tend to make the same mistakes and learn from them independently. Updates are often locally optimal but lack global coordination.

Dreaming introduces global optimization

Dreaming acts as a feedback loop that analyzes activity across agents and sessions. It identifies patterns, recurring mistakes, and inefficiencies, then reorganizes and improves memory without adding latency to active tasks. The output is a verified, better-organized memory snapshot that agents can choose to adopt.
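
In the product, a dreaming run is itself an agent that reads session transcripts; the deterministic pass below is only a simplified stand-in that shows the shape of the idea. Under our own assumptions (notes are short strings, and a lesson counts as recurring if it appears in at least two sessions), it collapses duplicate notes written independently by different agents, promotes recurring lessons to shared guidance, and returns a new snapshot instead of editing memory in place. All names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SessionNote:
    session_id: str
    agent_id: str
    text: str


def normalize(text: str) -> str:
    # Crude canonical form so trivially different wordings collapse together.
    return " ".join(text.lower().strip(" .").split())


def dream(notes: list[SessionNote], min_sessions: int = 2) -> dict[str, list[str]]:
    """Hypothetical consolidation pass that returns a NEW memory snapshot."""
    sessions_by_key: dict[str, set[str]] = {}
    wording: dict[str, str] = {}
    for note in notes:
        key = normalize(note.text)
        sessions_by_key.setdefault(key, set()).add(note.session_id)
        wording.setdefault(key, note.text.strip())  # keep the first phrasing seen

    shared, local = [], []
    for key, sessions in sessions_by_key.items():
        # Duplicates are always collapsed; lessons seen in several sessions are
        # promoted to shared guidance, one-off notes stay session-level detail.
        target = shared if len(sessions) >= min_sessions else local
        target.append(wording[key])
    return {"shared_guidance": sorted(shared), "session_notes": sorted(local)}


notes = [
    SessionNote("s1", "agent-a", "Retry the deploy only after the health check passes."),
    SessionNote("s2", "agent-b", "retry the deploy only after the health check passes"),
    SessionNote("s2", "agent-b", "Staging cluster was upgraded on Monday."),
    SessionNote("s3", "agent-c", "Retry the deploy only after the health check passes."),
]
snapshot = dream(notes)
print(snapshot["shared_guidance"])  # ['Retry the deploy only after the health check passes.']
print(snapshot["session_notes"])    # ['Staging cluster was upgraded on Monday.']
```

Returning a fresh snapshot rather than mutating the live store is what lets agents (or developers) review the result before adopting it.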

Demonstrated gains from dreaming

Early results are striking: Harvey reports a sixfold increase in completion rates on its legal benchmark. By synthesizing lessons across multiple runs, dreaming enables system-wide progress.

A decoupled architecture that improves performance

Dreaming runs outside the main agent loop, so agents can focus on their tasks while optimization happens in parallel. It can be triggered by events or on a schedule, and it processes multiple sessions at once.
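
The talk describes kicking off dreaming ad hoc, on a schedule (nightly or hourly), or on events such as the end of a session, and the demo selects a batch of sessions from the last seven days. Both decisions can be sketched as small policy functions. Everything here is hypothetical; in the real product these are configured through the API or the Console, whose exact parameters the talk does not cover.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Session:
    session_id: str
    ended_at: datetime


def should_dream(now: datetime, last_run: datetime, interval: timedelta,
                 session_ended: bool = False, manual: bool = False) -> bool:
    """Hypothetical trigger policy: ad hoc request, end-of-session event, or schedule."""
    return manual or session_ended or now - last_run >= interval


def select_batch(sessions: list[Session], now: datetime,
                 window: timedelta = timedelta(days=7)) -> list[str]:
    """Pick the sessions a dream run should analyze, e.g. everything from the last week."""
    return [s.session_id for s in sessions if now - s.ended_at <= window]


now = datetime(2026, 5, 21, 3, 0)
sessions = [Session(f"s{i}", now - timedelta(days=2 * i)) for i in range(6)]

# Nightly schedule: the last run was 25 hours ago, so a run is due.
print(should_dream(now, now - timedelta(hours=25), timedelta(days=1)))  # True
# Sessions ended 0, 2, 4, 6, 8 and 10 days ago; only the first four fall in the window.
print(select_batch(sessions, now))  # ['s0', 's1', 's2', 's3']
```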

Concrete application: incident management

In a demo of an SRE incident-response platform, agents handling system alerts used shared memory to coordinate their responses. When one agent recorded that a fix was already in flight, later sessions adapted their actions accordingly. Dreaming then detected recurring patterns (e.g. alerts firing 60 seconds after CPU spikes) and improved memory for future decisions.
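
A recurring pattern like an alert firing about 60 seconds after a CPU spike only shows up when many sessions are compared. As a hypothetical, rule-based stand-in for what the dreaming agent infers from transcripts, the sketch below measures the lag between each alert and the preceding CPU spike, buckets it, and reports a lag that recurs across sessions. The event format, bucket size, and threshold are our assumptions.

```python
from collections import Counter
from typing import Optional

Event = tuple[float, str]  # (timestamp in seconds, event kind)


def alert_lags(events: list[Event]) -> list[float]:
    """Seconds between each alert and the most recent CPU spike before it."""
    lags: list[float] = []
    last_spike: Optional[float] = None
    for timestamp, kind in sorted(events):
        if kind == "cpu_spike":
            last_spike = timestamp
        elif kind == "alert" and last_spike is not None:
            lags.append(timestamp - last_spike)
    return lags


def recurring_lag(sessions: dict[str, list[Event]], bucket: float = 10.0,
                  min_sessions: int = 2) -> Optional[tuple[float, int]]:
    """Return (bucketed lag, number of sessions) if one lag recurs, else None."""
    counts: Counter = Counter()
    for events in sessions.values():
        # Count each bucket once per session so one noisy session cannot dominate.
        counts.update({round(lag / bucket) * bucket for lag in alert_lags(events)})
    if not counts:
        return None
    lag, n_sessions = counts.most_common(1)[0]
    return (lag, n_sessions) if n_sessions >= min_sessions else None


sessions = {
    "s1": [(0.0, "cpu_spike"), (60.0, "alert")],
    "s2": [(100.0, "cpu_spike"), (161.0, "alert")],
    "s3": [(50.0, "cpu_spike"), (108.0, "alert"), (300.0, "alert")],
}
print(recurring_lag(sessions))  # (60.0, 3)
```

In the demo, the dreaming agent went beyond a rule like this: it hypothesized a retry-behavior issue and rewrote the triage notes, a judgment a fixed threshold cannot supply.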

Toward organization-scale knowledge systems

Together, memory and dreaming create a continuously improving knowledge layer spanning agents and tasks. Memory captures experience; dreaming refines and organizes it, steadily raising the overall performance baseline.

CONCLUSION

Anthropic’s memory and dreaming systems mark a shift toward AI agents that learn cumulatively and collaboratively, paving the way for intelligence that scales across entire organizations.

Full transcript

Hello. Thank you for joining us today. I'm excited to kick things off on the breakout stage. My name is Ravi, and I lead the API knowledge team within platform at Anthropic. Since joining Anthropic last year, my focus has been creating the building blocks for agents to interact with many forms of knowledge, ranging from the context window itself to skills, files, and even content on the web. We recently released two features that I'm most excited about: memory and dreaming. We now have the building blocks for agents to learn over time and improve from one task to the next. I'll talk about why we think memory is important and how we designed it, and we'll close out with dreaming, our new frontier memory feature. There we go.

But first, a quick timeline of the milestones that got us here. The important thing is that models have been improving, and agents are capable of completing tasks that take many, many hours and are increasingly complex. In 2024 we released the Model Context Protocol (MCP), which gave models access to external tools and data in a principled way. In 2025 we released Claude Code and the Agent SDK, which lowered the barrier to using and building agents. As an aside, it blows my mind that that was in 2025; it honestly feels like a lifetime ago. Later that year we launched skills, which gave models a generic abstraction for unlocking, and effectively bolting on, new capabilities to complete specific tasks. Last month we released Claude Managed Agents, a platform for reliably running agents that takes care of the hard parts.

The important through line here is that agents can do more, and they can operate over longer and longer time horizons. In 2025, METR released a study saying the length of tasks that agents can complete is doubling every seven months, and we're seeing this happen. But managing context over long-horizon tasks is still a work in progress, and that's where memory comes in. Memory lets agents learn.
It lets agents carry forward learnings from their previous tasks. In the simplest sense, imagine a set of tasks: task one, task two, task three, and so on. The goal is for performance to improve from one task to the next. In the base case, without something like memory, performance on each task might be similar, because every agent is starting from the same blank slate. In the optimal case, performance improves from task one to two, from task two to three, and so on. That's the goal: learning from task to task, but also from environment to environment and from agent to agent. With memory, agents can learn from common strategies and previous mistakes. They can learn from the tools they have access to, or from code bases and files. And finally, they can transfer these learnings to and from other agents. Think swarms of agents contributing to and maintaining a shared understanding of the organization they work in. This is the dream.

We recently launched memory for Claude Managed Agents, and this is a major step towards this vision. It gives developers a frontier memory system that is built to maximize intelligence out of the box, and it supports multi-agent systems, all with enterprise control and observability. We built memory in partnership with several teams that are using Managed Agents, and the results speak for themselves. Rakuten saw a 97% decrease in first-pass errors in agents deployed in production. Wise Docs reduced common issues using cross-session memory in their document verification pipeline. The through line, and the common feedback we get, is that our memory primitive allows teams to focus on building the product, not the infra, all while reaping the benefits of the increased intelligence that comes with better memory. Now, you might be thinking: is memory really new? Rightfully so.
Memory is a concept that's not entirely new, but our approach to it with agents has greatly evolved. Previously, we built memory focusing on capabilities in the harness. You might be familiar with CLAUDE.md for Claude Code, or dedicated memory tools in the SDKs. But one pattern we're seeing is that as models improve, we really just want to get out of Claude's way, similar to what we did with skills. Skills was a very basic format that was highly flexible; it created endless possibilities, and the model understood how to operate with it. With memory, we've leaned in that same direction with files.

So let's talk about some of the capabilities that we designed memory around. With the current set of models, we know a few things. Models like Claude are great at navigating virtual environments and a file system. Claude is also very capable at using familiar tools like bash and grep to read, update, and organize files. Opus 4.7, which we launched last month, is a state-of-the-art model at file-system-based memory, and it's increasingly capable of discerning which context is most important to save for its future self, how it should be structured, and how it should be represented. So with memory, we've modeled it as a file system for Claude. Again, the key principle is getting out of Claude's way and letting it use the capabilities it already has, which are very strong. Or, as we like to say, let it cook. This is the dream.

So far we've talked about Claude's memory capabilities within the context of a single agent, but we want it to work across multiple agents that are operating in the same environment at the same time, or maybe across environments. This introduces new requirements, for example letting multiple sessions share the same memory store at the same time. And maybe they want different scopes, so we offer read-only scopes and read-write scopes.
For example, you could have organization-wide memory that's read-only, updated fairly infrequently, and accessible to all agents, and the same set of agents can have access to more granular memory stores that they can read and write freely. This creates a hierarchy and allows the memory system to really scale. Now, to combat write conflicts and make sure that one agent isn't clobbering another's writes, we employed an optimistic concurrency control model to avoid agents overwriting each other's changes.

Last but not least, memory needs to work for real production agents, which means enterprise-grade controls. Version control creates an audit trail as agents make changes, and developers can see how memory evolves over time. They can even diff between versions, and there's attribution to see which agent wrote which part of the memory. I think one of the most important pieces is that memory has a standalone API, which enables developers to manage their memory from anywhere. The reality is that teams are building their systems in many different environments, so they can use memory via these APIs, which provide standard CRUD operations but also more enterprise-focused operations like exports and redactions.

Okay. So we've covered three key components of a memory architecture. One, we started with the storage layer, which is how the data itself is managed and how changes are tracked. Next, the structure of memory, optimized in a format that allows Claude to get the most out of it. And finally, Claude-driven processing for updating the memory. Now, let's stop at that processing point. Agents writing memory as they work is key to the processing layer; think of it as taking notes while you're doing something. But as we scaled this pattern up to more complex multi-agent use cases, we started seeing some limits across different sessions, and we started seeing some common patterns.
For example, agents were prone to making many of the same mistakes, and they learned from their mistakes independently. Agents also displayed some of the same patterns of inefficiency. The general theme was that memory was being updated in a locally optimal way, but not a globally optimal one. In some cases, there was duplication or fragmentation. So we started thinking really deeply about this problem, and in the last couple of months we built a feedback loop in the processing layer that combats some of these problems. Now, I've said "this is the dream" a couple of times, but this time I mean it. Dreaming really is available in research preview right now, and it can be used with Managed Agents. It's a process that looks for patterns and mistakes across agents and sessions, and it automatically curates their memory. Customers like Harvey saw a sixfold increase in completion rates on their legal benchmark with dreaming, we're actively seeing other usage of dreaming, and we're really excited to see how people are benefiting from it.

A quick overview of how it processes sessions: it's completely decoupled. Think of it like a feedback loop. Agents write memories, dreaming refines them, and the process repeats. Dreaming can be kicked off ad hoc, nightly, or hourly, or it can even be triggered by events like the end of a session. It's all controlled via the API, so it's very flexible. Each dreaming run analyzes session transcripts, inspects the existing state of memory, and proposes optimizations to the memory in scenarios where sessions were inefficient, made mistakes, or needed improved guidance. The output is a verified, better-organized snapshot of memories that agents can choose to adopt. Dreaming truly enables continuous self-learning. It closes the loop on memory.
I mentioned out-of-band: the out-of-band component of dreaming is really, really critical. Creating a process that's decoupled from the underlying agent loop has several benefits. For one, the architecture makes it useful for multi-agent systems: by looking at cross-session, cross-agent transcripts, it discerns patterns that a single agent in isolation might struggle to identify. There are also benefits to having a dedicated dreaming harness. It allows for clearer objectives. Since dreaming is an independent process, there's no risk of agents needing to trade off between improving their memory quality and actually completing their task objective. It's a clean separation. And lastly, it doesn't add any latency to the agent; it's completely removed from the hot path.

So, zooming out, we now have a robust memory layer that can be shared across agents and environments instead of only within specific tasks or usage. We also have dreaming, a process that globally optimizes and reconciles memory across agents. The result is an organizational memory system capable of scaling up both the size and the quality of memory. The way I think about it, sharing memory that's constantly improving across agents raises the floor for every agent, and dreaming raises it even further. And if you really explode the size of this capability and pull it all together, memory becomes a huge source of knowledge. […] models, or test-time compute, where letting models spend some tokens to explore a problem produces better outcomes on average. With dreaming, agents are doing the same thing: they're spending some work up front to curate and produce higher-quality memory, and that pays dividends for all downstream agent performance. We believe that dreaming and memory form the basis of a frontier memory system. Memory, on the left, helps agents learn and remember from task to task, and dreaming, on the right, verifies, organizes, and enriches the memory.
The way I think about it, dreaming is the bridge between memory as we know it today and organization-scale memory and knowledge.

Now I'm going to flip over to a demo, which uses both dreaming and memory in practice. It's an agent platform for SREs, and everyone loves being on call, right? Here we have a system that looks at incoming alerts and pages, and for some of them it actually spins up agents that decide how to triage and fix the issues as they come up. It has access to a couple of memory stores. One is a read-only, org-wide knowledge memory store, which contains things like the SLO policy, runbooks, and on-call mappings: information that doesn't change very often but is important for every agent. It also has access to read-write memory stores that are specific to the task at hand.

We can dig into an interesting example here, where an agent investigated and found the root cause of an alert, put up a fix, and noted it in memory. You can see the writes: it noted in memory that a fix was in flight and incoming. The shared memory store can then be read by subsequent sessions, so here we can see that when a similar issue arises, the downstream session already knows that a fix is in flight, and it's able to act on that information. I really think this is such a cool pattern, because I was once an SRE in my career, and this really helps coordinate across all agents. It's cross-session memory at work.

For running in enterprises, an important piece here is audit logs and history. With memory, you can see the full version history, you can switch between different versions, and you can attribute the writes to specific sessions. There's also a precondition here, and that's the optimistic concurrency model, to make sure that agents aren't clobbering each other's writes. Now we'll flip over to the Claude Console. One moment. There we go.
Here we see the list of underlying memory stores that we were using in that application. We'll go over to our team SRE memory store, and you can see exactly the underlying files that were populated there. Now we're going to head over to the Dreams tab and kick off a dream. This can be done via the API, but also in the UI. We're going to select the team SRE memory store and a batch of sessions from the last seven days, so that's about five, and we're going to start dreaming.

As it begins, you can see it making progress. You can look at the dream and see that there are five input sessions, and you'll see there's actually an output memory store that's being compiled. You can also open the dreaming session. This is an important piece: dreaming itself is built on Claude Managed Agents. So it's a feature for Claude Managed Agents built on Claude Managed Agents itself. You can see that it spins off a series of subagents to analyze transcripts in parallel, and it has all the same UX as the rest of Managed Agents.

We'll fast-forward to a completed dream session, where you can see the diffs on the memory store updates. In this example, we see that across sessions and across agents, there's a common pattern of an alert triggering 60 seconds after a CPU spike. This is a recurring pattern, so it starts to discern that there might be some issue with the retry behavior, and it makes a note. The dreaming process makes a note and updates memory so that the next agent that sees this pattern can update the triage log in a more holistic way, rather than it just being a rote log of all the events that happened. And that's memory and dreaming at work.

We'll flip back over to the slides and close out. With that demo, we saw how we can build a production agent that uses memory and dreaming to self-improve.
And this year, I think, is going to be a really big one. We're going to see agents run for longer and longer time scales, days for example, and continuously building upon and improving their understanding and view of the world around them is critical to unlocking that capability. I think memory systems are going to be a big part of what makes this behavior possible. So give it a try. I'm excited to see what everyone builds with it, and I'll be outside if you have more questions. Thank you.
