
Anthropic has just launched Claude Opus 4.7, an AI model markedly improved for coding and agent management, but the real surprise is that the company already has Mythos, an even more powerful and riskier model that it is holding back for safety reasons.
Launch of Claude Opus 4.7
Anthropic has officially rolled out Claude Opus 4.7 across all of its main platforms. It keeps the same pricing as the previous version (Opus 4.6): $5 per million input tokens and $25 per million output tokens. The model is positioned as the new reference point, particularly for professional software-engineering use.
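To put that pricing in concrete terms, here is a minimal sketch that estimates the cost of a single request at the announced per-token rates; the helper function and the example token counts are hypothetical illustrations, not part of Anthropic's SDK.

```python
# Rough cost estimate from the announced Opus 4.7 pricing
# ($5 / 1M input tokens, $25 / 1M output tokens).
# The token counts in the example are made-up values.

INPUT_PRICE_PER_TOKEN = 5.00 / 1_000_000
OUTPUT_PRICE_PER_TOKEN = 25.00 / 1_000_000

def estimate_request_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost of one API call."""
    return (input_tokens * INPUT_PRICE_PER_TOKEN
            + output_tokens * OUTPUT_PRICE_PER_TOKEN)

if __name__ == "__main__":
    # Example: a 30k-token prompt producing a 4k-token answer.
    print(f"${estimate_request_cost(30_000, 4_000):.4f}")  # ≈ $0.25
```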
Notable improvements in coding
Opus 4.7 significantly outperforms Opus 4.6, with its SWE Pro benchmark score jumping from 53.4% to 64.3%, a substantial gain that reflects greater rigor and consistency on complex, long-running tasks. On other evaluation suites, such as SWE-bench Verified and Cursor Bench, the scores rise from 80.8% to 87.6% and from 58% to 70%, respectively.
Progress in autonomy and long-session handling
The model appears to handle planning, verification of its own code, and error recovery better, which is crucial for developer trust. This robustness supports longer, more complex development workflows without constant supervision.
Comparison with the competition
In coding and tool use, Claude Opus 4.7 leads GPT 5.4 Pro (57.7% on SWE Pro) and Gemini 3.1 Pro (54.2%). For pure reasoning on GPQA Diamond, however, the scores are nearly tied: 94.2% for Opus 4.7 versus 94.4% and 94.3% for GPT and Gemini.
Improved vision and image processing
Opus 4.7 now supports images up to 2576 pixels on the long edge (around 3.75 megapixels), more than three times the resolution of previous versions. This makes it possible to handle detailed images such as technical diagrams, dashboards, or documents with fine print. Document-reasoning errors drop by 21% compared with Opus 4.6.
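To make the resolution figure concrete, here is a small sketch, assuming the announced 2576-pixel long-edge limit, that checks whether an image fits and what megapixel count it implies; the helper functions are purely illustrative.

```python
# Illustrative check against the announced 2576-pixel long-edge limit.
LONG_EDGE_LIMIT = 2576  # pixels, as announced for Opus 4.7

def fits_long_edge(width: int, height: int, limit: int = LONG_EDGE_LIMIT) -> bool:
    """True if the image's longest side is within the limit."""
    return max(width, height) <= limit

def megapixels(width: int, height: int) -> float:
    """Total pixel count expressed in megapixels."""
    return width * height / 1_000_000

if __name__ == "__main__":
    # A 2576 x 1456 image is ~3.75 MP and sits right at the limit.
    print(fits_long_edge(2576, 1456), round(megapixels(2576, 1456), 2))
```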
Toward design automation
Anthropic is reportedly preparing tools that generate websites, prototypes, or presentations from plain text instructions. This shift toward automated digital creation worries traditional design players such as Figma, Adobe, and Wix, whose market valuations reportedly dipped in response.
Better instruction following
Opus 4.7 follows instructions more literally, which is an advantage when prompts are well written but requires retuning older prompts, some of which are now less compatible.
Stronger working memory
The model also stands out for better file-system-based memory management, helping maintain continuity on projects across multiple sessions and reducing the need to restate important information.
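As a generic illustration of what file-system-based memory means in practice (this is not Anthropic's implementation, just a sketch of the pattern), the snippet below appends short project notes to a file in one session and reloads them in the next; the file name and note format are assumptions.

```python
# Generic illustration of file-based session memory: notes written in one
# session are reloaded in the next. The file path and note format are
# assumptions for the example, not Anthropic's mechanism.
from pathlib import Path

NOTES_FILE = Path("project_notes.md")

def save_note(note: str) -> None:
    """Append a single note line to the memory file."""
    with NOTES_FILE.open("a", encoding="utf-8") as f:
        f.write(f"- {note}\n")

def load_notes() -> list[str]:
    """Return previously saved notes, or an empty list on the first run."""
    if not NOTES_FILE.exists():
        return []
    return [line.removeprefix("- ").strip()
            for line in NOTES_FILE.read_text(encoding="utf-8").splitlines()
            if line.strip()]

if __name__ == "__main__":
    save_note("API uses snake_case endpoints")
    print(load_notes())
```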
Developer updates
Alongside the model, Anthropic is shipping several developer-facing changes: task budgets in public beta on the API for controlling token spend on long runs, an updated tokenizer (the same input can map to roughly 1.0 to 1.35 times more tokens depending on the content), a new intermediate effort level that Claude Code now uses by default, a dedicated code-review command, and an extended auto mode that lets Claude make more permission decisions on its own.
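To make the tokenizer change concrete, here is a minimal back-of-the-envelope sketch, assuming the roughly 1.0 to 1.35 times factor above and the announced $5 per million input-token price; the baseline workload size and helper functions are hypothetical, not anything from Anthropic's SDK.

```python
# Back-of-the-envelope estimate of how a 1.0-1.35x tokenizer factor could
# change input token counts and input cost. All figures are illustrative
# assumptions, not measured values.

INPUT_PRICE_PER_MTOK = 5.00  # USD per million input tokens (announced pricing)

def projected_tokens(old_tokens: int, factor: float) -> int:
    """Scale a token count from the old tokenizer by the assumed factor."""
    return round(old_tokens * factor)

def input_cost_usd(tokens: int) -> float:
    """Input cost at the announced per-million-token price."""
    return tokens / 1_000_000 * INPUT_PRICE_PER_MTOK

if __name__ == "__main__":
    baseline = 2_000_000  # hypothetical monthly input tokens under the old tokenizer
    for factor in (1.0, 1.15, 1.35):
        t = projected_tokens(baseline, factor)
        print(f"x{factor:.2f}: {t:,} tokens -> ${input_cost_usd(t):.2f}")
```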
Performance in knowledge work
Beyond coding, Opus 4.7 shines as a financial analyst, delivering sharper analysis, more professional presentations, and tighter integration across tasks. Its results on the independent GDPval benchmark confirm its strength in finance and legal domains.
The shadow of Claude Mythos
Anthropic publicly acknowledges that Mythos, a model more advanced than Opus 4.7, is currently being withheld because of its cybersecurity risks. Mythos is reportedly able to uncover previously unexploited software vulnerabilities, which raises a critical infrastructure and national-security issue.
Project Glasswing and security
Opus 4.7 also serves as a testbed for new cyber protections, including automatic detection of high-risk requests to prevent abuse. Anthropic is limiting access to Mythos through a program reserved for security professionals (vulnerability research, penetration testing, red teaming), avoiding the spread of a potentially dangerous tool.
Safety profile and alignment
Compared with Opus 4.6, Opus 4.7 maintains a solid safety profile, with lower rates of problematic behaviors such as deception or cooperation with misuse. It is, however, slightly more inclined to give overly detailed information about controlled substances. Mythos remains, according to Anthropic, the best-aligned model it has evaluated, which makes its containment all the more strategic.
Anthropic's strategic positioning
This launch is not just another update. It signals Anthropic's intent to lead the race for coding-focused AI, while sending a clear message to competitors: a new generation of more powerful and more complex models is already in the works, but still needs time before it can be deployed safely.
Market impact and reactions
The announcement of these advances, particularly around multimodal capabilities and automated design, is shaking up parts of the traditional digital-creation industry, illustrating how AI is redefining not only productivity but also global economic and strategic competitiveness.
In short, Claude Opus 4.7 marks a clear step up in the sophistication and reliability of AI models for complex, professional tasks, while Anthropic openly points to a future generation, Mythos, that could upend the balance in cybersecurity and beyond, a bold move that illustrates the tensions in managing the risks of AI innovation.
Anthropic just dropped a model that says a lot about what's happening behind the scenes in AI right now. Claude Opus 4.7 is out. It's live basically everywhere at once. It beats Opus 4.6 by a pretty serious margin in coding and agent-style workflows. It sees way more detail in images. It got new cybersecurity safeguards, and somehow that still isn't the main story. The bigger story is that Anthropic is openly saying this is not even the most powerful model it has. That title still goes to Mythos, the model it thinks is risky enough to hold back. So yeah, this is one of those launches where the model itself is a big deal, though what it implies about the next model is even bigger. And once you look at the numbers, the rollout, the safety angle, and the way this thing is being positioned against GPT 5.4 and Gemini 3.1 Pro, it starts feeling less like a normal update and more like Anthropic tightening its grip on the coding race while warning everyone that an even nastier class of models is already here.

So, let's start with what actually launched. Claude Opus 4.7 is now generally available across all Claude products and major cloud platforms with no change in pricing from Opus 4.6. It still costs $5 per million input tokens and $25 per million output tokens, and developers can access it through the model ID claude-opus-4.7. Anthropic is positioning it as a direct upgrade to Opus 4.6, though the actual message is pretty clear: this is the new flagship for people doing serious work, especially software engineering.

Now, Opus 4.6 already had a strong reputation with developers, though it also started taking heat for inconsistency in longer and harder coding sessions. Some users felt Claude Code could drift, get messy across many tool calls, or lose track during multi-step engineering tasks. Opus 4.7 looks like Anthropic's answer to that criticism. The company says users are now handing off coding work that previously needed close supervision and trusting the model to manage it with much more rigor and consistency. It is supposed to plan better, follow instructions more precisely, and verify its own work before it reports back.

The benchmark numbers are a big part of the story here. On SWE Pro, Opus 4.7 scores 64.3%, up from 53.4% for Opus 4.6. That is a real jump, not a minor improvement. On SWE-bench Verified, it moves from 80.8 to 87.6. On Cursor Bench, it goes from 58 to 70. On MCP Atlas, which focuses on scaled tool use, it rises from 75.8 to 77.3. Then there is the Rakuten SWE-bench, where Anthropic says Opus 4.7 resolves three times more production tasks than Opus 4.6. That last one is especially important, because production tasks are where models either start saving time or start creating extra work.

Compared with other frontier models, Anthropic is clearly trying to plant a flag here. GPT 5.4 is listed at 57.7 on SWE Pro, while Gemini 3.1 Pro is listed at 54.2. On MCP Atlas, GPT 5.4 is at 68.1 and Gemini 3.1 Pro at 73.9. So, on coding and tool use, Claude looks like it is stretching the gap a bit more. On GPQA Diamond, though, the race is basically tied: Opus 4.7 is at 94.2, GPT 5.4 Pro at 94.4, and Gemini 3.1 Pro at 94.3. That tells you where the competition is shifting. Pure reasoning is no longer the main separator; real-world execution is. That also matches the way early users are describing the model. The big change does not seem to be that it suddenly feels magical. It feels more solid.
It wastes fewer moves, plans better before it starts, handles long sessions more cleanly, and seems less likely to spin in circles when a task gets complicated. When it hits a problem, the recovery path is supposed to be cleaner, too. That matters more than flashy marketing, because for people using these systems every day, trust comes from whether the model stays sharp over time, not whether it can ace one isolated prompt. And the interesting part is that the biggest limitation now is not generating ideas, it is turning them into finished output.

That's where today's sponsor, Higsfield, comes in with Marketing Studio, a tool built to shrink that gap. The workflow is pretty straightforward. You paste a product link or upload a product image, and the system reads it, understands what you're selling, and turns that into ready-to-use video ads. And it's not just one version. It generates multiple ad formats in one go: things like UGC-style videos, tutorials, unboxings, product reviews, faster-cut ads for attention, and more polished TV-style outputs. Each format is built for a different use case, so instead of guessing what might work, you can test multiple angles right away.

Hey everyone, I'm Robert and welcome to AI Revolution. Every day: new models, humanoid robots, AI breakthroughs before anyone else. Subscribe. The future won't wait.

Under the hood, this runs on Cedence 2, which handles motion, visual consistency, and overall video quality. You can also upload your own face or generate an avatar inside the platform, and it keeps that identity consistent across all the videos, which is important if you want something that actually feels like a brand. What this really changes is the time and cost. Instead of going through scripting, filming, editing, and revisions, you go from a product page to a full set of ad creatives in a single session. That's especially useful if you're running e-commerce, testing products, launching apps, or managing campaigns where volume matters. So, if you want to try Higsfield Marketing Studio, the link is in the description. All right, now back to the video.

Anthropic also added a new x-high effort level, which sits between high and max. That sounds small, though it actually matters a lot in practice. High could be fast, though sometimes too shallow for harder tasks. Max could be stronger, though slower and more expensive. X-high is meant to be the middle ground, and Claude Code now defaults to x-high across all plans. Anthropic is also recommending people start with high or x-high for coding and agentic workflows, so the company clearly sees this as the new sweet spot for serious use.

Then there is the vision upgrade, and this one is more important than it may sound at first. Opus 4.7 can now process images up to 2576 pixels on the long edge, around 3.75 megapixels. Anthropic says that is more than three times as much image detail as earlier Claude models. That opens up a much broader set of multimodal tasks. Dense screenshots, technical diagrams, dashboards, UI states, scanned pages, and visuals with tiny labels all become much more usable when the model can actually see enough detail to stop guessing. Anthropic also says document reasoning errors drop by 21% compared with Opus 4.6. That connects to another part of the launch that got people talking: the idea that Claude is moving beyond chat and coding help into design automation and broader digital creation.
Reports around Opus 4.7 suggested Anthropic was preparing tools that could generate websites, landing pages, presentations, and prototypes from plain language prompts. Even the talk around that was enough to shake investor confidence in parts of the design software market. Companies like Figma, Adobe, Wix, and GoDaddy were all reported to have slipped after the leak chatter and launch coverage spread. The fear is easy to understand. If users can describe what they want in normal language and get polished visual output back quickly, that starts cutting straight into the middle of traditional design workflows.

Another major change is instruction following. Anthropic says Opus 4.7 is substantially better at following instructions, though it comes with a catch. Older prompts may now behave differently, because where earlier models might have interpreted requests loosely or skipped parts, Opus 4.7 takes them more literally. That is great when prompts are well written. It can also create weird results when workflows depended on the model making soft assumptions. Anthropic is openly recommending users retune prompts and harnesses when moving from Opus 4.6 to 4.7. So, this is one of those upgrades where the model gets better, though your existing setup may need some cleanup, too.

Memory is another area getting stronger. Anthropic says Opus 4.7 is better at using file-system-based memory and remembering important notes across long multi-session work. That means project files, memory notes, and setups like CLAUDE.md should work more reliably, with less need to restate everything from scratch every time a new session starts. For agent-style workflows, that kind of continuity matters a lot, because weak memory is one of the biggest reasons these systems still break down in real use.

Anthropic also launched several developer-side updates alongside the model. Task budgets are now entering public beta on the API, giving developers more control over token spending during longer runs. That matters more now because Opus 4.7 uses an updated tokenizer, and Anthropic says the same input can map to around 1.0 to 1.35 times more tokens depending on the content. On top of that, the model tends to think more at higher effort levels, especially later in agentic sessions, which can drive up output tokens. Anthropic says internal testing still shows favorable overall token efficiency on coding evals, because the model gets to good answers in fewer turns, though for teams running real API workloads, this is definitely something to monitor.

Claude Code is getting a new /ultra-review command too, which starts a dedicated review session that reads through changes and flags bugs and design issues the way a careful human reviewer might. Pro and Max users get three free ultra reviews to try it. Auto mode is also being extended to Max users, which gives Claude more room to make permission decisions on the user's behalf and push longer tasks forward with fewer interruptions. That fits the whole direction of this launch. Anthropic is not just trying to make Claude smarter. It is trying to make it more autonomous and more useful inside longer working loops.

Outside software engineering, Anthropic is also pushing Opus 4.7 as a stronger model for knowledge work. The company says it performs better than Opus 4.6 as a finance analyst in internal testing, producing more rigorous analysis, stronger models, more professional presentations, and tighter integration across tasks.
It also says Opus 4.7 is state-of-the-art on the finance agent evaluation and on GDPval, a third-party benchmark focused on economically valuable knowledge work in areas like finance and legal. So the pitch here goes well beyond coding. Anthropic wants Opus 4.7 to feel like a serious professional model across the board.

Now to the part that gives this whole release a different tone. Anthropic is being unusually direct that Opus 4.7 is not its strongest model. That title still belongs to Claude Mythos Preview, which the company says is more capable, especially in cybersecurity, though too risky for broad release right now. That is a pretty wild thing to admit in public. It means Anthropic is launching a flagship while also telling everyone it has something more dangerous sitting in reserve.

The company ties this directly to Project Glasswing. Last week, Anthropic said it would keep Mythos Preview limited and test new cyber safeguards on less capable models first. Opus 4.7 is the first model to get that treatment. Anthropic says its cyber abilities are below Mythos level and that it even experimented with reducing those capabilities during training. At the same time, Opus 4.7 now includes safeguards that automatically detect and block requests pointing to prohibited or high-risk cybersecurity use. So, Opus 4.7 is not just a model upgrade. It is also a live deployment test for safety systems Anthropic thinks it will need before it can release Mythos-class models more broadly.

And the reason for that caution is pretty serious. The concern around Mythos is that it could be strong enough at finding vulnerabilities that long-buried software weaknesses start becoming usable again at scale. Once a model gets close to that line, the conversation shifts. It is no longer only about productivity. It becomes an infrastructure and security issue. At the same time, Anthropic is not shutting out legitimate cyber work. The company is launching a cyber verification program, so security professionals doing vulnerability research, penetration testing, and red teaming can apply for access in a more controlled way. That tells you Anthropic is trying to walk a very narrow line here: powerful enough to be useful for defenders, though not openly available in a form that could become a large-scale abuse engine.

On safety and alignment, Anthropic says Opus 4.7 has a similar overall profile to Opus 4.6, with low rates of concerning behavior like deception, sycophancy, and cooperation with misuse. It says the new model improves on honesty and resistance to malicious prompt injection, though it is modestly weaker on a few measures, including a tendency to give overly detailed harm reduction advice around controlled substances. Their overall assessment is that Opus 4.7 is largely well-aligned and trustworthy, though not fully ideal. Interestingly, Anthropic also says Mythos Preview remains the best aligned model it has trained according to its own evaluations, which makes the whole withheld-model story even more interesting.

All right, that's it for this one. If you found this interesting, drop a like and subscribe. Thanks for watching and I'll catch you in the next one.