8news


DeepSeek Just Started a Global AI War And Exposed GPT-5.6

AI • AI Revolution • May 1, 2026 • 15:38

INTRO

DeepSeek's V4 model is driving a sharp drop in AI costs and accelerating global competition, which could push American labs to speed up their upcoming releases.

Key points

DeepSeek V4 upends pricing

DeepSeek launched its V4 model with API costs cut by up to 90%, reaching 0.02 yuan per million tokens for some tiers. Reports indicate that V4 Pro input costs dropped from roughly $0.145 to $0.036 per million tokens, sharply undercutting competitors. The move is triggering an industry-wide price war.
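For scale, the reported per-token prices translate into simple arithmetic. A minimal sketch using the V4 Pro figures above; the monthly volume is an assumption for illustration, not a real workload:

```python
# Cost comparison using the V4 Pro input prices reported above.
# The monthly volume is an illustrative assumption.

def monthly_cost(tokens_millions: float, price_per_million: float) -> float:
    """Total API cost in USD for a month's input tokens."""
    return tokens_millions * price_per_million

OLD_PRICE = 0.145  # USD per million input tokens (before the cut)
NEW_PRICE = 0.036  # USD per million input tokens (after the cut)
VOLUME = 10_000    # million tokens per month, i.e. 10 billion tokens

saving = monthly_cost(VOLUME, OLD_PRICE) - monthly_cost(VOLUME, NEW_PRICE)
cut_pct = (1 - NEW_PRICE / OLD_PRICE) * 100

print(f"Monthly saving: ${saving:,.0f}")  # Monthly saving: $1,090
print(f"Price cut: {cut_pct:.0f}%")       # Price cut: 75%
```

Note that the roughly 75% cut on this tier is smaller than the headline "up to 90%", which applies to the cheapest cached tiers.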

The open-source strategy gains ground

Unlike many Western rivals, V4 is open source, letting companies modify and deploy it freely. That flexibility, combined with lower costs, makes it attractive to enterprises seeking infrastructure control, customization, and local regulatory compliance.

Hardware independence widens

The model runs on Nvidia GPUs and Huawei Ascend chips, with support from Chinese manufacturers such as MetaX and Cambricon. This marks a strategic shift toward a self-sufficient AI stack in China, reducing dependence on American semiconductors.

A push for the domestic ecosystem

The China Academy of Information and Communications Technology has begun testing V4, signaling alignment with national AI initiatives. Future hardware such as Huawei Ascend 950 super nodes could further cut operating costs and strengthen local competitiveness.

Performance against frontier models

While V4 improves reasoning and agent capabilities, it still trails leading closed systems such as Claude 4.6 and Gemini 3.1 Pro on some benchmarks. Being "good enough" at a much lower cost, however, can offset those gaps in practice.
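One way to see why "good enough" can win: compare benchmark score per dollar rather than score alone. The scores and the closed-model price below are hypothetical placeholders for illustration; only the $0.036 figure comes from the article's reported pricing.

```python
# Score-per-dollar comparison. The benchmark scores and the closed-model
# price are hypothetical placeholders; only 0.036 is the article's
# reported V4 Pro input price.

models = {
    "frontier-closed": {"score": 92.0, "usd_per_m_tokens": 3.000},
    "cheap-open":      {"score": 85.0, "usd_per_m_tokens": 0.036},
}

for name, m in models.items():
    value = m["score"] / m["usd_per_m_tokens"]  # benchmark points per dollar
    print(f"{name}: {value:,.0f} points per $1 of input tokens")
```

With these placeholder numbers, a 7-point benchmark deficit is dwarfed by a roughly 80x price gap, which is the trade-off the article describes.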

Enterprise usage reshaped by cost

Falling prices are driving adoption. Companies are ramping up usage sharply, with 51,000 daily AI requests at Disney and 1.9 trillion tokens processed by Visa in a single month. Cheaper models shift the bottleneck to workflow integration.
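At volumes like Visa's, the per-million price dominates the bill. A back-of-envelope sketch using the article's figures, counting input tokens only (real bills also depend on output tokens, model mix, and caching):

```python
# Back-of-envelope monthly bill for 1.9 trillion input tokens (the Visa
# figure above), at the two reported V4 Pro prices. Output tokens and
# cache discounts are ignored for simplicity.

TOKENS = 1.9e12  # tokens per month

def bill(price_per_million_usd: float) -> float:
    """Monthly cost in USD for TOKENS input tokens at the given price."""
    return TOKENS / 1e6 * price_per_million_usd

print(f"At $0.145/M tokens: ${bill(0.145):,.0f}")  # At $0.145/M tokens: $275,500
print(f"At $0.036/M tokens: ${bill(0.036):,.0f}")  # At $0.036/M tokens: $68,400
```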

Jevons' paradox in AI

Analysts note that as AI gets cheaper, total usage rises. This dynamic expands overall demand and intensifies competition rather than stabilizing it.
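The paradox can be sketched with a toy constant-elasticity demand curve: total spend rises after a price cut whenever demand elasticity exceeds 1. The elasticity values below are illustrative assumptions, not measured figures:

```python
# Toy model of Jevons' paradox: with a constant-elasticity demand curve,
# total spend on tokens *rises* after a price cut when elasticity > 1.
# The elasticity values are illustrative assumptions.

def total_spend(price: float, elasticity: float, k: float = 1.0) -> float:
    """Spend = price * quantity, with quantity = k * price^(-elasticity)."""
    quantity = k * price ** (-elasticity)
    return price * quantity

for eps in (0.5, 1.5):
    before = total_spend(1.00, eps)  # baseline price
    after = total_spend(0.10, eps)   # after a 90% price cut
    trend = "rises" if after > before else "falls"
    print(f"elasticity {eps}: spend {trend} ({before:.2f} -> {after:.2f})")
```

With inelastic demand a 90% cut shrinks revenue; with elastic demand the same cut more than triples it, which is the dynamic the analysts describe.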

A breakthrough in visual reasoning

DeepSeek introduced a multimodal system that uses "visual primitives" (points, bounding boxes) to anchor its reasoning. This closes the "reference gap" and enables consistent object tracking in tasks such as counting or diagram analysis.
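The mechanism can be illustrated with a minimal sketch: each reasoning step refers to an object through a stable anchor (a bounding box plus an id) rather than a fragile phrase like "the one on the left". Class and method names here are hypothetical, not DeepSeek's actual interface:

```python
# Minimal sketch of the "visual primitive" idea: reasoning steps cite
# objects by stable anchors instead of ambiguous descriptions.
# All names here are hypothetical, not DeepSeek's API.

from dataclasses import dataclass, field

@dataclass
class Anchor:
    label: str
    box: tuple[int, int, int, int]  # (x1, y1, x2, y2) in pixels

@dataclass
class VisualMemory:
    anchors: list[Anchor] = field(default_factory=list)

    def point(self, label: str, box: tuple[int, int, int, int]) -> int:
        """Register an object and return its stable reference id."""
        self.anchors.append(Anchor(label, box))
        return len(self.anchors) - 1

    def refer(self, ref_id: int) -> str:
        """Produce an unambiguous reference for use in a reasoning step."""
        a = self.anchors[ref_id]
        return f"{a.label}@{a.box}"

mem = VisualMemory()
bear = mem.point("bear", (40, 120, 210, 380))  # anchor while reasoning
# Later steps cite the exact same region, not a vague description:
print(mem.refer(bear))  # bear@(40, 120, 210, 380)
```

The point is that the anchor survives across reasoning steps, so counting or maze tracing can check off regions it has already visited.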

Efficiency gains in vision models

The system uses about 90 visual memory entries for an 800x800 image, versus roughly 740 to 1,100 for competitors. Even so, it beats rivals on some tasks, such as mazes (66.9% versus 50.6% for GPT-5.4 and 48.9% for Claude).
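Using the per-model entry counts reported later in the transcript (treat them as the article's figures, not independently verified), the footprint gap looks like this:

```python
# Relative visual-memory footprint per 800x800 image, using the entry
# counts reported in the article's transcript (not independently verified).

entries = {"DeepSeek": 90, "GPT-5.4": 740, "Claude": 870, "Gemini": 1100}
base = entries["DeepSeek"]

for model, n in sorted(entries.items(), key=lambda kv: kv[1]):
    print(f"{model:>8}: {n:4d} entries ({n / base:.1f}x DeepSeek)")
```

Fewer entries means less per-step state to carry through the reasoning loop, which is where the speed and cost advantage comes from.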

OpenAI faces internal turbulence

GPT-5.5 has shown unusual behavior, frequently mentioning goblins, gremlins, and trolls out of context. Internal attempts to fix it have reportedly failed, drawing attention.

Codex expansion and the agent push

OpenAI's Codex is evolving into a broader productivity agent, integrated with Slack, Gmail, and Calendar to automate workflows, analyze data, and support decisions. This reflects a shift toward systems that operate across entire digital environments.

Hints of GPT-5.6 development

References to GPT-5.6 have appeared in internal logs, suggesting early testing or a gradual rollout. This coincides with growing competitive pressure from low-cost models.

The market splits into two camps

Analysts describe a growing divide between closed American models and open, low-cost Chinese systems, reflecting differences in pricing, transparency, and infrastructure.

CONCLUSION

DeepSeek V4 is redefining the AI landscape by combining low cost, openness, and hardware flexibility, forcing competitors to react faster. The race now turns as much on price, scale, and ecosystem as on raw performance.

Full transcript

A Chinese AI lab just dropped a model so cheap, so open, and so aggressively optimized that it may have forced OpenAI to start testing GPT-5.6 before anyone was supposed to notice. And that is only the first part of the story. Because while everyone was watching the usual model race, DeepSeek came in with V4, slashed API prices by up to 90%, proved it can run on both Nvidia and Huawei chips, released a new multimodal system that gives AI a cyber finger to point at what it sees, and somehow turned the whole industry into a pricing war overnight. At the same time, OpenAI has been dealing with one of the weirdest bugs we've seen in a frontier model: GPT-5.5 suddenly becoming obsessed with goblins, gremlins, trolls, and random creature references. Then, right in the middle of all that, developers spotted something strange in Codex backend logs: a route mapping labeled GPT-5.6. So now the question is pretty obvious. Did this Chinese lab just push OpenAI into fast-forward mode? When V4 arrived, it entered a crowded market filled with powerful US and Chinese competitors. Yet its impact comes from the combination of things around it. It is open source, which means users can download it, modify it, and build on top of it. It is extremely cheap to run. It has stronger reasoning and agent capabilities than earlier versions. And maybe most importantly, it fits into China's growing domestic AI stack, from chips to cloud to models. That hardware angle is massive. Earlier models mostly relied on Nvidia's CUDA ecosystem. V4 has now been validated on both Nvidia and Huawei Ascend processors. Chinese chip companies like MetaX, Cambricon, and Moore Threads have announced support for it. The China Academy of Information and Communications Technology has also started testing the model, which is a strong signal that this is becoming part of a larger national-level push. That means China is no longer only trying to build strong models.
It is trying to build a full AI ecosystem that can survive without depending on Nvidia's most advanced chips. And if Huawei's Ascend 950 super nodes launch broadly in the second half of this year, V4 Pro could get even cheaper to run. That is where the pressure on OpenAI, Anthropic, and Google becomes very real. The new model is being described as one of the most powerful open-source large language models currently available. The company says it improved reasoning and agentic ability, meaning it should handle more complex multi-step work. At the same time, it admits that it still trails the strongest closed models in some areas, including Claude 4.6 and Gemini 3.1 Pro. For many companies, that is enough to change the entire calculation. IDC's Chang Mang said the global AI market is slowly splitting into two camps: the US model and the Chinese open-source model. That sounds dramatic, yet it fits what is happening. On one side, you have closed systems from OpenAI, Anthropic, and Google. On the other, you have models that are becoming cheaper, more controllable, more transparent, and more aligned with local hardware and regulation. And then came the price cuts. DeepSeek slashed API pricing by up to 90%. For V4 Pro, one report said the cost per million input tokens dropped from around 14.5 cents to just 3.6. In China, pricing updates published on April 26th showed V4 Flash cached input costs falling to 0.02 yuan per million tokens. The business-focused V4 Pro model saw promotional cached input pricing drop to 0.025 yuan per million tokens. That is ridiculously low. And that changes the game, because the bottleneck is no longer creation. It is workflow. Higgsfield is sponsoring today's video, and they just introduced something called Higgsfield Canvas. It's a node-based visual workspace where your entire creative process lives on one infinite board.
So instead of generating things one by one and ending up with a folder full of disconnected files, everything stays connected. You can start with a simple idea or mood board, generate a character, pass it into another node to animate it, then upscale it, relight it, adjust angles, and keep building toward the final output without leaving the canvas. And the key difference here is visibility. You can actually see the whole pipeline: what references were used, which models generated each part, and how everything connects from start to finish. That makes it much easier to tweak specific parts instead of restarting everything from scratch. It also brings multiple models into one workflow, from image models like GPT Image 2 and Soul to video models like Seedance, Kling, Wan, and Veo. So this feels less like using separate AI tools and more like working inside a proper creative system. If you want to try Higgsfield Canvas, the link is in the description. All right, now back to V4. One user, Yang Hua, from a Shanghai gaming company, said he used V4 to manage files and spent only 0.56 yuan. He said that was less than a tenth of what he paid using a previous US model, while the efficiency and capacity felt almost identical for his use case. Now connect that with what is happening inside big companies. Enterprise AI usage is exploding so quickly that people are now using the term token maxing. Disney reportedly had some engineers using Claude around 51,000 times per day, which forced the company to build an AI adoption dashboard to track usage. Meta reportedly had an internal dashboard that turned into a leaderboard where employees competed over who used AI the most, before it was shut down. Visa spent 1.9 trillion tokens in March alone. So when a strong model becomes much cheaper, this does not just save money; it changes behavior.
Teams start using more AI, more workflows get automated, more internal tools get connected, and more companies start asking whether they really need to pay premium prices for every task. Val Bercovici from WEKA summed it up with a simple point: frontier labs may try to hold prices at first, but token usage will keep rising. Jevons' paradox is undefeated. When something becomes cheaper and more useful, people consume more of it. That is the real danger for the American labs. A cheaper model does not need to win every benchmark. It only needs to be good enough for enough daily tasks, and then the cost advantage starts doing the rest. But the story gets even more interesting when we move from text to vision. Right before the May Day holiday, the team released a technical report called Thinking with Visual Primitives. This work came from DeepSeek, Peking University, and Tsinghua University, and it tackles one of the most annoying weaknesses in multimodal AI: models can see an image yet still lose track of what they are talking about. The report calls this the reference gap. Most multimodal models have focused on the perception gap. In simple terms, they try to see more clearly. They use higher-resolution input, cropping, zooming, rotating, dynamic image splitting, and multiscale processing. OpenAI has talked about thinking with images. Gemini and Claude have also pushed toward processing more visual detail. This new research takes a different route. It argues that seeing more pixels is not always the real problem. Sometimes the model can see the image yet still cannot keep a stable reference to the same object while reasoning. That sounds small, but it breaks a lot of visual tasks. Ask a model to count people in a dense crowd, and it may lose track of who it already counted. Ask it whether a red capacitor is left or right of an inductor in a circuit diagram, and the answer can become vague or contradictory.
Ask it to solve a maze, and pure language starts falling apart, because phrases like "the path on the left" or "the object near the center" are too vague. So the researchers basically gave the model a finger, not physically, of course. The system uses points and bounding boxes as reasoning tools. When it talks about an object, it can anchor that object to coordinates. Instead of only saying "the bear on the left", it can attach a box around the bear and keep referring to that exact location as it continues thinking. That changes the role of visual markers. In older systems, bounding boxes were often treated as final outputs. The model would think first and then draw a box to show what it found. Here, the box becomes part of the thinking process itself. The model points while it reasons. When the model counts people in a crowd, it can basically point at each person and keep track instead of losing count. When it solves a maze, it can mark the path it already tried, turn back from dead ends, and continue from the right place. When it follows tangled lines, it can stay on the correct line instead of jumping to the wrong one. The crazy part is that it does this while using far less visual memory than rivals. For an 800 by 800 image, it keeps about 90 visual memory entries. Claude uses around 870, Gemini around 1,100, GPT-5.4 around 740, and Qwen around 660. So it is not trying to see everything harder. It is trying to remember only what matters. That means faster answers, lower costs, and better use in real-time systems like robots, autonomous cars, and video analysis. The team trained it on over 40 million visual examples, including counting tasks, mazes, and tangled-line puzzles, and the results were strong. It beat GPT-5.4 and Claude on several counting and maze tests, including maze navigation, where it scored 66.9% while GPT-5.4 scored 50.6% and Claude scored 48.9%. It still has limits, especially with tiny details like medical scans or factory defects.
But the main idea is powerful. The future of AI vision may not be about seeing more pixels. It may be about knowing exactly where to look. While all of this was happening, OpenAI had a very different kind of week. GPT-5.5 is powerful, but users started noticing a bizarre pattern. The model kept randomly mentioning goblins, gremlins, trolls, and other creatures in conversations where they had no business appearing. Someone asked about camera gear, and it started talking about "dirty neon flash goblin mode". Someone discussed code performance, and the model warned about a "performance goblin". Arena AI reportedly found a statistically meaningful increase in GPT-5.5 using words like goblin, gremlin, and troll, especially when high thinking mode was not used. OpenAI's response somehow made it funnier. The Codex system prompt reportedly banned goblins, gremlins, raccoons, trolls, ogres, pigeons, and other creatures unless they were clearly relevant. The ban was repeated multiple times. And once users found it, the internet did what it always does. People started trying to make the model say the forbidden word. And yes, it still said it. At the same time, Codex itself became much more serious. The app can now summarize changes, analyze data, assist with decisions across Slack, Gmail, and Calendar, organize research, create spreadsheets and presentations, compare options, and track trade-offs. Greg Brockman said he had completely fallen in love with the Codex app after using the terminal for 20 years. Sam Altman said Codex was having its ChatGPT moment, then joked about the goblin moment. So OpenAI looks powerful, ambitious, and a bit chaotic all at once. Codex is clearly moving toward the super-agent direction, where AI does not just chat but works across your digital life. And then, right in the middle of that, GPT-5.6 appears in back-end logs. Again, this does not mean GPT-5.6 launched.
It looks more like early routing, internal testing, or a canary deployment, but the timing is hard to ignore. A cheaper Chinese open model starts attacking the market from below. OpenAI's current model has a weird public quirk. Codex is expanding fast, and suddenly the next model label is already visible behind the curtain. There is also a leadership story inside the Chinese company itself. Founder Liang Wenfeng has reportedly stayed mostly out of public view since a televised meeting with Xi Jinping in February last year. Corporate filings show his stake rose from 1% to 34%. His paid-in capital increased from 100,000 yuan to 5.1 million yuan, while registered capital rose from 10 million to 15 million yuan. At the same time, senior researcher Chen Derry has become much more visible. He worked on V3, R1, and V4, joined in 2023, studied at Peking University, and has papers cited more than 22,000 times. He represented the company at Nvidia GTC and at a state-backed industry event, where he warned that AI companies should tell the public which jobs may disappear first. After the V4 launch, he posted that the team was sharing results they had poured love into after 484 days, while continuing with long-termism and open source for everyone. Talent retention also looked stronger than some expected. The research and engineering team reportedly grew from 212 in early December to 270, a rise of more than 27%. Out of 18 key contributors to R1, most are still there. Only two departures were mentioned: Guaya moved to ByteDance, while Jean Hawei's next destination was not disclosed. Now, one important warning. A viral screenshot where one model fixes a bug another model missed does not prove much by itself. Maybe the new model is better at that exact pattern. Maybe it got lucky. Maybe the prompt fit its style better. LLMs are stochastic, so one attempt is not a benchmark.
That matters because we are going to see a lot of people saying V4 solved something GPT-5.4 or Claude 4.6 failed on. Some of those examples will be real; some will be cherry-picked. The better test is whether it works consistently in your own workflow, with your stack, your code, your prompts, and your cost limits. And that is why this release is so dangerous. It does not need to win every single task. It only needs to be strong enough, cheap enough, open enough, and easy enough to deploy. For a lot of companies, that may be the formula that matters. So yes, GPT-5.6 showing up now makes sense. OpenAI can still be ahead at the top, but the pressure from below is getting stronger fast. The AI war is now about cost, speed, chips, agents, vision, open source, and who can make intelligence cheap enough to spread everywhere. And V4 may have just made that war impossible to ignore. Also, if you want more content around science, space, and advanced tech, we've launched a separate channel for that. Links in the description. Go check it out. What do you think happens next? Does OpenAI answer with GPT-5.6 soon, or does the open-source side keep closing the gap faster than expected? Let me know in the comments. Subscribe if you want more AI updates like this. Thanks for watching, and I'll catch you in the next one.
