
Tech • AI • Crypto
DeepSeek’s V4 model is driving a sharp drop in AI costs and accelerating global competition, potentially pressuring U.S. labs to speed up new releases.
DeepSeek launched its V4 model with API costs cut by up to 90%, pushing prices as low as 0.02 yuan per million tokens for some tiers. Reports indicate input costs for V4 Pro dropped from about $0.145 to $0.036 per million tokens, dramatically undercutting competitors. This shift is triggering a broader pricing war across the AI sector.
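To make that cut concrete, here is a minimal back-of-envelope sketch using the reported V4 Pro input prices (illustrative only; real bills also depend on output tokens, caching discounts, and tier):

```python
# Rough input-token cost comparison at the prices reported above.
# Illustrative only: actual billing also covers output tokens,
# cache hits, and per-tier promotional rates.

OLD_PRICE_PER_M = 0.145  # USD per million input tokens (reported old V4 Pro price)
NEW_PRICE_PER_M = 0.036  # USD per million input tokens (reported new V4 Pro price)

def input_cost(tokens: float, price_per_million: float) -> float:
    """Cost in USD for a given number of input tokens."""
    return tokens / 1_000_000 * price_per_million

usage = 50_000_000_000  # hypothetical workload: 50B input tokens per month
print(f"old: ${input_cost(usage, OLD_PRICE_PER_M):,.0f}")   # old: $7,250
print(f"new: ${input_cost(usage, NEW_PRICE_PER_M):,.0f}")   # new: $1,800
print(f"cut: {1 - NEW_PRICE_PER_M / OLD_PRICE_PER_M:.0%}")  # cut: 75%
```

On this pair of prices alone the cut is about 75%; the up-to-90% figure presumably reflects the cheapest cached and promotional tiers.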
Unlike many Western rivals, V4 is open-source, allowing companies to modify and deploy it freely. This flexibility, combined with lower costs, is making it attractive for enterprises seeking control over infrastructure, customization, and compliance with local regulations.
The model runs on both Nvidia GPUs and Huawei Ascend chips, with support from Chinese chipmakers such as MetaX and Cambricon. This signals a strategic move toward a self-sufficient AI stack in China, reducing reliance on U.S. semiconductor ecosystems.
The China Academy of Information and Communications Technology has begun testing V4, indicating alignment with national-level AI development. Future hardware like Huawei Ascend 950 super nodes could further reduce operating costs, reinforcing domestic competitiveness.
While V4 improves reasoning and agent capabilities, it still trails leading closed systems like Claude 4.6 and Gemini 3.1 Pro in some benchmarks. However, analysts note that being “good enough” at a much lower cost can outweigh marginal performance gaps in real-world use.
Lower prices are driving increased adoption. Companies are scaling usage dramatically, with reports of 51,000 daily AI queries at Disney and 1.9 trillion tokens processed by Visa in a single month. Cheaper models are shifting the bottleneck from capability to workflow integration.
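For a sense of scale, a quick illustrative calculation at the input prices quoted above (the vendors and rates behind these enterprise figures are not public, so treat this as scale math only):

```python
# What 1.9 trillion tokens in a month would cost at the two input
# prices quoted earlier. Purely illustrative: the actual models and
# rates behind the enterprise figures above are not public.

tokens = 1.9e12  # 1.9 trillion tokens in one month (as reported for Visa)

for label, price_per_m in [("$0.145/M", 0.145), ("$0.036/M", 0.036)]:
    print(f"{label}: ${tokens / 1e6 * price_per_m:,.0f}")
# $0.145/M: $275,500
# $0.036/M: $68,400
```

At that volume, a few cents per million tokens is the difference of hundreds of thousands of dollars a month, which is why pricing moves ripple so quickly through adoption decisions.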
Analysts highlight that as AI becomes cheaper, total usage rises rather than falls. This dynamic means cost reductions could expand overall demand, intensifying competition rather than stabilizing it.
DeepSeek introduced a multimodal system using “visual primitives” like points and bounding boxes to anchor reasoning. This approach addresses the “reference gap,” enabling models to track objects consistently across tasks such as counting, navigation, and diagram analysis.
The system uses only about 90 visual memory entries for an 800×800 image, compared with roughly 660–1,100 in competing models. Despite lower memory use, it outperformed rivals in tasks like maze solving, scoring 66.9% versus 50.6% for GPT-5.4 and 48.9% for Claude.
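A minimal sketch of what anchored references could look like in code (hypothetical names and API; the paper's actual data structures are not described here): objects get stable IDs tied to coordinates, and later reasoning steps refer back by ID instead of by a vague phrase.

```python
# Sketch of "visual primitives" as reasoning anchors. Hypothetical
# API: the report's real representation is not reproduced here.
from dataclasses import dataclass

@dataclass
class Anchor:
    obj_id: int
    label: str
    box: tuple  # (x1, y1, x2, y2) in pixels

class VisualMemory:
    """Small fixed-budget store: keep only the anchors reasoning needs."""
    def __init__(self, budget: int = 90):  # ~90 entries per the report
        self.budget = budget
        self.entries: dict[int, Anchor] = {}

    def add(self, label: str, box: tuple) -> int:
        if len(self.entries) >= self.budget:
            raise MemoryError("anchor budget exhausted")
        obj_id = len(self.entries)
        self.entries[obj_id] = Anchor(obj_id, label, box)
        return obj_id

    def refer(self, obj_id: int) -> str:
        a = self.entries[obj_id]
        return f"<obj {a.obj_id}: {a.label} @ {a.box}>"

mem = VisualMemory()
bear = mem.add("bear", (40, 120, 210, 330))
# A later reasoning step cites the same anchor, not "the bear on the left":
print(mem.refer(bear))  # <obj 0: bear @ (40, 120, 210, 330)>
```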
GPT-5.5 exhibited unusual behavior, frequently referencing goblins, gremlins, and trolls in unrelated contexts. Internal prompts reportedly attempted to suppress such outputs, but the issue persisted and drew widespread attention.
OpenAI Codex is evolving into a broader productivity agent, integrating with tools like Slack, Gmail, and Calendar to automate workflows, analyze data, and assist decision-making. This signals a shift toward AI systems that operate across entire digital environments.
References to GPT-5.6 appeared in backend routing logs, suggesting early testing or staged deployment. While not officially released, the timing coincides with rising competitive pressure from lower-cost models.
Analysts describe a growing divide between closed U.S. models and open, cost-efficient Chinese systems. This split reflects differences in pricing, transparency, and infrastructure alignment.
DeepSeek V4 is reshaping the AI landscape by combining low cost, open access, and hardware flexibility, forcing competitors to respond faster. The emerging battle centers less on peak performance and more on affordability, scalability, and ecosystem control.
A Chinese AI lab just dropped a model so cheap, so open, and so aggressively optimized that it may have forced OpenAI to start testing GPT 5.6 before anyone was supposed to notice. And that is only the first part of the story. Because while everyone was watching the usual model race, DeepSeek came in with V4, slashed API prices by up to 90%, proved it can run on both Nvidia and Huawei chips, released a new multimodal system that gives AI a cyber finger to point at what it sees, and somehow turned the whole industry into a pricing war overnight. At the same time, OpenAI has been dealing with one of the weirdest bugs we've seen in a frontier model: GPT 5.5 suddenly becoming obsessed with goblins, gremlins, trolls, and random creature references. Then, right in the middle of all that, developers spotted something strange in Codex backend logs, a route mapping labeled GPT 5.6. So now the question is pretty obvious. Did this Chinese lab just push OpenAI into fast-forward mode?

When V4 arrived, it entered a crowded market filled with powerful US and Chinese competitors. Yet its impact comes from the combination of things around it. It is open source, which means users can download it, modify it, and build on top of it. It is extremely cheap to run. It has stronger reasoning and agent capabilities than earlier versions. And maybe most importantly, it fits into China's growing domestic AI stack, from chips to cloud to models.

That hardware angle is massive. Earlier models mostly relied on Nvidia's CUDA ecosystem. V4 has now been validated on both Nvidia and Huawei Ascend processors. Chinese chip companies like MetaX, Cambricon, and Moore Threads have announced support for it. The China Academy of Information and Communications Technology has also started testing the model, which is a strong signal that this is becoming part of a larger national-level push. That means China is no longer only trying to build strong models. It is trying to build a full AI ecosystem that can survive without depending on Nvidia's most advanced chips. And if Huawei's Ascend 950 super nodes launch broadly in the second half of this year, V4 Pro could get even cheaper to run. That is where the pressure on OpenAI, Anthropic, and Google becomes very real.

The new model is being described as one of the most powerful open-source large language models currently available. The company says it improved reasoning and agentic ability, meaning it should handle more complex multi-step work. At the same time, it admits that it still trails the strongest closed models in some areas, including Claude 4.6 and Gemini 3.1 Pro. For many companies, that is enough to change the entire calculation. IDC's Chang Mang said the global AI market is slowly splitting into two camps: the US closed model and the Chinese open-source model. That sounds dramatic, yet it fits what is happening. On one side, you have closed systems from OpenAI, Anthropic, and Google. On the other, you have models that are becoming cheaper, more controllable, more transparent, and more aligned with local hardware and regulation.

And then came the price cuts. DeepSeek slashed API pricing by up to 90%. For V4 Pro, one report said the cost per million input tokens dropped from around 14.5 cents to just 3.6 cents. In China, pricing updates published on April 26th showed V4 Flash cached input costs falling to 0.02 yuan per million tokens. The business-focused V4 Pro model saw promotional cached input pricing drop to 0.025 yuan per million tokens. That is ridiculously low.
And that changes the game, because the bottleneck is no longer creation. It is workflow.

Higgsfield is sponsoring today's video, and they just introduced something called Higgsfield Canvas. It's a node-based visual workspace where your entire creative process lives on one infinite board. So instead of generating things one by one and ending up with a folder full of disconnected files, everything stays connected. You can start with a simple idea or mood board, generate a character, pass it into another node to animate it, then upscale it, relight it, adjust angles, and keep building toward the final output without leaving the canvas. And the key difference here is visibility. You can actually see the whole pipeline: what references were used, which models generated each part, and how everything connects from start to finish. That makes it much easier to tweak specific parts instead of restarting everything from scratch. It also brings multiple models into one workflow, from image models like GPT Image and Soul to video models like Seedance, Kling, Wan, and Veo. So this feels less like using separate AI tools and more like working inside a proper creative system. If you want to try Higgsfield Canvas, the link is in the description.

All right, now back to V4. One user, Yang Hua, from a Shanghai gaming company, said he used V4 to manage files and spent only 0.56 yuan. He said that was less than a tenth of what he paid using a previous US model, while the efficiency and capability felt almost identical for his use case.

Now connect that with what is happening inside big companies. Enterprise AI usage is exploding so quickly that people are now using the term "token maxing." Disney reportedly had some engineers using Claude around 51,000 times per day, which forced the company to build an AI adoption dashboard to track usage. Meta reportedly had an internal dashboard that turned into a leaderboard where employees competed over who used AI the most, before it was shut down. Visa spent 1.9 trillion tokens in March alone. So when a strong model becomes much cheaper, it does not just save money, it changes behavior. Teams start using more AI, more workflows get automated, more internal tools get connected, and more companies start asking whether they really need to pay premium prices for every task.

Val Burkavichi from WKA summed it up with a simple point: frontier labs may try to hold prices at first, but token usage will keep rising. Jevons paradox is undefeated. When something becomes cheaper and more useful, people consume more of it. That is the real danger for the American labs. A cheaper model does not need to win every benchmark. It only needs to be good enough for enough daily tasks, and then the cost advantage starts doing the rest.

But the story gets even more interesting when we move from text to vision. Right before the May Day holiday, the team released a technical report called "Thinking with Visual Primitives." This work came from DeepSeek, Peking University, and Tsinghua University, and it tackles one of the most annoying weaknesses in multimodal AI: models can see an image yet still lose track of what they are talking about. The report calls this the reference gap. Most multimodal models have focused on the perception gap. In simple terms, they try to see more clearly. They use higher-resolution input, cropping, zooming, rotating, dynamic image splitting, and multiscale processing. OpenAI has talked about thinking with images. Gemini and Claude have also pushed toward processing more visual detail.
This new research takes a different route. It argues that seeing more pixels is not always the real problem. Sometimes the model can see the image yet still cannot keep a stable reference to the same object while reasoning. That sounds small, but it breaks a lot of visual tasks. Ask a model to count people in a dense crowd, and it may lose track of who it already counted. Ask it whether a red capacitor is left or right of an inductor in a circuit diagram, and the answer can become vague or contradictory. Ask it to solve a maze, and pure language starts falling apart, because phrases like "the path on the left" or "the object near the center" are too vague.

So the researchers basically gave the model a finger. Not physically, of course. The system uses points and bounding boxes as reasoning tools. When it talks about an object, it can anchor that object to coordinates. Instead of only saying "the bear on the left," it can attach a box around the bear and keep referring to that exact location as it continues thinking. That changes the role of visual markers. In older systems, bounding boxes were often treated as final outputs. The model would think first and then draw a box to show what it found. Here, the box becomes part of the thinking process itself. The model points while it reasons. When the model counts people in a crowd, it can basically point at each person and keep track instead of losing count. When it solves a maze, it can mark the path it already tried, turn back from dead ends, and continue from the right place. When it follows tangled lines, it can stay on the correct line instead of jumping to the wrong one.

The crazy part is that it does this while using far less visual memory than rivals. For an 800 by 800 image, it keeps about 90 visual memory entries. Claude uses around 870, Gemini around 1,100, GPT 5.4 around 740, and Qwen around 660. So it is not trying to see everything harder. It is trying to remember only what matters. That means faster answers, lower costs, and better use in real-time systems like robots, autonomous cars, and video analysis. The team trained it on over 40 million visual examples, including counting tasks, mazes, and tangled-line puzzles, and the results were strong. It beat GPT 5.4 and Claude on several counting and maze tests, including maze navigation, where it scored 66.9% while GPT 5.4 scored 50.6% and Claude scored 48.9%. It still has limits, especially with tiny details like medical scans or factory defects. But the main idea is powerful. The future of AI vision may not be about seeing more pixels. It may be about knowing exactly where to look.
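The maze behavior described above, marking tried paths and backing out of dead ends, is at its core depth-first search with an explicit visited set. A minimal generic sketch of that mechanic (not DeepSeek's actual algorithm):

```python
# DFS maze solver: mark cells already tried, retreat from dead ends,
# resume from the last open branch. Generic illustration only.

def solve(maze, start, goal):
    """maze: list of equal-length strings, '#' = wall. Returns a path or None."""
    rows, cols = len(maze), len(maze[0])
    visited = set()  # the "marks" left on cells already tried

    def dfs(cell, path):
        if cell == goal:
            return path
        visited.add(cell)
        r, c = cell
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if (0 <= nr < rows and 0 <= nc < cols
                    and maze[nr][nc] != '#' and (nr, nc) not in visited):
                result = dfs((nr, nc), path + [(nr, nc)])
                if result:            # goal found down this branch
                    return result
        return None                   # dead end: backtrack

    return dfs(start, [start])

maze = ["S..#",
        ".#.#",
        ".#..",
        "...G"]
print(solve(maze, (0, 0), (3, 3)))  # a path of (row, col) cells, or None
```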
While all of this was happening, OpenAI had a very different kind of week. GPT 5.5 is powerful, but users started noticing a bizarre pattern. The model kept randomly mentioning goblins, gremlins, trolls, and other creatures in conversations where they had no business appearing. Someone asked about camera gear, and it started talking about "dirty neon flash goblin mode." Someone discussed code performance, and the model warned about a "performance goblin." Arena AI reportedly found a statistically meaningful increase in GPT 5.5 using words like goblin, gremlin, and troll, especially when high thinking mode was not used. OpenAI's response somehow made it funnier. The Codex system prompt reportedly banned goblins, gremlins, raccoons, trolls, ogres, pigeons, and other creatures unless they were clearly relevant. The ban was repeated multiple times. And once users found it, the internet did what it always does: people started trying to make the model say the forbidden word. And yes, it still said it.

At the same time, Codex itself became much more serious. The app can now summarize changes, analyze data, assist with decisions across Slack, Gmail, and Calendar, organize research, create spreadsheets and presentations, compare options, and track trade-offs. Greg Brockman said he had completely fallen in love with the Codex app after using the terminal for 20 years. Sam Altman said Codex was having its ChatGPT moment, then joked about the goblin moment. So OpenAI looks powerful, ambitious, and a bit chaotic all at once. Codex is clearly moving toward the super-agent direction, where AI does not just chat but works across your digital life.

And then, right in the middle of that, GPT 5.6 appears in backend logs. Again, this does not mean GPT 5.6 launched. It looks more like early routing, internal testing, or a canary deployment, but the timing is hard to ignore. A cheaper Chinese open model starts attacking the market from below. OpenAI's current model has a weird public quirk. Codex is expanding fast. And suddenly the next model label is already visible behind the curtain.

There is also a leadership story inside the Chinese company itself. Founder Liang Wenfeng has reportedly stayed mostly out of public view since a televised meeting with Xi Jinping in February last year. Corporate filings show his stake rose from 1% to 34%. His paid-in capital increased from 100,000 yuan to 5.1 million yuan, while registered capital rose from 10 million to 15 million yuan. At the same time, senior researcher Chen Derry has become much more visible. He worked on V3, R1, and V4, joined in 2023, studied at Peking University, and has papers cited more than 22,000 times. He represented the company at NVIDIA GTC and at a state-backed industry event, where he warned that AI companies should tell the public which jobs may disappear first. After the V4 launch, he posted that the team was sharing results they had poured love into after 484 days, while continuing with long-termism and open source for everyone.

Talent retention also looked stronger than some expected. The research and engineering team reportedly grew from 212 in early December to 270, a rise of more than 27%. Of 18 key contributors to R1, most are still there. Only two departures were mentioned: Guaya moved to ByteDance, while Jean Hawei's next destination was not disclosed.

Now, one important warning. A viral screenshot where one model fixes a bug another model missed does not prove much by itself. Maybe the new model is better at that exact pattern. Maybe it got lucky. Maybe the prompt fit its style better. LLMs are stochastic, so one attempt is not a benchmark. That matters, because we are going to see a lot of people saying V4 solved something GPT 5.4 or Claude 4.6 failed on. Some of those examples will be real, and some will be cherry-picked. The better test is whether it works consistently in your own workflow, with your stack, your code, your prompts, and your cost limits.

And that is why this release is so dangerous. It does not need to win every single task. It only needs to be strong enough, cheap enough, open enough, and easy enough to deploy. For a lot of companies, that may be the formula that matters. So yes, GPT 5.6 showing up now makes sense. OpenAI can still be ahead at the top, but the pressure from below is getting stronger fast.
The AI war is now about cost, speed, chips, agents, vision, open source, and who can make intelligence cheap enough to spread everywhere. And V4 may have just made that war impossible to ignore. Also, if you want more content around science, space, and advanced tech, we've launched a separate channel for that. Links in the description. Go check it out. What do you think happens next? Does OpenAI answer with GPT 5.6 soon, or does the open-source side keep closing the gap faster than expected? Let me know in the comments. Subscribe if you want more AI updates like this. Thanks for watching, and I'll catch you in the next one.