8news

New DeepSeek V4 Shocks The World: China Fires Back Hard

6/10
AI • AI Revolution • April 25, 2026 at 11:08 PM • 15:30

TL;DR

DeepSeek’s V4 models combine near-frontier performance with radically lower costs, signaling a major shift in AI economics, infrastructure competition, and long-context capabilities.

KEY POINTS

Dual Model Release

DeepSeek V4 Pro and V4 Flash launched as a two-tier system targeting different workloads. V4 Pro uses a 1.6 trillion-parameter mixture-of-experts design with 49 billion parameters active per token, while V4 Flash is smaller, at 284 billion total and 13 billion active. Both are text-focused models with 1 million token context windows and up to 384,000 output tokens, positioning them for large-scale reasoning and agent workflows.
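
To make the mixture-of-experts figures concrete, the short sketch below computes what share of each model's parameters is active at once. It uses only the parameter counts quoted above; the function and dictionary names are illustrative, not taken from DeepSeek.

```python
# Parameter counts in billions, as quoted in this summary.
MODELS = {
    "V4 Pro": {"total_b": 1600, "active_b": 49},
    "V4 Flash": {"total_b": 284, "active_b": 13},
}


def active_fraction(total_b: float, active_b: float) -> float:
    """Share of a model's parameters that are active in one forward pass."""
    return active_b / total_b


for name, params in MODELS.items():
    frac = active_fraction(params["total_b"], params["active_b"])
    print(f"{name}: {frac:.1%} of parameters active")
# V4 Pro: 3.1% of parameters active
# V4 Flash: 4.6% of parameters active
```

In both tiers only a few percent of the weights do work at any one time, which is the basic reason a very large mixture-of-experts model can be served at a low per-token cost.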

Aggressive Pricing Strategy

Pricing is the defining disruption. V4 Flash costs $0.14 per million input tokens and $0.28 per million output tokens, while V4 Pro costs $1.74 and $3.48 respectively. Comparable systems are far more expensive: GPT‑5.5 is reportedly priced at $5/$30, with premium tiers reaching $30/$180, and Claude Opus 4.7 at around $5/$25. At these prices V4 Pro is roughly 65% to 88% cheaper than the standard tiers and up to 98% cheaper than the premium tier on output tokens, dramatically lowering the cost of large-scale AI deployment.
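
The savings percentages can be checked directly from the quoted prices. The sketch below is plain arithmetic on the figures in this summary (several of which are described as reported rather than confirmed); the helper name `percent_cheaper` is illustrative.

```python
# USD per million tokens as (input, output), as quoted in this summary.
PRICES = {
    "DeepSeek V4 Pro": (1.74, 3.48),
    "GPT-5.5": (5.00, 30.00),
    "GPT-5.5 premium tier": (30.00, 180.00),
    "Claude Opus 4.7": (5.00, 25.00),
}


def percent_cheaper(ours: float, theirs: float) -> float:
    """How much cheaper `ours` is than `theirs`, as a percentage."""
    return (1 - ours / theirs) * 100


pro_in, pro_out = PRICES["DeepSeek V4 Pro"]
for rival in ("GPT-5.5", "GPT-5.5 premium tier", "Claude Opus 4.7"):
    rival_in, rival_out = PRICES[rival]
    print(
        f"vs {rival}: input {percent_cheaper(pro_in, rival_in):.1f}% cheaper, "
        f"output {percent_cheaper(pro_out, rival_out):.1f}% cheaper"
    )
# vs GPT-5.5: input 65.2% cheaper, output 88.4% cheaper
# vs GPT-5.5 premium tier: input 94.2% cheaper, output 98.1% cheaper
# vs Claude Opus 4.7: input 65.2% cheaper, output 86.1% cheaper
```

On these numbers, the 98% figure corresponds to output tokens measured against the premium tier; against the standard tiers the gap is smaller but still large.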

Competitive Benchmark Performance

Early benchmarks show strong but not dominant results. V4 Pro ranks third among open models and 14th overall in coding evaluations, while other tests place it near the top of all systems, sometimes trailing leaders by fractions of a percent. It achieves 90.2% on Apex math benchmarks, outperforming some rivals, but still lags models like Gemini 3.1 Pro on reasoning-heavy tests such as GPQA Diamond and Humanity’s Last Exam.

Strength in Coding and Agents

Coding and agent workflows emerge as V4’s strongest domain. Internal testing shows over 90% of developers ranking V4 Pro among top coding tools, with more than half ready to adopt it as default. It integrates with frameworks such as Claude Code, OpenCode, and Code Buddy, and supports complex multi-step agents for research, data analysis, and software generation.

Interleaved Reasoning for Agents

A key technical feature is interleaved thinking, which preserves reasoning state across tool calls. This reduces context loss in multi-step workflows, improving reliability in long chains of actions where earlier models often degraded or reset intermediate reasoning.
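
The idea can be illustrated with a toy context builder. This is a conceptual sketch only, not DeepSeek's API or message format: the `Turn` type, the `build_context` helper, and the tool names in the sample history are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Turn:
    kind: str      # "reasoning", "tool_call", or "tool_result"
    content: str


def build_context(history: list[Turn], keep_reasoning: bool) -> list[Turn]:
    """Return the turns fed back to the model before its next step."""
    if keep_reasoning:
        # Interleaved style: earlier reasoning stays visible across tool calls.
        return list(history)
    # Non-interleaved style: reasoning is dropped, only tool traffic survives.
    return [turn for turn in history if turn.kind != "reasoning"]


history = [
    Turn("reasoning", "Need the failing test before proposing a fix."),
    Turn("tool_call", "run_tests()"),
    Turn("tool_result", "test_parse_date FAILED"),
    Turn("reasoning", "Failure is in date parsing; inspect parse_date next."),
    Turn("tool_call", "read_file('dates.py')"),
    Turn("tool_result", "def parse_date(s): ..."),
]

interleaved = build_context(history, keep_reasoning=True)
stripped = build_context(history, keep_reasoning=False)
print(len(interleaved), len(stripped))  # 6 4
```

With reasoning stripped, the model has to re-derive why it ran each tool from the raw results alone, which is where long chains of actions tend to drift.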

Breakthrough in Long-Context Efficiency

DeepSeek introduces a hybrid attention system combining Compressed Sparse Attention (CSA) and Heavily Compressed Attention (HCA). These methods compress token groups and selectively focus computation, enabling efficient scaling to 1 million tokens without prohibitive cost. V4 Pro cuts compute usage to 27% and memory to 10% of what earlier versions required, while Flash reduces both further.
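
The general pattern of compressing token groups and then attending to only a few of them can be shown with a toy example. This is not DeepSeek's CSA or HCA: the summary gives no implementation details, so the sketch assumes simple mean pooling for compression and top-k selection by dot-product score, and every name and parameter in it is hypothetical.

```python
import math


def compress_groups(vectors, group_size):
    """Average each consecutive group of vectors into one summary vector."""
    summaries = []
    for start in range(0, len(vectors), group_size):
        group = vectors[start:start + group_size]
        dim = len(group[0])
        summaries.append([sum(v[d] for v in group) / len(group) for d in range(dim)])
    return summaries


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def sparse_attend(query, keys, values, group_size, top_k):
    """Attend over compressed groups, keeping only the top_k best-scoring ones."""
    comp_keys = compress_groups(keys, group_size)
    comp_values = compress_groups(values, group_size)
    scores = [dot(query, k) for k in comp_keys]
    chosen = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:top_k]
    # Softmax restricted to the selected groups (max subtracted for stability).
    peak = max(scores[i] for i in chosen)
    weights = [math.exp(scores[i] - peak) for i in chosen]
    total = sum(weights)
    dim = len(comp_values[0])
    return [
        sum(w / total * comp_values[i][d] for w, i in zip(weights, chosen))
        for d in range(dim)
    ]


keys = [[1.0, 0.0]] * 4 + [[0.0, 1.0]] * 4
values = [[10.0, 0.0]] * 4 + [[0.0, 10.0]] * 4
print(len(compress_groups(keys, group_size=4)))  # 2 entries instead of 8
print(sparse_attend([1.0, 0.0], keys, values, group_size=4, top_k=1))  # [10.0, 0.0]
```

Compression shrinks the number of cached entries by the group size, and selection caps how many of those entries each query touches; both effects reduce memory and compute as context grows.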

Engineering and Training Advances

Additional innovations include manifold-constrained hyperconnections for more stable signal propagation and the Muon optimizer for efficient large-scale training. These changes reportedly deliver up to 2× inference acceleration, reinforcing the model’s cost-performance advantage.

Hardware Strategy and Global AI Stack

V4 is designed to run on both Nvidia GPUs and China’s domestic chips, particularly Huawei Ascend NPUs. Nvidia supports deployment on Blackwell and Hopper systems, while Huawei reports up to 1.73× inference acceleration on Ascend hardware. This dual compatibility reflects a broader competition over AI infrastructure and supply chains.

Impact of Export Restrictions

U.S. restrictions on advanced chip exports have pushed Chinese developers toward efficiency and domestic alternatives. While training still partly relies on Nvidia hardware, inference is increasingly shifting to local chips. This suggests the emergence of a parallel AI ecosystem, rather than a complete break from Western infrastructure.

Economic Implications for Developers

The pricing shift changes the feasibility of large-scale applications. Tasks like legal analysis, financial research, codebase review, and enterprise automation become significantly cheaper to run over million-token contexts. Smaller teams benefit even more from V4 Flash, which enables low-cost development of chat systems, summarization tools, and lightweight agents.
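
A rough per-job estimate shows what this means in practice. The sketch below uses the V4 prices quoted in this summary; the workload (a full 1 million token input with a 20,000 token response) is a hypothetical example, and `job_cost` is an illustrative helper rather than part of any SDK.

```python
# USD per million tokens as (input, output), as quoted in this summary.
PRICE_PER_MILLION = {
    "V4 Flash": (0.14, 0.28),
    "V4 Pro": (1.74, 3.48),
}


def job_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost of one request at the quoted list prices."""
    in_price, out_price = PRICE_PER_MILLION[model]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000


# Hypothetical workload: review a 1M-token codebase, write a 20k-token report.
for model in PRICE_PER_MILLION:
    cost = job_cost(model, input_tokens=1_000_000, output_tokens=20_000)
    print(f"{model}: ${cost:.4f} per job")
```

At these list prices a full-context request costs well under two dollars on V4 Pro and about fifteen cents on V4 Flash, before any caching or volume discounts, which the summary does not describe.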

Open-Weight Advantage

Released under an MIT license, the models can be downloaded, modified, and self-hosted. This gives companies control over customization and deployment, contrasting with closed API-only systems and strengthening the open-weight ecosystem.

Limitations and Gaps

V4 remains text-only, leaving competitors ahead in multimodal capabilities involving image, audio, and video. It also trails leading models in some general reasoning benchmarks, with an estimated 3–6 month gap to frontier systems.

Mixed Early Reception

Initial user feedback varies. Some report performance close to top-tier systems at a fraction of the cost, while others find improvements over previous versions less noticeable in everyday use. This highlights the gap between benchmark performance and real-world experience.

Market Significance

Rather than outperforming all rivals, V4 reshapes expectations around cost and accessibility. By combining strong performance, extreme efficiency, and open deployment, it challenges premium pricing models and signals a broader shift toward cheaper, scalable AI infrastructure.
