Elon Musk Just Shocked OpenAI With Grok 5

9/10

AI • AI Revolution • May 28, 2026 at 01:02 AM • 16:40

TL;DR

A surge of AI developments led by xAI, DeepSeek, and Alibaba signals an intensifying global race in autonomous coding systems and research agents.

KEY POINTS

xAI unveils massive Grok model

Elon Musk’s xAI has completed training of Grok V9, a 1.5-trillion-parameter model three times larger than its predecessor, with a public release expected within weeks. The model represents a major escalation in scale aimed at closing the gap with leading systems in coding and reasoning. Despite the size increase, Grok currently trails competitors in both benchmark performance and enterprise adoption.

Training fueled by real developer workflows

xAI reportedly trained Grok using extensive data from Cursor, a widely used AI coding platform adopted by over 67% of Fortune 500 companies. This dataset includes real-world developer prompts, debugging sessions, and multi-file collaboration patterns. The approach targets a key limitation of current models: moving from syntactic code generation to practical software engineering capabilities.

Strategic push into AI programming tools

A $60 billion acquisition option tied to Cursor underscores Musk’s focus on the coding market; a $10 billion fee is payable even if the deal fails. xAI has also launched Grok Build, a command-line AI programming agent supporting parallel sub-agents, code editing, and execution. Pricing reaches $300 per month, positioning it as a premium enterprise tool.

Competitive gap remains significant

On the SWE-bench Verified benchmark, GPT-5.5 scores 88.7% and Claude Opus 4.6 reaches 80.8%, while Grok models sit around 72–75%. Enterprise usage reflects a similar gap, with OpenAI at 55%, Anthropic at 47%, Google at 39%, and xAI at just 6%, highlighting the challenge Grok V9 must overcome.

AI-generated research reaches new scale

A 46-page paper led by DeepSeek researcher Deli Chen was 99% generated by an AI agent and completed in roughly six days with only two hours of human input. The system processed 648,000 tokens and verified over 100 references, demonstrating a rapid acceleration in academic output and raising questions about authorship and research inflation.

Framework for autonomous research agents

The paper proposes a five-level autonomy scale, from basic autocomplete tools to fully self-directed research systems. Current leading systems operate at Level 4, capable of multi-step autonomous work within defined constraints. Key unresolved challenges include self-evaluation, long-term memory, reproducibility, and avoiding failure loops.

Alibaba’s Qwen 3.7 Max breaks into top tier

Qwen 3.7 Max ranked fourth globally on the Code Arena leaderboard with a score of 1541, surpassing GPT-5.5 and Gemini 3.5 Flash. It is the first Chinese model to reach this level, joining a top tier dominated by Anthropic’s Claude series.

Performance driven by long-horizon autonomy

Qwen’s architecture emphasizes sustained task execution, reportedly running for 35 hours with over 1,100 tool calls without losing coherence or entering loops. Tests show it producing functional applications, such as games and simulations, on the first attempt with minimal debugging, indicating strong real-world usability.

Intensifying industry-wide competition

Multiple major releases are expected within the same period, including GPT-5.6, Claude Opus 4.8, and Gemini 3.5 Pro, setting up a concentrated wave of competition. At the same time, regulatory constraints are shaping partnerships, particularly around xAI’s interactions with Cursor during acquisition negotiations.

CONCLUSION

Rapid advances in model scale, training data, and agent autonomy are converging to reshape software development and research, with competition intensifying across U.S. and Chinese AI leaders.
