This AI beats Opus 4.8, GPT-5.5 and Gemini (Sakana Fugu)

7/10

AI Eng. · Ben BK · June 23, 2026 at 04:53 PM · 9:03

TL;DR

Japanese lab Sakana AI unveiled Fugu, an orchestration model that coordinates multiple leading AIs and claims top-tier benchmark performance, highlighting a shift from single models to multi-model systems.

KEY POINTS

A new kind of AI: orchestration over scale

Fugu is not a standalone frontier model but an orchestration system designed to manage other models such as Claude, GPT, and Gemini. It receives a single query, decides whether to answer directly or decompose the task, and distributes subtasks to specialized models before merging results into one response. The system can recursively call instances of itself, functioning as a coordination layer rather than a monolithic intelligence.
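The control flow described above (answer directly or decompose, dispatch subtasks to specialists, merge, recurse) can be sketched as below. This is a minimal illustration under stated assumptions, not Sakana's implementation: the `Orchestrator` class, the `" and "` splitting heuristic, the keyword routing rule, and the `coder`/`generalist` specialists are all hypothetical stand-ins for what would really be learned decisions and API calls to models such as Claude, GPT, or Gemini.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical stand-in for a specialist model; a real system would call an API here.
Specialist = Callable[[str], str]


@dataclass
class Orchestrator:
    specialists: Dict[str, Specialist]
    max_depth: int = 2  # cap on recursive self-calls

    def decompose(self, query: str) -> List[str]:
        # Toy heuristic (assumption): split compound queries on " and ".
        parts = [p.strip() for p in query.split(" and ")]
        return parts if len(parts) > 1 else []

    def route(self, task: str) -> str:
        # Toy routing rule (assumption): tasks mentioning code go to the coder.
        return "coder" if "code" in task else "generalist"

    def merge(self, partials: List[str]) -> str:
        # Toy merge (assumption): join partial answers in order.
        return " | ".join(partials)

    def answer(self, query: str, depth: int = 0) -> str:
        subtasks = self.decompose(query) if depth < self.max_depth else []
        if not subtasks:
            # Simple query (or depth limit reached): answer directly.
            return self.specialists[self.route(query)](query)
        # Each subtask goes back through answer(), i.e. the orchestrator calls itself.
        return self.merge([self.answer(t, depth + 1) for t in subtasks])


orch = Orchestrator({
    "coder": lambda q: f"[coder] {q}",
    "generalist": lambda q: f"[generalist] {q}",
})
print(orch.answer("write code to parse CSV and summarize the findings"))
```

The point of the structure is that the caller sees a single `answer()` entry point, while the number and identity of the models actually consulted are decided internally, which is also why the cost and transparency concerns discussed later arise.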

Performance claims rival top models

Sakana reports that Fugu Ultra achieves 73.7 on BenchPro, outperforming GPT‑5.5 (58.6) and Claude Opus 4.8. It also posts 95.5 on GPQA Diamond, 93.2 on LiveCodeBench, 82.1 on TerminalBench, and 50 on Humanity’s Last Exam, roughly matching Opus 4.8. Notably, the orchestrator reportedly surpasses some of the very models it coordinates.

Benchmarks remain unverified

All results come from Sakana, with no independent validation so far. Comparisons with restricted models like Fable and Mythos rely on reported reference scores rather than direct testing under identical conditions. Even within Sakana’s own data, Fable 5 appears to retain an edge on certain coding benchmarks, underscoring the uncertainty around the headline claims.

Built on published research

The system draws on two peer-reviewed approaches. Trinity, a lightweight coordinator (~0.6B parameters), assigns roles such as thinker, executor, and verifier using a small optimized control head. A second model (~7B parameters) is trained via reinforcement learning to manage agent communication strategies in natural language. This academic grounding distinguishes the project from purely marketing-driven releases.
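To make the idea of a "small control head" concrete, here is a rough sketch of a linear head that scores every (model, role) pair from a state feature vector and picks the most likely one. Everything here is an assumption for illustration: the model pool `model_a`/`model_b`/`model_c`, the one-hot "phase" features, and the hand-set `SKILL` weights are hypothetical, whereas Trinity's actual head is optimized rather than hand-written and its inputs are not described in the source.

```python
import math
from typing import List, Tuple

ROLES = ["thinker", "executor", "verifier"]
MODELS = ["model_a", "model_b", "model_c"]  # hypothetical model pool
PAIRS = [(m, r) for m in MODELS for r in ROLES]


def softmax(xs: List[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


class ControlHead:
    """Linear head: one weight vector per (model, role) pair."""

    def __init__(self, weights: List[List[float]]):
        assert len(weights) == len(PAIRS)
        self.weights = weights

    def choose(self, features: List[float]) -> Tuple[Tuple[str, str], float]:
        # One logit per (model, role) pair, then a distribution over pairs.
        logits = [sum(w * f for w, f in zip(wv, features)) for wv in self.weights]
        probs = softmax(logits)
        best = max(range(len(probs)), key=lambda i: probs[i])
        return PAIRS[best], probs[best]


# Hand-set weights for illustration only (a real head would be trained):
# each model is "good" at one role, and the feature vector one-hot encodes
# the current phase (plan, execute, check).
SKILL = {"model_a": [2.0, 0.5, 0.5], "model_b": [0.5, 2.0, 0.5], "model_c": [0.5, 0.5, 2.0]}


def pair_weights(model: str, role: str) -> List[float]:
    j = ROLES.index(role)
    return [SKILL[model][j] if k == j else 0.0 for k in range(len(ROLES))]


head = ControlHead([pair_weights(m, r) for m, r in PAIRS])

for phase in ([1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]):
    (model, role), p = head.choose(phase)
    print(f"{role:<8} -> {model} (p={p:.2f})")
```

The appeal of this shape is its size: the head itself is just a handful of weight vectors, so the expensive reasoning stays in the coordinated models while the coordinator only decides who does what.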

Geopolitical backdrop boosts relevance

The launch follows U.S. restrictions limiting access to advanced models like Fable and Mythos for non-American users. This disruption highlights the risks of dependence on a single provider. Sakana positions Fugu as a resilience strategy: a system that dynamically leverages multiple models, reducing exposure to policy or access shocks.
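The resilience argument rests on a familiar pattern: if one provider becomes unavailable, route the request elsewhere. The sketch below is a generic provider-failover loop, not a description of how Fugu actually selects models; `ProviderUnavailable`, `ask_with_fallback`, and the two provider functions are hypothetical names, and a real system would wrap actual API clients and their specific error types.

```python
from typing import Callable, List, Tuple


class ProviderUnavailable(Exception):
    """Raised by a provider wrapper when access is refused (e.g. a regional block)."""


Provider = Tuple[str, Callable[[str], str]]


def ask_with_fallback(query: str, providers: List[Provider]) -> Tuple[str, str]:
    """Try providers in preference order and return (provider_name, answer)
    from the first one that is reachable."""
    errors = {}
    for name, call in providers:
        try:
            return name, call(query)
        except ProviderUnavailable as exc:
            errors[name] = str(exc)  # record why it was skipped, then try the next
    raise RuntimeError(f"all providers unavailable: {errors}")


# Hypothetical providers: the first is blocked, the second answers.
def blocked_provider(q: str) -> str:
    raise ProviderUnavailable("access restricted in this region")


def open_provider(q: str) -> str:
    return f"answer to {q!r}"


print(ask_with_fallback("hello", [("provider_x", blocked_provider), ("provider_y", open_provider)]))
```

A loop like this only removes single-provider dependence if at least one fallback is actually reachable, which is why the breadth of the underlying model pool matters for the resilience claim.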

Opaque decision-making raises concerns

Fugu operates as a black box: users cannot see which models were used or how tasks were distributed. This limits auditability, reproducibility, and attribution, which are key issues for enterprise and regulated use cases. The lack of transparency may hinder adoption despite the strong performance claims.

Costs are unpredictable

Subscription plans range from roughly $20 to $200 per month, and usage-based fees are about $5 for input and $30 for output, rising for large contexts. However, the real cost depends on how many underlying model calls Fugu triggers. Early tests show mixed results: some queries come out cheaper than using a single model, while others cost significantly more because of hidden multi-model orchestration.
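A back-of-the-envelope estimate shows why the bill depends on internal routing. This sketch assumes, without confirmation from the source, that the $5 / $30 figures are USD per million input / output tokens and that every internal call is billed at that same flat rate; the large-context surcharge is ignored, and all token counts are hypothetical.

```python
from dataclasses import dataclass
from typing import List

# Assumption: the $5 / $30 figures are USD per million input / output tokens,
# and every internal call is billed at that same flat rate
# (the large-context surcharge is ignored here).
INPUT_USD_PER_M = 5.0
OUTPUT_USD_PER_M = 30.0


@dataclass
class Call:
    input_tokens: int
    output_tokens: int


def cost_usd(calls: List[Call]) -> float:
    """Sum the cost of every underlying call triggered by one user query."""
    return sum(
        c.input_tokens * INPUT_USD_PER_M / 1e6 + c.output_tokens * OUTPUT_USD_PER_M / 1e6
        for c in calls
    )


# Hypothetical token counts for the same user query.
direct = [Call(2_000, 1_000)]  # answered in one call
orchestrated = [
    Call(2_000, 300),    # planning / decomposition
    Call(1_500, 800),    # subtask 1
    Call(1_500, 800),    # subtask 2
    Call(3_000, 1_000),  # merge step sees the partial answers
]

print(f"direct:       ${cost_usd(direct):.3f}")
print(f"orchestrated: ${cost_usd(orchestrated):.3f}")
```

Under these assumptions the decomposed query costs about $0.127 versus $0.040 for the direct answer, roughly three times as much, even though the user typed the same prompt. The cheaper cases reported in early tests would require routing some subtasks to lower-priced models, which this flat-rate sketch deliberately does not model.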

Limited availability in Europe

Fugu is currently unavailable in the European Union, as Sakana works toward GDPR compliance. This delays access in a major market and reflects broader regulatory friction facing advanced AI deployments.

CONCLUSION

Fugu signals a shift from building ever-larger models to coordinating many systems intelligently, but its real-world impact will depend on independent validation, transparency, and cost control.
