Shocking new AI just hit 12 million tokens with 1000x less compute

7/10

AI • AI Revolution • June 19, 2026 at 11:35 PM • 15:00

TL;DR

SubQuadratic claims a breakthrough in AI attention that enables efficient reasoning over millions of tokens, potentially reshaping how large-scale document analysis is done.

KEY POINTS

A Fundamental Bottleneck in AI

Modern transformer models suffer from quadratic scaling, where doubling input length quadruples compute due to token-to-token comparisons. This makes processing large documents—such as full codebases or legal contracts—extremely expensive. As a result, most systems rely on retrieval pipelines that only analyze fragments rather than entire datasets.
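The quadratic relationship can be checked with a few lines of arithmetic. This is a minimal sketch that counts one score per query-key pair; the function name `attention_pairs` is an illustrative assumption, and real FLOP counts also depend on head dimension and layer count.

```python
def attention_pairs(n_tokens: int) -> int:
    """Number of query-key scores computed by dense self-attention."""
    # Every token is compared against every token, hence n * n.
    return n_tokens * n_tokens


for n in (1_000, 2_000, 4_000):
    print(f"{n:>6} tokens -> {attention_pairs(n):>12,} pairs")

# Doubling the input length multiplies the pair count by four.
doubling_ratio = attention_pairs(2_000) / attention_pairs(1_000)
print(doubling_ratio)
```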

Introducing Subquadratic Sparse Attention

The company’s core innovation, Subquadratic Sparse Attention (SSA), selectively computes only meaningful token relationships based on semantic relevance. Unlike earlier sparse methods that follow fixed patterns, SSA dynamically identifies important connections, allowing both attention and selection steps to scale linearly rather than quadratically.
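To make the contrast between a fixed sparsity pattern and content-based selection concrete, here is a minimal pure-Python sketch. It is not SubQuadratic's SSA, whose internals this summary does not describe. The function names (`window_keys`, `topk_keys`, `sparse_attention`) and the top-k dot-product selector are assumptions chosen for illustration. This naive selector still scores every key, so it does not reach the linear scaling claimed for SSA.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def window_keys(i, n, width):
    """Fixed pattern: query i sees neighbours within `width`, whatever they contain."""
    return list(range(max(0, i - width), min(n, i + width + 1)))


def topk_keys(query, keys, k):
    """Content-based pattern: keep the k keys with the highest dot-product score.

    Illustrative only: scoring all keys costs O(n) per query, O(n^2) overall.
    """
    scores = [dot(query, key) for key in keys]
    ranked = sorted(range(len(keys)), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:k])


def sparse_attention(query, keys, values, selected):
    """Scaled dot-product attention restricted to the selected key indices."""
    scale = math.sqrt(len(query))
    weights = softmax([dot(query, keys[j]) / scale for j in selected])
    dim = len(values[0])
    return [sum(w * values[j][d] for w, j in zip(weights, selected)) for d in range(dim)]


# Toy data (hypothetical): keys 0, 2 and 4 point in roughly the query's direction.
keys = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [0.0, -1.0], [1.0, 0.2]]
values = [[float(j)] for j in range(len(keys))]
query = [1.0, 0.0]

fixed = window_keys(0, len(keys), width=1)   # positions chosen by distance
chosen = topk_keys(query, keys, k=2)         # positions chosen by content
output = sparse_attention(query, keys, values, chosen)
print(fixed, chosen, output)
```

The fixed window picks key 1 simply because it is adjacent, while the content-based selector skips it in favor of the distant but better-matching key 4.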

Efficiency Gains at Massive Scale

The reported metrics show large reductions in compute. At 1 million tokens, dense attention requires about 252 petaFLOPs, while SSA uses just 3.9, a roughly 64× reduction per layer. Measured against FlashAttention-2 on an NVIDIA H100, SSA runs at parity at 16,000 tokens and becomes 56× faster at 1 million tokens.
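The quoted figures can be sanity-checked with simple arithmetic. The sketch below uses only numbers stated above. The "idealized" speedup is an assumption: it supposes dense attention time grows exactly quadratically and SSA time exactly linearly from the 16,000-token parity point, which real hardware will not match precisely.

```python
# Figures quoted in the summary (per layer, at 1 million tokens).
dense_pflop = 252.0
ssa_pflop = 3.9

# Compute reduction: ratio of operations, not of wall-clock time.
flop_reduction = dense_pflop / ssa_pflop
print(f"FLOP reduction: {flop_reduction:.1f}x")

# Idealized wall-clock check (assumption: pure n^2 vs. pure n scaling
# starting from equal speed at 16,000 tokens).
parity_tokens = 16_000
target_tokens = 1_000_000
growth = target_tokens / parity_tokens          # input grows 62.5x
idealized_speedup = growth ** 2 / growth        # quadratic cost / linear cost
print(f"Idealized speedup: {idealized_speedup:.1f}x (reported: 56x)")
```

Under these assumptions the idealized figure lands in the same range as the reported 56×, and the FLOP ratio rounds to the quoted 64×.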

Record Long-Context Performance

The model SubQ 1.1 Small, released June 16, 2026, achieves near-perfect retrieval accuracy across extreme context lengths. It scored 100% accuracy at 1M and 2M tokens, and 98% at 6M and 12M, despite not being trained specifically at the highest range. At 12M tokens, it attends to only 0.13% of possible token pairs.
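As a rough worked example, the 0.13% figure can be translated into an average number of keys per query. The summary does not say whether "possible token pairs" means all n × n pairs or only causal pairs (each token attending to itself and earlier tokens), so both readings are shown as assumptions.

```python
n_tokens = 12_000_000
fraction_attended = 0.0013  # 0.13% of possible token pairs

# Reading 1 (assumption): every token may attend to every token.
full_pairs = n_tokens * n_tokens
keys_per_query_full = fraction_attended * full_pairs / n_tokens

# Reading 2 (assumption): causal attention, n * (n + 1) / 2 possible pairs.
causal_pairs = n_tokens * (n_tokens + 1) // 2
keys_per_query_causal = fraction_attended * causal_pairs / n_tokens

print(round(keys_per_query_full))    # average keys per query, full reading
print(round(keys_per_query_causal))  # average keys per query, causal reading
```

Under either reading, each query would attend on average to several thousand keys out of 12 million.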

Competitive General Reasoning Ability

Despite its focus on long context, the model maintains strong reasoning performance. It scores 85.4% on GPQA Diamond, below top-tier systems like GPT-5.5 (93.2%) but above smaller models. On LiveCodeBench v6, it achieves 89.7%, outperforming several established competitors.

Real-World Task Benchmarking

On Automation Bench Finance, which simulates enterprise workflows across 500 APIs and 47 applications, SubQ 1.1 Small scores 13%, close to leading models like GPT-5.5 at 18%. The benchmark requires multi-step reasoning with no partial credit, making the result notable for a smaller, specialized model.
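To show why "no partial credit" pushes scores down, here is a minimal sketch comparing all-or-nothing scoring with per-step scoring. It is not the benchmark's actual scoring code; the function names and the toy task results are hypothetical.

```python
def strict_score(tasks):
    """All-or-nothing: a task counts only if every one of its steps succeeded."""
    passed = sum(1 for steps in tasks if all(steps))
    return passed / len(tasks)


def partial_score(tasks):
    """Per-step credit, shown only for comparison."""
    return sum(sum(steps) / len(steps) for steps in tasks) / len(tasks)


# Hypothetical results: each inner list is one multi-step task.
toy_tasks = [
    [True, True, True],     # fully solved
    [True, True, False],    # failed on the last step
    [True, False, False],
    [False, False, False],
]

strict = strict_score(toy_tasks)
partial = partial_score(toy_tasks)
print(strict, round(partial, 2))
```

In this toy set, half of all steps succeed but only one task in four is fully solved, so the strict score is half the per-step score.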

Independent Verification and Skepticism

Results were partially verified by Appen, confirming high retrieval accuracy at scale. However, skepticism remains due to past overpromises in long-context AI. Earlier discrepancies between internal and external benchmarks, along with limited real-world deployment evidence, continue to raise questions.

Implications for AI Infrastructure

If validated in production, SSA could reduce reliance on RAG pipelines, vector databases, and chunking systems. Many current enterprise AI architectures exist to compensate for limited context windows; removing that constraint could allow models to process entire documents directly, simplifying system design.
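For readers unfamiliar with the chunk-and-retrieve pattern referred to here, the following is a deliberately simplified sketch. It uses keyword overlap in place of embeddings and a vector database, and the function names `chunk_words` and `retrieve` are illustrative assumptions, not any particular product's API.

```python
def chunk_words(text, size, overlap):
    """Split text into word windows of `size`, each sharing `overlap` words with the previous one."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than size")
    words = text.split()
    step = size - overlap
    last_start = max(len(words) - overlap, 1)
    return [" ".join(words[i:i + size]) for i in range(0, last_start, step)]


def retrieve(chunks, question, top_n):
    """Rank chunks by how many question words they contain (stand-in for vector search)."""
    wanted = set(question.lower().split())
    ranked = sorted(
        chunks,
        key=lambda chunk: len(wanted & set(chunk.lower().split())),
        reverse=True,
    )
    return ranked[:top_n]


document = "a b c d e f g h i j"          # placeholder for a long document
chunks = chunk_words(document, size=4, overlap=1)
context = retrieve(chunks, "g", top_n=1)  # only this fragment would reach the model
print(chunks)
print(context)
```

The model in such a pipeline sees only the retrieved fragments; a sufficiently large context window would let the whole `document` be passed instead, which is the simplification the paragraph above describes.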

Roadmap and Industry Context

SubQuadratic plans larger models supporting 2M to 12M token contexts later in 2026, with broader rollout underway. The company has raised $29 million at a $500 million valuation, entering a competitive field where prior approaches like Mamba, RWKV, and hybrid transformers have struggled to fully overcome scaling limits.

CONCLUSION

SubQuadratic’s SSA approach, if validated in real-world deployments, could mark a structural shift in AI by eliminating one of the field’s most persistent computational constraints.
