
SubQuadratic claims a breakthrough in AI attention that enables efficient reasoning over millions of tokens, potentially reshaping how large-scale document analysis is done.
Standard transformer attention scales quadratically: every token is compared with every other token, so doubling the input length roughly quadruples attention compute. This makes processing large documents, such as full codebases or legal contracts, extremely expensive. As a result, most systems rely on retrieval pipelines that analyze only fragments rather than the full text.
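The quadratic relationship is easy to check with a few lines of arithmetic. The sketch below is illustrative only: it counts query-key comparisons for dense self-attention and ignores head count, hidden size, and causal masking. The function names are hypothetical.

```python
def dense_attention_pairs(n_tokens: int) -> int:
    """Count query-key comparisons when every token attends to every token."""
    return n_tokens * n_tokens


def growth_factor(n_tokens: int, multiplier: int = 2) -> float:
    """Growth in comparison count when the input becomes `multiplier` times longer."""
    return dense_attention_pairs(multiplier * n_tokens) / dense_attention_pairs(n_tokens)


if __name__ == "__main__":
    for n in (16_000, 1_000_000, 12_000_000):
        print(f"{n:>12,} tokens -> {dense_attention_pairs(n):.3e} pairs")
    print("doubling the input multiplies pairs by", growth_factor(1_000_000))
```

The growth factor depends only on the multiplier, not on the starting length, which is why long inputs become expensive so quickly.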
The company’s core innovation, Subquadratic Sparse Attention (SSA), selectively computes only meaningful token relationships based on semantic relevance. Unlike earlier sparse methods that follow fixed patterns, SSA dynamically identifies important connections, allowing both attention and selection steps to scale linearly rather than quadratically.
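To make the idea of content-dependent sparsity concrete, here is a minimal sketch in plain Python. It is not SubQuadratic's algorithm, which the article does not describe in detail. As a stand-in, it keeps the `k` highest-scoring keys for each query (top-k by dot product) and applies softmax over only those keys. Note that this toy still scores every key in order to pick the top `k`, so it remains quadratic. The article's claim is that SSA makes the selection step itself scale linearly, which this sketch does not attempt to reproduce.

```python
import math


def topk_sparse_attention(queries, keys, values, k):
    """Toy sparse attention: each query attends only to its k best-matching keys.

    Assumption: top-k by dot product stands in for "semantic relevance".
    queries, keys: lists of equal-length float vectors; values: list of float vectors.
    """
    dim = len(values[0])
    outputs = []
    for q in queries:
        # Score every key (this full scan is what a real subquadratic method must avoid).
        scores = [sum(qi * ki for qi, ki in zip(q, key)) for key in keys]
        # Content-dependent selection: the chosen positions differ per query,
        # unlike fixed patterns such as sliding windows or strided blocks.
        selected = sorted(range(len(keys)), key=lambda j: scores[j], reverse=True)[:k]
        # Numerically stable softmax restricted to the selected keys.
        max_score = max(scores[j] for j in selected)
        weights = [math.exp(scores[j] - max_score) for j in selected]
        total = sum(weights)
        outputs.append([
            sum(w * values[j][d] for w, j in zip(weights, selected)) / total
            for d in range(dim)
        ])
    return outputs
```

With `k` equal to the number of keys, the function reduces to ordinary dense softmax attention, which gives a simple way to sanity-check it on small inputs.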
Reported performance figures show dramatic reductions in compute. At 1 million tokens, dense attention requires about 252 petaFLOPs, while SSA uses about 3.9 petaFLOPs, a roughly 64× reduction per layer. Compared with FlashAttention-2, SSA matches performance at 16,000 tokens and becomes 56× faster at 1 million tokens on an NVIDIA H100.
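The quoted reduction can be reproduced directly from the two compute figures. The snippet below only re-derives the ratio from the numbers reported above; the constant and function names are hypothetical.

```python
DENSE_COMPUTE = 252.0  # dense attention at 1M tokens, as quoted in the article
SSA_COMPUTE = 3.9      # SSA at 1M tokens, as quoted in the article


def reduction_factor(dense: float, sparse: float) -> float:
    """Ratio of dense to sparse compute; larger means a bigger saving."""
    return dense / sparse


if __name__ == "__main__":
    print(f"compute reduction: {reduction_factor(DENSE_COMPUTE, SSA_COMPUTE):.1f}x")
```

The ratio is a compute figure, while the 56× comparison with FlashAttention-2 is a speed figure measured on specific hardware. The two are separate measurements and need not match.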
The model SubQ 1.1 Small, released June 16, 2026, achieves near-perfect retrieval accuracy across extreme context lengths. It scored 100% accuracy at 1M and 2M tokens, and 98% at 6M and 12M, despite not being trained specifically at the highest range. At 12M tokens, it attends to only 0.13% of possible token pairs.
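The 0.13% figure can be translated into an average number of keys each token attends to. The article does not say whether "possible token pairs" means all N × N pairs or only the causal pairs, where each token sees itself and earlier tokens. The sketch below therefore computes both, as a back-of-the-envelope estimate under those two assumptions; the function name is hypothetical.

```python
def avg_keys_per_query(n_tokens: int, attended_fraction: float, causal: bool = False) -> float:
    """Average number of keys attended per query, given the fraction of pairs kept.

    causal=False counts all n*n pairs; causal=True counts the n*(n+1)/2 pairs
    in which a token attends to itself and to earlier tokens only.
    """
    if causal:
        possible_pairs = n_tokens * (n_tokens + 1) // 2
    else:
        possible_pairs = n_tokens * n_tokens
    return attended_fraction * possible_pairs / n_tokens


if __name__ == "__main__":
    n = 12_000_000
    fraction = 0.0013  # 0.13%, as quoted in the article
    print(f"all pairs:    ~{avg_keys_per_query(n, fraction):,.0f} keys per query")
    print(f"causal pairs: ~{avg_keys_per_query(n, fraction, causal=True):,.0f} keys per query")
```

Under either convention, each token attends on average to several thousand other tokens out of 12 million, which shows how sparse the reported attention pattern is.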
Despite its focus on long context, the model maintains strong reasoning performance. It scores 85.4% on GPQA Diamond, below top-tier systems like GPT-5.5 (93.2%) but above smaller models. On LiveCodeBench v6, it achieves 89.7%, outperforming several established competitors.
On Automation Bench Finance, which simulates enterprise workflows across 500 APIs and 47 applications, SubQ 1.1 Small scores 13%, close to leading models like GPT-5.5 at 18%. The benchmark requires multi-step reasoning with no partial credit, making the result notable for a smaller, specialized model.
Results were partially verified by Appen, confirming high retrieval accuracy at scale. However, skepticism remains due to past overpromises in long-context AI. Earlier discrepancies between internal and external benchmarks, along with limited real-world deployment evidence, continue to raise questions.
If validated in production, SSA could reduce reliance on RAG pipelines, vector databases, and chunking systems. Many current enterprise AI architectures exist to compensate for limited context windows; removing that constraint could allow models to process entire documents directly, simplifying system design.
SubQuadratic plans larger models supporting 2M to 12M token contexts later in 2026, with broader rollout underway. The company has raised $29 million at a $500 million valuation, entering a competitive field where prior approaches like Mamba, RWKV, and hybrid transformers have struggled to fully overcome scaling limits.
SubQuadratic’s SSA approach, if validated in real-world deployments, could mark a structural shift in AI by eliminating one of the field’s most persistent computational constraints.