8news



OpenAI's New GPT 5.5 Is A New Kind Of Intelligence (Nothing Comes Close)

AI • AI Revolution • April 24, 2026 • 16:23

Summary

TL;DR

OpenAI released GPT-5.5 on April 23, showcasing significant advances in efficiency, autonomous task handling, and practical real-world applications amid fierce competition in the AI market.

Key Points

GPT-5.5 Launch and Positioning
OpenAI officially launched GPT-5.5 as more than just an incremental upgrade over GPT-5.4, framing it as a new class of intelligence designed to autonomously handle extended real-world tasks. The emphasis is on functional capability and sustained reasoning rather than solely on raw intelligence metrics.

Engineering Breakthroughs in Speed and Infrastructure
Despite being larger and more capable, GPT-5.5 matches GPT-5.4’s per-token latency, a rare achievement since larger models generally run slower. Moreover, GPT-5.5 uniquely contributed to optimizing its inference infrastructure during training by analyzing workload patterns and developing heuristics, improving token generation speed by over 20%.

Performance on Leading Benchmarks
GPT-5.5 posts strong results across a range of benchmarks:

  • Terminal Bench 2.0 (complex command line tasks): 82.7%, compared to GPT-5.4’s 75.1%, Claude Opus 4.7’s 69.4%, and Gemini 3.1 Pro’s 68.5%.
  • GDP Val (knowledge work across 44 real professions): GPT-5.5 achieved 84.9%, edging past GPT-5.4 at 83%, Claude Opus 4.7 at 80.3%, and Gemini 3.1 Pro at 67.3%.
  • OSWorld Verified (real computer environment operation): 78.7%, marginally surpassing Claude Opus 4.7’s 78.0% and GPT-5.4’s 75.0%.
  • Math benchmarks show significant improvements, with GPT-5.5 scoring 35.4% on the hardest tier of Frontier Math, ahead of GPT-5.4’s 27.1%.
  • On ARC AGI2, a critical reasoning benchmark, GPT-5.5 scored 85%, outperforming GPT-5.4’s 73.3% and Gemini 3.1 Pro’s 77.1%.

Coding Capabilities and User Testimonials
In real-world coding challenges, GPT-5.5 shows notable progress:

  • On the expert SWE benchmark (long coding tasks), it achieved 73.1% compared to GPT-5.4’s 68.5%.
  • SWEBench Pro, measuring GitHub issue resolution, saw GPT-5.5 reach 58.6%, slightly behind Claude Opus 4.7’s 64.3%, though reported signs of memorization on a subset of those problems complicate the comparison.

Users report enhanced conceptual clarity and long-horizon persistence. For example, Dan Shipper, CEO of Every, tested GPT-5.5 by having it debug and refactor a complex post-launch product problem that GPT-5.4 failed to solve; GPT-5.5 succeeded.

Practical Integration with Creative AI Platforms
Higsfield introduced an MCP connector that allows Claude AI to generate creative assets seamlessly — including videos, images, ads, and landing pages — all within a single session. This integration lets AI models move beyond planning and reasoning into direct creative execution, enhancing workflows in marketing and content creation.

Real-World Applications and Enterprise Usage
OpenAI highlights internal usage where over 85% of employees use Codex weekly. Examples include automation of weekly business reports saving 5 to 10 hours per week, accelerated review of tax documents finishing two weeks early, and complex customer service workflows achieving 98% accuracy without prompt tuning, compared to 92.8% for GPT-5.4. In browsing tasks requiring hard-to-find information retrieval, GPT-5.5 Pro scored 90.1%, ahead of Gemini 3.1 Pro’s 85.9%.
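The Slack-agent triage pattern described above (auto-handling low-risk requests, escalating the rest to human review) can be sketched as a simple scoring function. Everything here is hypothetical: the rubric, threshold, and field names are illustrative, not OpenAI's actual framework.

```python
# Hypothetical risk-triage sketch: score a request against a toy rubric,
# auto-handle low-risk requests, and route the rest to human review.
LOW_RISK_THRESHOLD = 3  # illustrative cutoff, not from the article

def risk_score(request: dict) -> int:
    """Each risky attribute adds points; the fields are made up for the example."""
    score = 0
    if request.get("audience_size", 0) > 1000:
        score += 2  # large audience raises the stakes
    if request.get("involves_executives"):
        score += 2  # executive involvement needs oversight
    if request.get("topic") in {"legal", "financial"}:
        score += 3  # sensitive topics always go to a human
    return score

def route(request: dict) -> str:
    """Return 'auto' for low-risk requests, 'human_review' otherwise."""
    return "auto" if risk_score(request) < LOW_RISK_THRESHOLD else "human_review"
```

A request like `{"audience_size": 50, "topic": "community"}` routes to `auto`, while anything touching legal or financial topics escalates regardless of audience size.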

Scientific Research Advances
GPT-5.5 excels in scientific domains:

  • On Genebench (multi-stage bioinformatics tasks), it scored 25%, outperforming GPT-5.4’s 19%, with the gap widening on longer-output tasks.
  • Bixbench, a real-world bioinformatics test, saw GPT-5.5 reach 80.5% versus GPT-5.4’s 74%.

Notably, GPT-5.5 aided in discovering a new mathematical proof in combinatorial mathematics (Ramsey numbers), verified formally in Lean, a rare AI research milestone. Scientists have also used the model for in-depth gene expression analysis and mathematical modeling, drastically reducing the time needed for complex research projects.

Inference Efficiency and Technical Innovation
Serving GPT-5.5 at speeds matching GPT-5.4 required redesigning the inference stack and hardware deployment on NVIDIA GB200 and GB300 NVL72 GPUs. GPT-5.5 itself analyzed production data to optimize task splitting and resource allocation dynamically, improving throughput by more than 20%—making it both more powerful and more efficient.
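The chunking change described above can be illustrated with a toy scheduler. This is not OpenAI's code, just a sketch of why workload-aware partitioning beats a fixed split: when request sizes are skewed, a greedy longest-processing-time (LPT) heuristic keeps the heaviest core's load far lower than equal-width chunks.

```python
# Toy illustration (not OpenAI's scheduler): compare the heaviest core's load
# under fixed equal-width chunking versus a workload-aware greedy heuristic.
import heapq

def fixed_chunks(costs, n_cores):
    """Split work into n_cores contiguous chunks of equal width;
    return the heaviest core's total load."""
    size = -(-len(costs) // n_cores)  # ceiling division
    chunks = [costs[i:i + size] for i in range(0, len(costs), size)]
    return max(sum(chunk) for chunk in chunks)

def balanced_chunks(costs, n_cores):
    """Greedy LPT heuristic: assign each job (largest first) to the
    currently lightest core; return the heaviest core's total load."""
    heap = [(0, core) for core in range(n_cores)]
    heapq.heapify(heap)
    for cost in sorted(costs, reverse=True):
        load, core = heapq.heappop(heap)
        heapq.heappush(heap, (load + cost, core))
    return max(load for load, _ in heap)

# Skewed workload: two heavy requests and six light ones on two cores.
skewed = [100, 100, 5, 5, 5, 5, 5, 5]
worst_fixed = fixed_chunks(skewed, 2)      # both heavy jobs land in one chunk
worst_balanced = balanced_chunks(skewed, 2)  # heavy jobs split across cores
```

Here `worst_fixed` is 210 while `worst_balanced` is 115; the real production heuristics would be far more sophisticated, but the principle, adapting the partition to observed traffic, is the same one the article attributes to the 20%+ throughput gain.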

Pricing and Access
GPT-5.5 API pricing is set at $5 per million input tokens and $30 per million output tokens, double the rates of GPT-5.4. Pro pricing remains at $30 and $180 per million tokens. Despite higher per-token costs, improved token efficiency may mitigate overall expenses depending on usage. The context window has expanded to 1 million tokens. Access is limited to paid tiers—Plus, Pro, Business, and Enterprise, with free users excluded.
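A back-of-envelope model makes the pricing trade-off concrete. The per-token rates come from the article; the token counts are hypothetical, chosen only to illustrate how improved token efficiency can narrow the effective cost gap.

```python
def request_cost(in_tokens: int, out_tokens: int, in_rate: float, out_rate: float) -> float:
    """Dollar cost of one request, with rates given in $ per million tokens."""
    return in_tokens / 1e6 * in_rate + out_tokens / 1e6 * out_rate

# Same hypothetical task: GPT-5.4 at $2.50/$15 per million tokens,
# GPT-5.5 at $5/$30 but assumed to finish with 40% fewer output tokens.
cost_54 = request_cost(20_000, 50_000, 2.50, 15.0)  # $0.80
cost_55 = request_cost(20_000, 30_000, 5.00, 30.0)  # $1.00
```

Under these assumed counts the GPT-5.5 request costs 25% more rather than double; whether costs rise at all for a given workload depends entirely on how much token usage actually drops.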

Anthropic’s Market Surge and AI Competitive Landscape
Anthropic’s valuation on secondary markets has surged to nearly $1 trillion, surpassing OpenAI’s $880 billion. After raising $30 billion at a $380 billion valuation three months ago, Anthropic’s annualized run rate reportedly accelerated 233% in one quarter to $30 billion. Enterprise adoption and a massive Amazon investment underpin this growth. Anthropic is exploring an IPO targeting a public valuation of $400–$500 billion next year, though secondary market prices are higher. Meanwhile, OpenAI shares have seen limited trading gains and more sellers than buyers.

User Experience Update
GPT-5.5 introduces a small UX change: before it starts reasoning, it presents a plan overview, allowing users to interrupt or redirect execution anytime, enhancing interactivity during complex tasks.

Conclusion
GPT-5.5 represents a significant evolution in model architecture, real-world application, and system efficiency. It continues to push AI from raw intelligence toward practical integration in coding, scientific research, and enterprise workflows—while the competitive AI landscape intensifies with Anthropic’s rapid rise. These developments hint at accelerating innovation and transformation across multiple industries.

Full transcript

OpenAI just released GPT 5.5, and this launch lands right as the AI race is getting almost ridiculous. Anthropic is climbing toward a trillion dollar valuation, Google and Xiaomi are pushing hard, and OpenAI just fired back with a model built for the next phase. So the company officially released GPT 5.5 on April 23rd, positioning it as a new class of intelligence for real-world work. They're not framing this as just a capability upgrade over GPT 5.4, but as a step toward a fundamentally different way of getting work done on a computer. That distinction is key because the way they're describing this is less about raw intelligence scores and more about what the model can actually do autonomously over extended tasks. Two things stand out from a technical standpoint right away. First, despite being a larger, more capable model, GPT 5.5 matches GPT 5.4's per-token latency in real-world serving. That's genuinely unusual. Bigger models are almost always slower. So matching the previous generation's speed while jumping in capability is a real engineering achievement. Second, and this one is kind of wild, GPT 5.5 actually participated in optimizing its own inference infrastructure during training. For the first time, an AI helped improve the systems that run it. We'll get into exactly how that worked later. Now, the benchmark numbers. There's a lot of data here, and some of it is genuinely impressive. On Terminal Bench 2.0, which tests complex command line workflows requiring planning, iteration, and tool coordination, GPT 5.5 scored 82.7%. GPT 5.4 was at 75.1%, Claude Opus 4.7 was at 69.4%, and Gemini 3.1 Pro came in at 68.5%. That's a 13 percentage point lead over Claude. Not a marginal gap at all. On GDP Val, which covers knowledge work across 44 real professions including financial modeling, legal analysis, data science, and operational planning, GPT 5.5 reached or exceeded the level of industry professionals in 84.9% of tasks. GPT 5.4 was at 83.0%.
Claude Opus 4.7 was at 80.3% and Gemini 3.1 Pro came in at only 67.3%. On OSWorld Verified, which tests whether a model can actually operate a real computer environment, not analyzing screenshots, but actually clicking, typing, and navigating software, GPT 5.5 hits 78.7%, edging out Claude Opus 4.7 at 78.0% and GPT 5.4 at 75.0%. There's also Frontier Math, split into tiers. On tiers 1 through 3, GPT 5.5 scored 51.7% versus GPT 5.4's 47.6%. On tier 4, the hardest level, GPT 5.5 hit 35.4% compared to GPT 5.4's 27.1%. That's over eight percentage points on the most difficult math problems in the benchmark. On ARC AGI2, GPT 5.5 scored 85.0% versus GPT 5.4's 73.3%, which is a really significant jump, and it actually clears Gemini 3.1 Pro's 77.1% there. And according to Artificial Analysis's Intelligence Index, a weighted average of 10 evals run by an external third party, GPT 5.5 ranks as the most intelligent model on average across all currently available models. Let's get into what makes this model interesting from a practical standpoint. The coding story has the most real-world weight right now. On Expert SWE, OpenAI's internal benchmark for long-horizon coding tasks with a median estimated human completion time of 20 hours, GPT 5.5 scored 73.1% versus GPT 5.4's 68.5%. On SWEBench Pro, which evaluates one-time resolution of real GitHub issues end to end, GPT 5.5 reached 58.6%. Claude Opus 4.7 is higher at 64.3% on that one, though OpenAI notes that Anthropic reported signs of memorization on a subset of those problems. Across all three coding evals, GPT 5.5 improved on GPT 5.4 while actually using fewer tokens to get there. More efficient and more capable at the same time. The user testimonials here are worth paying attention to because they're not vague positive quotes. Dan Shipper, the founder and CEO of Every, called GPT 5.5 "the first coding model I've used that has serious conceptual clarity." What he did was essentially a controlled test.
After launching a product, he'd spent days debugging a post-launch issue before bringing in one of his best engineers to do a partial rewrite of the system. He then gave GPT 5.5 the same broken codebase and asked whether it could arrive at the same solution the engineer had landed on. GPT 5.4 couldn't do it. GPT 5.5 could. And that brings us to the practical side of all this. Because if AI can plan, reason, and code for longer periods, it also needs a way to turn those plans into real creative assets. That's where today's sponsor, Higsfield, comes in. Higsfield is building one of the most creator-focused AI platforms out there, and their new Higsfield MCP connector basically gives Claude a real creative layer. So instead of Claude just planning, writing, or reasoning through a workflow, it can actually generate and edit media through Higsfield inside the same session. Videos, images, ads, landing page assets, all of that can happen in one pipeline. The setup is also really simple. Inside Claude, you go to settings, connectors, hit plus, add Higsfield, paste the MCP URL, and you're ready to go. From there, Claude Co-work, Claude Code, OpenC Claw Agents, and Hermes can use Higsfield MCP to create assets directly in your working directory. So you can give it one prompt, have it research an angle, build creatives, generate visuals, and organize the output without bouncing between a bunch of separate tools. That's what makes this interesting. It turns Claude from something that can talk about marketing into something that can actually help execute it. Higsfield MCP also gives agentic access to models like GPT Image 2 and Cedence 2, so the quality ceiling here is pretty high. If you want to try Higsfield, check the link in the description. And now back to GPT 5.5.
Pro Shirano, the CEO of Magic Path, talked about GPT 5.5 merging a branch with hundreds of front-end and refactor changes into a main branch that had also changed substantially, resolved in a single pass in about 20 minutes. Michael Truell, co-founder and CEO of Cursor, noted that GPT 5.5 is noticeably smarter and more persistent than GPT 5.4, staying on task for complex, long-running engineering work without stopping prematurely. And one engineer at NVIDIA with early access said, direct quote, "Losing access to GPT 5.5 feels like I've had a limb amputated." Dramatic, but the point lands. OpenAI also showed off what GPT 5.5 can build inside Codex: a space mission application using real NASA JPL orbital data supporting three-dimensional interactive manipulation with realistic orbital mechanics, and an earthquake tracker pulling in live data sources with real-time visualization. These demonstrate that the model can call external APIs, process dynamic data, and build functional applications from a single prompt. On the knowledge work side, rather than leaning entirely on benchmark tables, OpenAI pointed to internal usage data. More than 85% of their employees now use Codex every week across functions including finance, communications, marketing, data science, and product management. Their communications team analyzed six months of speaking request data, built a scoring and risk framework, and set up a Slack agent to handle low-risk requests automatically while routing higher-risk ones to human review. The finance team reviewed 24,771 tax forms totaling 71,637 pages and finished two weeks earlier than the year before. One person on the go-to-market team automated their weekly business reports entirely, saving 5 to 10 hours per week. On τ²-bench Telecom, which tests complex customer service workflows,
GPT 5.5 hit 98.0% accuracy without any prompt tuning, compared to GPT 5.4's 92.8%. On BrowseComp, which tests the ability to track down hard-to-find information across the web, GPT 5.5 Pro scored 90.1%, beating Gemini 3.1 Pro's 85.9%. The scientific research part of this release didn't get as much attention as the coding stuff, and it probably should have. On Genebench, which focuses on multi-stage scientific data analysis in genetics and quantitative biology, tasks that typically correspond to multi-day workloads for scientific experts, GPT 5.5 scored 25.0% versus GPT 5.4's 19.0%. That performance gap actually widens as tasks get longer, with a clear separation appearing around the 15,000 token output mark. On Bixbench, a real-world bioinformatics benchmark, GPT 5.5 hit 80.5% versus GPT 5.4's 74.0%. The specific case that really stands out is the Ramsey number proof. An internal version of GPT 5.5 equipped with a custom tool framework helped discover a new mathematical proof about Ramsey numbers, a core research area in combinatorial mathematics where new results are genuinely rare and technically difficult. The proof was then verified in Lean, the formal proof verification system. This isn't GPT 5.5 writing code about Ramsey theory or explaining the topic. It contributed an actual mathematical argument in a domain where that kind of contribution almost never happens. Derya Unutmaz, an immunology professor at the Jackson Laboratory for Genomic Medicine, used GPT 5.5 Pro to analyze a gene expression data set with 62 samples and nearly 28,000 genes, generating a full research report that surfaced key findings and open research questions. Work he said would have taken his team months.
Bartosz Naskręcki, an assistant professor of mathematics at Adam Mickiewicz University in Poland, built an algebraic geometry application from a single prompt in 11 minutes, visualizing the intersection of two quadratic surfaces and converting the result into a wireframe model with real-time coefficient display that can be directly fed into further mathematical research. Brandon White, co-founder of Axiom Bio, put it plainly: "If OpenAI keeps cooking like this, the foundations of drug discovery will change by the end of the year." Now, the inference efficiency story. This might actually be the most technically interesting thing in the whole release, and it's easy to scroll past. Serving a model this large at GPT 5.4 speeds required rethinking the inference stack from the ground up. GPT 5.5 was co-designed for, trained with, and served on NVIDIA GB200 and GB300 NVL72 systems. Before GPT 5.5, OpenAI split requests on a GPU into a fixed number of chunks to balance workload across computing cores. But a predetermined fixed number isn't optimal for all traffic patterns. So Codex running on GPT 5.5 analyzed weeks of production traffic data and wrote custom heuristic algorithms to optimize how work gets partitioned across the hardware. That single improvement increased token generation speeds by over 20%. The model built the tools that made it faster to run. Now, the API pricing for GPT 5.5 is $5 per million input tokens and $30 per million output tokens, exactly double GPT 5.4's rates of $2.50 and $15. GPT 5.5 Pro in the API runs $30 per million input tokens and $180 per million output tokens, same as GPT 5.4 Pro was. Batch and flex processing get a 50% discount, and priority processing is 2.5 times the standard price. The context window is 1 million tokens. Sam Altman's argument is that since GPT 5.5 completes the same Codex tasks with significantly fewer tokens, the real-world cost increase may not be as dramatic as the per-token rate suggests.
That's plausible, but it depends heavily on your use case. For context, Xiaomi's MiMo V2.5 Pro is $13 per million tokens in and out, MiniMax M2.7 is $0.30 and $1.20, and Kimi K2.5 is $0.44 and $2. GPT 5.5 is operating in a completely different pricing tier from most of the competition. In ChatGPT, GPT 5.5 launches as GPT 5.5 Thinking for Plus, Pro, Business, and Enterprise users, with GPT 5.5 Pro rolling out to Pro, Business, and Enterprise. Free users are not getting access, unfortunately. There's also a small UX change worth noting. Before the model starts its reasoning process, it gives you an overview of its plan, and you can interrupt and redirect at any point during execution. Now, briefly, while all this is happening at OpenAI, something genuinely surprising is going on with Anthropic in the background. On secondary share trading platforms like Forge Global, Anthropic is currently trading at roughly $1 trillion in implied valuation, and OpenAI on the same platform sits at about $880 billion. That means Anthropic is trading above OpenAI for the first time. Three months ago, Anthropic closed a $30 billion Series G round led by GIC and Coatue at a post-money valuation of $380 billion. Secondary markets are now pricing it at almost three times that figure. The revenue growth driving this is striking. Anthropic's annualized run rate went from roughly $9 billion at the end of 2025 to $30 billion by March 2026, a 233% jump in a single quarter, fueled largely by enterprise adoption of Claude Code and API products. Amazon's commitment of up to $25 billion in additional investment added further fuel. Caplight, which tracks private market share activity, reported that interest in Anthropic shares spiked over 650% in the last 12 months. And Glenn Anderson of Rainmaker Securities said a $960 billion valuation described as unthinkable just a month earlier was being snapped up by competing buyers within hours.
OpenAI, by contrast, is trading just 3% above its last primary round valuation on Forge, and Caplight found more sellers than buyers for OpenAI shares in Q1. Anthropic is reportedly exploring an IPO as early as late 2026 with Goldman Sachs and JP Morgan advising, targeting a valuation in the $400 to $500 billion range for the actual public debut, which, yes, would be significantly lower than what secondary markets are implying right now. But that's just how private market dynamics work. Also, if you want more content around science, space, and advanced tech, we've launched a separate channel for that. Links in the description. Go check it out. Anyway, that's the full picture on GPT 5.5 and where things stand in the broader AI market right now. If you found this useful, drop a like. Thanks for watching and I'll catch you in the next one.
