
First impressions of GPT-5.5 from Will Koh

AI • OpenAI • April 23, 2026 • 3:23

Summary

TL;DR

GPT-5.5 demonstrates a significant leap in AI coding assistance by understanding ambiguous tasks, autonomously utilizing tools, and maintaining context across complex workflows, substantially improving performance in real-world applications.

Key Points

Evolution of AI Coding Models
AI coding tools have rapidly evolved over the past two years, from simple tab completions to models capable of independently tackling complex, ambiguous programming tasks. This progression shows how AI in software engineering has matured from limited autocomplete functionality into active problem-solving agents. Early models required highly detailed prompts; current-generation models interpret and execute vague instructions effectively.

First Impressions of GPT-5.5
GPT-5.5 marks a transformative step with its improved understanding of developer intentions without needing precise, directive prompts. Users can assign rough or ambiguous tasks, and the model autonomously explores the appropriate codebase sections, devises multiple potential solutions, and implements the best one. This intuitive grasp reduces the effort in coding workflows and streamlines project completion.

Integration in Practical Environments
At Ramp, GPT-5.5 was integrated into an existing internal tool named Inspect, which serves as a harness for coding tasks and gives the model access to resources such as company databases and telemetry tools. Unlike prior models, which required manual direction for tool usage or often misapplied tools, GPT-5.5 independently identified and applied the correct tools in creative ways, solving problems with minimal human intervention.
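
The harness pattern described above can be sketched as a small tool registry that maps tool names to callables the model may invoke. This is a hypothetical illustration, not Ramp's Inspect API: the names `ToolRegistry`, `query_database`, and `fetch_telemetry` are invented stand-ins for whatever database and telemetry clients a real harness would wrap.

```python
# Hypothetical sketch of a coding-task harness exposing tools to a model.
# All names here are illustrative; this is not Ramp's Inspect.
from typing import Callable, Dict


class ToolRegistry:
    """Maps tool names to callables the model can invoke by name."""

    def __init__(self) -> None:
        self._tools: Dict[str, Callable[..., str]] = {}

    def register(self, name: str, fn: Callable[..., str]) -> None:
        self._tools[name] = fn

    def call(self, name: str, **kwargs) -> str:
        # The harness dispatches a tool call the model emits as (name, arguments).
        if name not in self._tools:
            raise KeyError(f"unknown tool: {name}")
        return self._tools[name](**kwargs)


# Stand-in tools; a real harness would wrap actual database and telemetry clients.
def query_database(sql: str) -> str:
    return f"rows for: {sql}"


def fetch_telemetry(service: str) -> str:
    return f"metrics for: {service}"


registry = ToolRegistry()
registry.register("query_database", query_database)
registry.register("fetch_telemetry", fetch_telemetry)

print(registry.call("query_database", sql="SELECT 1"))  # rows for: SELECT 1
```

The point of the sketch is the division of labor: the harness only routes calls, while the model decides which tool to use and when, which is the behavior the article attributes to GPT-5.5.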

Autonomy and Problem-Solving Abilities
The capability of GPT-5.5 to self-direct investigations and tool combinations marks a substantial advance. Earlier AI models often needed continuous human guidance or correction to complete tasks properly. GPT-5.5’s proficiency in not only understanding the problem but also selecting the right methodologies without prompt engineering signifies important progress toward fully autonomous AI coding assistants.

Handling of Large Tasks and Contextual Continuity
During testing, GPT-5.5 showed an exceptional ability to manage lengthy tasks that exceed its immediate context window. Such tasks typically suffer information loss when the model's context is compacted. GPT-5.5, however, appears to carry key findings, details, and the task goal across these compaction boundaries, continuing work as if compaction had never happened.
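
The compaction behavior described above can be sketched as a loop that, once the transcript exceeds a budget, folds older turns into a single summary entry that preserves the goal and key findings. This is an assumed, simplified mechanism: the string-joining summarizer is a trivial stand-in for the model-generated summaries a real system would use.

```python
# Illustrative sketch of context compaction: when the turn history exceeds
# a budget, older turns are replaced by one summary entry carrying the goal
# and findings forward. The summarizer is a deliberate stand-in.
def compact(turns: list[str], max_turns: int, goal: str) -> list[str]:
    """Keep the most recent turns; fold the rest into one summary entry."""
    if len(turns) <= max_turns:
        return turns
    dropped = turns[: len(turns) - (max_turns - 1)]
    summary = f"[summary] goal: {goal}; findings: " + "; ".join(dropped)
    return [summary] + turns[-(max_turns - 1):]


history = [f"finding {i}" for i in range(6)]
compacted = compact(history, max_turns=3, goal="fix flaky test")
print(compacted)
```

Whatever the real mechanism, the observable property the article reports is the one this sketch preserves: the goal and earlier findings remain available after compaction, so the model can resume mid-task.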

Performance on Real-World Benchmarks
Ramp conducted evaluations relevant to their workflows, such as extracting detailed information from extensive financial customer documents. Success was measured by the “perfect extraction rate,” which assesses how often the AI retrieves all required information on a zero-touch basis without human correction. GPT-5.5 achieved the highest recorded perfect extraction rate, a breakthrough improving reliability and customer satisfaction.
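
As described, the metric is binary per document: a document counts only if every required field is extracted correctly with zero human touches. A minimal sketch of that computation follows; the field names are illustrative, not Ramp's schema.

```python
# Minimal sketch of a "perfect extraction rate" metric: the fraction of
# documents where ALL required fields were extracted correctly, zero-touch.
def perfect_extraction_rate(results: list[dict[str, bool]]) -> float:
    """Each dict maps a required field to whether it was extracted correctly."""
    if not results:
        return 0.0
    perfect = sum(1 for doc in results if all(doc.values()))
    return perfect / len(results)


docs = [
    {"invoice_total": True, "due_date": True},   # perfect
    {"invoice_total": True, "due_date": False},  # one miss -> not perfect
    {"invoice_total": True, "due_date": True},   # perfect
]
print(round(perfect_extraction_rate(docs), 3))  # 0.667
```

Note the all-or-nothing scoring: a single missed field disqualifies the whole document, which is what makes the metric a good proxy for a zero-touch customer experience.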

Impact on Customer Experience
Improved accuracy and autonomy of GPT-5.5 translate into a more “magical” experience for end users. Customers interacting with Ramp’s products see significant benefits through enhanced precision and reduced manual intervention, making complex data extraction virtually effortless and more dependable.

Summary of AI’s Current Coding Capabilities
The transition from earlier AI coding aids to GPT-5.5 illustrates a broader trend in artificial intelligence: evolving from being tools that assist with basic mechanical coding steps into sophisticated collaborators capable of understanding intent, conducting research, and innovating solutions independently in complex environments.

Overall, GPT-5.5 is setting a new standard in AI-driven software engineering by combining contextual intelligence, autonomous tool use, and reliable multi-step task execution, with early adopters reporting notable gains in productivity and accuracy.

Full transcript

Romain: Hi everyone, Romain from OpenAI here. Very excited to be here today with Will, who is an AI engineer. Will, you've been using AI models for coding for a long time now. How has it felt?

Will: It's been a wild journey. Even two years ago we started off with tab completions, and now we're at the point where AI actively takes on ambiguous tasks we assign it, breaks them down, and just gets them done. It's been amazing being at the forefront of it all. Obviously I'm lucky enough to have early access to some of the newest models, but honestly, it's been a crazy time.

Romain: Speaking of the newest model, I'm very excited that you've tried GPT-5.5. What has been your first impression?

Will: My first impression of GPT-5.5 is that it's different in the sense that it actually understands what I'm trying to tell it to do. Previously, a lot of my prompts had to be very detailed and instruction-heavy: "look in this part of the codebase, do this." With GPT-5.5, sometimes I get lazy and give it a very ambiguous task, and it figures it out. It directs its research and exploration to the right areas of the codebase, comes up with multiple options for how we could do it, and then gets it done for me. It's been impressive.

Romain: That's amazing. In your work at Ramp, you were also building your own harness, right? I'd love to hear more about how you've been using and testing GPT-5.5 in that harness of yours.

Will: At Ramp we have our own harness called Inspect, and it was honestly plug and play: we opened the API to GPT-5.5 and it worked like any other model. The impressive part was that it discovered ways to use the tools we had given it, such as access to our databases and our telemetry tools, and figured out novel ways to solve problems with them. It's been interesting to see how it comes up with new ways to solve problems.

Romain: Was that a magical thing that you did not see the other models do before?

Will: Yeah, with the other models I would have to direct them to use tools, or they would sometimes use the wrong tools. They got the job done in the end, one way or another, but with a lot of intervention. GPT-5.5 has been discovering ways to solve the problem on its own.

Romain: I'm curious, in your testing so far, over the past week or so, has there been anything that really surprised you that the model got right?

Will: With bigger tasks, the model is more likely to run out of its context window. But during those compaction periods, I barely notice it; it's as if it's running on the same context. It passes on the right details, the right findings, and the right goal from one compaction to the next, and carries on its task as if compaction never happened.

Romain: I think that's amazing. Can you tell us more about the evals you've run so far?

Will: Within Ramp we have benchmarks for our use cases, such as extracting information out of large customer financial documents. We measure how often we get everything correct with zero touch, and we call that the perfect extraction rate. We've seen that GPT-5.5 is performing at the highest rate we've recorded, which is amazing for our customers. It's a magical experience for them, and we're really excited to get it into their hands.
