8news



The prompt to turn ChatGPT 5.5 into an AI agent!

AI • Parlons IA • May 10, 2026, 06:00 AM • 23:38

TL;DR

ChatGPT 5.5 introduces a faster inference architecture and more advanced agent-based workflows, shifting AI use from simple prompts to structured, memory-driven systems.

KEY POINTS

New High-Speed Architecture

ChatGPT 5.5 marks a major shift in infrastructure by leveraging Cerebras processors, moving beyond traditional NVIDIA H100/H200 GPUs. These large-scale chips significantly reduce latency, boosting generation speeds from roughly 65 tokens per second to up to 1000 tokens per second. The change enables near real-time responses and supports more complex, long-running tasks.
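To put the quoted throughput figures in perspective, here is a quick back-of-the-envelope sketch (the answer length is an assumed example, not a benchmark):

```python
# Illustrative only: time to stream an answer at the two decode rates
# quoted above (65 tok/s vs. 1000 tok/s). The 2000-token answer size
# is an assumption for the sake of the example.
def generation_time(num_tokens: int, tokens_per_second: float) -> float:
    """Seconds needed to stream num_tokens at a given decode rate."""
    return num_tokens / tokens_per_second

answer_tokens = 2000
print(f"at   65 tok/s: {generation_time(answer_tokens, 65):.1f} s")   # ~30.8 s
print(f"at 1000 tok/s: {generation_time(answer_tokens, 1000):.1f} s") # 2.0 s
```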

Post-Inference Task Overlap

The system no longer processes tasks strictly linearly. Instead, it uses post-inference task overlap, allowing simultaneous execution of steps while retrieving prior outputs instantly. This reduces idle time between operations and enables continuous reasoning across tasks, improving efficiency in multi-step workflows.
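The overlap idea can be sketched with standard concurrency primitives. This is a toy model of the concept, not the actual inference internals; all function names are illustrative:

```python
import asyncio

# Toy model of "post-inference task overlap": instead of running step B
# only after step A fully finishes, independent steps run concurrently
# while a prior output is retrieved from a fast cache.

async def fetch_prior_output(task_id: str) -> str:
    # Simulates instantly retrieving a previously generated result.
    await asyncio.sleep(0.01)
    return f"cached result for {task_id}"

async def run_step(name: str, duration: float) -> str:
    # Simulates an independent workflow step.
    await asyncio.sleep(duration)
    return f"{name} done"

async def pipeline() -> list[str]:
    # The prior-output fetch and the next steps overlap in time.
    return await asyncio.gather(
        fetch_prior_output("step-1"),
        run_step("step-2", 0.05),
        run_step("step-3", 0.05),
    )

print(asyncio.run(pipeline()))
```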

Rise of Agentic Systems

ChatGPT 5.5 operates as an agentic system, similar to Claude 4.7, where prompts define decision loops rather than isolated outputs. These loops include context gating, decision-making, and verification, forming a continuous execution cycle. The model can classify requests and route them to appropriate tools automatically.
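The loop described above (context gating, decision, verification) can be sketched in a few lines. This is a deliberately naive illustration; the gating and routing logic are stand-ins, not the model's actual mechanism:

```python
# Toy agentic loop: gate the context, decide on a tool, verify the
# output, and repeat until a valid result is produced.

def gate_context(request: str, context: list[str]) -> list[str]:
    # Naive keyword filter standing in for real context gating.
    return [c for c in context if any(w in c for w in request.lower().split())]

def decide(request: str) -> str:
    # Classify the request and route it to a tool (illustrative rule).
    if "calendar" in request.lower():
        return "calendar_tool"
    return "search_tool"

def verify(output: str) -> bool:
    # Stand-in verification step: accept any non-empty output.
    return output != ""

def agentic_loop(request: str, context: list[str], max_iters: int = 3) -> str:
    for _ in range(max_iters):
        relevant = gate_context(request, context)
        tool = decide(request)
        output = f"{tool} ran with {len(relevant)} context item(s)"
        if verify(output):
            return output
    return "failed"

print(agentic_loop("check my calendar tomorrow",
                   ["calendar: 10am standup", "note: buy milk"]))
# → calendar_tool ran with 1 context item(s)
```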

Simplified Prompting with RLHF

Thanks to reinforcement learning from human feedback (RLHF), casual users can issue simpler prompts while still achieving useful results. However, for professional use, detailed system instructions remain critical. The model executes trained behaviors but does not independently choose optimal strategies without structured guidance.

Structured Prompt Engineering Standards

Developers are increasingly adopting structured formats such as Markdown (under 500 lines) and simplified XML (over 500 lines) to define agent behavior. These formats organize instructions into blocks covering tools, workflows, and fallback logic, improving stability and reproducibility in AI systems.
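A block structure of the kind described might look like the sketch below. The tag names are illustrative, not an official OpenAI or Anthropic schema:

```xml
<!-- Sketch of a "simplified XML" agent definition. Tag names are
     illustrative; adapt them to the platform you target. -->
<agent>
  <tools>
    <tool name="read_file">Use for local text and Markdown inputs.</tool>
    <tool name="view_image">Use whenever a PDF or image is encountered.</tool>
  </tools>
  <workflow>
    <step n="1">Classify the request and select a tool.</step>
    <step n="2">Extract data block by block and store it in memory.</step>
  </workflow>
  <fallback>
    If read_file fails, retry once, then report the failing path
    instead of crashing.
  </fallback>
</agent>
```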

Tooling, Constraints, and Verification Layers

Effective agent design requires explicit definition of tools, allowed actions, and fallback strategies. Systems must also include iteration and validation layers, ensuring outputs meet predefined constraints such as formatting rules or business requirements. Without this, agents risk failure or inconsistent outputs.
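A validation layer of this kind can be as simple as checking each draft output against declared constraints before accepting it. A minimal sketch, with assumed rule names:

```python
# Minimal output-validation layer: check an agent's draft against
# predefined constraints and report which rules failed. The rule
# names ("max_chars", "required_sections") are illustrative.

def validate_output(output: str, constraints: dict) -> list[str]:
    """Return the list of violated constraint names (empty = accepted)."""
    violations = []
    if len(output) > constraints.get("max_chars", 10_000):
        violations.append("max_chars")
    for section in constraints.get("required_sections", []):
        if section not in output:
            violations.append(f"missing:{section}")
    return violations

draft = "## Executive summary\nGlobal production is rising."
rules = {"max_chars": 500,
         "required_sections": ["## Executive summary", "## Reserves"]}
print(validate_output(draft, rules))  # → ['missing:## Reserves']
```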

Critical Role of Memory Systems

A key limitation of large language models is declining performance over long contexts, especially when exposed to irrelevant data. ChatGPT 5.5 addresses this with persistent memory structures, allowing agents to store intermediate results and reuse them across workflows. This improves consistency in tasks lasting hours.
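The persistence idea can be sketched with a plain append-only file, echoing the `memory.md` file the transcript later demonstrates. File name and helper names are assumptions for illustration:

```python
# Toy persistent-memory store: append each intermediate result to a
# file so a later step, or another agent, can reload it instead of
# relying on a long, degrading context window.
from pathlib import Path

MEMORY = Path("memory.md")

def remember(section: str, content: str) -> None:
    # Append one memory block in Markdown form.
    with MEMORY.open("a", encoding="utf-8") as f:
        f.write(f"## {section}\n{content}\n\n")

def recall() -> str:
    # Reload everything stored so far (empty string if nothing yet).
    return MEMORY.read_text(encoding="utf-8") if MEMORY.exists() else ""

remember("block-1", "extracted table: global production by country")
remember("block-2", "extracted figure: reserves per region")
print("block-2" in recall())  # → True
```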

Parallel Multi-Agent Workflows

The system supports parallel sub-agents, enabling simultaneous data retrieval, analysis, and processing. For example, separate agents can analyze different files or datasets concurrently, significantly reducing execution time. This “fan-out” approach is central to scaling productivity.
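The fan-out pattern maps naturally onto a thread pool. A minimal sketch, with a stand-in function in place of a real sub-agent:

```python
# Fan-out sketch: run one "sub-agent" per file concurrently and
# gather the results. sub_agent is a stand-in for a real analysis
# step, which in practice would be I/O-bound (API calls, file reads).
from concurrent.futures import ThreadPoolExecutor

def sub_agent(filename: str) -> str:
    return f"analysis of {filename}"

files = ["report.pdf", "sales.csv", "notes.md"]

with ThreadPoolExecutor(max_workers=len(files)) as pool:
    # map preserves input order, so results line up with files.
    results = list(pool.map(sub_agent, files))

print(results)
```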

Dataset-Centric AI Operations

Professional deployment requires predefined datasets and structured inputs, rather than open-ended web queries. Organizations must specify variables such as user IDs, transaction data, or document sources to ensure traceable and verifiable outputs. Unstructured prompting limits reliability and auditability.

Shift from Consumer to Enterprise Use

The gap between casual and professional AI use is widening. Consumer-style prompts produce generic outputs, while enterprise systems rely on controlled workflows, defined data access, and agent coordination. This transition positions AI as a programmable operational layer rather than a simple assistant.

CONCLUSION

ChatGPT 5.5 represents a shift toward faster, structured, and memory-driven AI systems, where performance gains depend not just on model capability but on how effectively organizations design and manage agent-based workflows.

Full transcript

ChatGPT 5.5, I'm going to reveal all its secrets. We'll talk about the new computing system with fast inference, and prompt engineering adapted to ChatGPT 5.5. I'll show you how to model prompts. We'll be able to code agents that work both on Claude's server and on ChatGPT 5.5. Why do we need memory? What's different today with AI systems, and what's changed in ChatGPT 5.5 compared to previous versions? All of this is in this video. The first thing that struck me when using ChatGPT 5.5 was the speed. Do you know why? Because they completely changed the architecture. They switched to a new class of processors from a company called Cerebras. Here's what it is. NVIDIA used to be the benchmark: the Hopper H100 chip, the H200 chip, the standard on all LLM server systems. But today, this paradigm has reached a limit: the speed at which the models respond. This is where a new partner emerged: Cerebras, creator of the industry's largest processor. These enormous wafer-scale chips cut latency dramatically, pushing generation speed from roughly 65 tokens per second to up to 1,000 tokens per second. On top of that, there's a complete internal architecture overhaul. Previously, the model had to make continuous round trips through the API interfaces. With the Cerebras system, coupled with the new architecture, the model directly retrieves the previous response and immediately begins the next steps. So the model isn't forced to work linearly: it classifies your request up front and already knows in advance what type of processing it needs to perform. This lets it classify the type of request you're making, store the tokens generated during the discussion in memory, and improve what's called the logical routing of the actions to be taken. So instead of waiting for a task to finish, it performs what's called post-inference task overlap.
As a result, we have a machine that's significantly faster, and that's the first thing that surprised me. Let me know in the comments if you've experienced this speed boost with the new Cerebras chips. I mentioned it a few months ago, and it went completely unnoticed by the media, but the future was already here. I told you so. Let's move on to prompt methods, because in this video I'm going to show you how to manage memory storage and the directories that will allow you to create agentic systems with ChatGPT 5.5. Let's start by understanding what an agentic system means. We're dealing with ChatGPT 5.5, which, like Claude 4.7, is an agentic system. What you need to understand is that your prompt will configure this large loop. This loop covers what we call context gating, decision-making, and verification of results; it's called the agentic loop. Your prompt will define what the model will use in the context and what decisions it will make regarding actions. Until now, when creating an AI agent, you had to provide a lot of detail, specifying each step to build the AI system. What changes with ChatGPT 5.5 is that it uses reinforcement learning, what we call RLHF (reinforcement learning from human feedback). So, for the general public (I'm not referring to professionals), we can provide fairly simple instructions. "Notes system for a PowerPoint presentation," this type of prompt is often used by the general public. The important thing to be careful about, given how AI agents are marketed, is that they aren't magic machines. They are machines that are trained to perform tasks. This means that today, the model is trained to understand that if we ask it to check our meetings, it automatically connects to Google Calendar. If we ask it to create events, it knows how to check our availability and perform what's called a query on an MCP function. All of this is thanks to a model that has received training.
But what the model will never be able to do on its own is choose the best methods to produce exactly what you want. And that's where you come in and code the system's instructions. We'll look at the official documentation for developers to develop AI agent systems. But what are the main things to remember about these systems to be able to use them? If your overall sequence is less than 500 lines, you use Markdown. If, however, your system exceeds 500 lines, the model incorporates the simplified XML structure as its top layer. So we add this type of structure inside, and you have to use it every time you code an agent-based system. As soon as you have complex instructions, which is highly recommended for the stability of the prompt system, you should build this type of structure. When you look at the instructions written by OpenAI, you immediately notice one thing. They didn't write, "You are ChatGPT, the best AI with 15 years of experience. You have all the skills," not at all. What we have is, for example, "You are an expert in writing prompts." Okay? And now that you've told it that, is it an expert in writing prompts? If tomorrow I tell the chatbot, "You are an expert in scheduling medical appointments," will it be better at classifying the patient's problem? Will it be able to understand the logic behind the decision-making process? And when you look more closely at what's written in the official OpenAI training materials, and also in Claude's work, you have a structure that defines simplified XML. So we have groups of instructions formatted within blocks. We have a first block that defines what you need to do with the tools, and when to use each tool in relation to the file type. In the event that a file presents a problem, which option will you use in your command functions? So, why do we actually have this architectural system in agentic systems?
Because we're dealing with agent loop systems, we give them instructions for the points where the model will have to take action; say it has to read a file. Let's say that for some reason, it can't read the file. If you don't give it the option to use another tool, or if it doesn't learn to do so, it will crash and fail. So the idea behind a prompt system is to give it the ability to understand how to resolve situations, or how you want it to work for your company. That's the whole point. In other words, you customize the behavior of ChatGPT 5.5 or Claude 4.7 with absolute precision. You make it an employee well-versed in your professional practices. The way you speak to Claude will inevitably impact the results. And the difference between a bad prompt and a good one is what will take you from a not-bad but generic result to a truly exceptional one. And to know if it's doing it right, we add another layer: the iteration layer. This means that, as part of how the AI works, it must check each time whether the tool it used and the tokens it produced correspond to the instructions you provided. This is where we find the sections called "editing constraints". This is where the model will understand how to produce your work. The way you want to create your PowerPoint presentation must meet certain criteria. You're not going to ask it, like all the YouTubers do, "Make me a presentation for a pitch deck for a $5 million fundraising round." No, you want it to include your logo, your company name, a specific color scheme, and a set of topics. "Right here, we have the result. The request was to do an analysis on rare earth elements. We can see that we gave it a lot of instructions. Again, rare earth elements, executive synthesis, global production, global reserves. We can see that, there you go, the answer is already quite dense." Just because you say that to an AI, does that make it a geopolitics expert? I'm asking you. Do you think that this actually allows the AI to understand?
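The read-then-fall-back behavior described a moment ago can be sketched like this. Tool names are illustrative; the point is only that the agent's instructions define a second option instead of letting it crash:

```python
# Sketch of tool fallback: if the primary tool (reading a file as
# text) fails, fall back to a second tool instead of crashing.
# ocr_tool is a hypothetical stand-in for PDF/image extraction.

def read_text_tool(path: str) -> str:
    with open(path, encoding="utf-8") as f:
        return f.read()

def ocr_tool(path: str) -> str:
    return f"[ocr output for {path}]"

def read_with_fallback(path: str) -> str:
    try:
        return read_text_tool(path)
    except (OSError, UnicodeDecodeError):
        # Primary tool failed: route to the fallback tool.
        return ocr_tool(path)

print(read_with_fallback("does_not_exist.pdf"))
# → [ocr output for does_not_exist.pdf]
```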
Now, AI has become an architectural system. How is the task handled on the front end? How do we introduce the topic, what are the operating rules, and how do we present the content you want to format? This is the official documentation. So you realize that all the prompts you see on social media don't correspond to what you need to do in a professional setting. If you really want to do it for fun, you can continue using AI for that. But what's happening is that you're missing out on 99% of AI's potential in today's professional world. I do have some advice for you on how to write instructions for agent systems. We'll give an example later in the video to show you that, firstly, in the chat interface that most people use, you now need to use memory storage. When you work with a model, the further you progress in the discussion, the more the context window fills up. But the problem is that the context window in Claude or ChatGPT isn't a linear interface. People tell you, "Models have a million tokens of context." Sure, but that's not the whole story. Models don't perform the same at the beginning of the window as they do at the end. And as soon as models encounter what studies define as "distractors" (extraneous information, poorly written elements, elements they can't connect), the model's understanding of the context suddenly drops below 50 or 60%. And on top of that, you can send a system of tools in parallel. I'll show you that at the end of the video. When OpenAI designs a workflow, it designs, first and foremost, an objective. Next, we have input elements. What you need to understand about input elements is that the data you're going to retrieve requires creating what's called a dataset. In the OpenAI documentation, the section for professionals is aimed at you: those of you who want to code AI agents for your company to accelerate your work and increase your productivity.
What we see is that the system will be able to create several agents simultaneously: a search agent, a knowledge base agent on a file system (it could be RAG, it could be databases), and another agent that will make decisions based on data. To do this, the system needs to code the dataset. This means you can't ask an AI, as you still see on YouTube and in internet videos: "Go find some data on the internet. I want to give a presentation, go find some data for me." That's not possible. That's how ChatGPT 5.5 is being promoted to the general public, but it isn't the right approach for businesses, because that way of working means you don't control any of the model's steps. The model is trained to execute steps. So that's fine, that's cool, that's good, that's great. But there's a problem: you don't know what the model chooses. You don't know what criteria it uses. Consequently, these workflows run without you, and therefore you can't verify the accuracy of the model's data. So, what does this mean? It means that when you create agentic systems, the first thing to do is identify the dataset and the variables you're working with in that system. That's exactly what's done when you code agents. Let's say you're looking for information about a customer. You need their user ID, name, and possibly the transaction name: all the necessary elements to retrieve specific data from your database. For example, for the agent that handles customer service. This highlights one thing: there are essentially two ways to work with AI. One way is fun, the one shown in online videos, but that's not the one you can use in a business environment, because you can't tell an AI, "I have a customer service department, take care of... you're a customer service specialist, handle the customer issues." That won't be enough. That's the key message you need to grasp.
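The variables-first idea above can be sketched as a lookup against a structured dataset. Field names and records here are purely illustrative:

```python
# Dataset-first sketch: the agent receives explicit variables
# (user ID, transaction) and queries a structured dataset, rather
# than being told "go find data". All field names are illustrative.

dataset = [
    {"user_id": "u-001", "name": "Ada",   "transaction": "tx-42", "amount": 120.0},
    {"user_id": "u-002", "name": "Linus", "transaction": "tx-43", "amount": 80.0},
]

def lookup(user_id, transaction):
    """Retrieve one traceable record by its identifying variables."""
    for row in dataset:
        if row["user_id"] == user_id and row["transaction"] == transaction:
            return row
    return None  # explicit miss, so the workflow stays auditable

print(lookup("u-001", "tx-42"))
```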
So you've moved to a system that understands your request and defines all the tools it will deploy to virtual employees, AI agents, ChatGPT, and Claude. In fact, each time, you have to define what Claude represents and what ChatGPT represents. The system self-evaluates, exchanges information, and creates an autonomous system capable of executing a task. You'll write prompt systems to coordinate your AI agents, memory, MCP functions, and tools within this loop. And that's exactly what we do with Level 2: "Mastering the Best of AI." You'll learn to automate what drains you. You'll get certified, because I give you all the methods to pass the certification exams. You'll master your data and, in less than 15 days, you'll know how to scale your business with skills that no one else on the market currently possesses. I'm not giving you a tutorial; I'm giving you an artificial intelligence ecosystem for the professional world. At some point, you need to take the time to understand: how do I build a dataset for a search database on tools, for an agent knowledge system, and how do I build my queries to retrieve this information from a database? If we don't do that, the whole layout is just fluff. That's the gap between the social media side and the professional deployment side. So, when we build our system, the first thing is to structure it. The input database, everything, needs to be pre-sorted for the dataset system. Then, we need to filter what the model is allowed to access, what it can't access, and based on what criteria. Then the actions, and here we're talking about the workflow: step 1, step 2, step 3. The model needs a defined framework for its operation. For example, I've included a prompt, so you can pause the image. I'll put it in the course materials. You'll find the complete course materials in the description below. And with these templates, you'll be able to build agentic systems for ChatGPT 5.5.
There is one method, however, that I think we can improve with ChatGPT 5.5. And here, for me, it's Claude that is very, very good at modeling agent systems. We're going to test it, right? We'll do a live test. I'm doing a completely live video with you right now, but for me, today, Claude has a structure that I find much clearer. So the idea is still to have, based on a YAML system, the name, the description, the system tools that will be integrated into it, and then the workflow system inside and the system's objective. This block, as it's written, I find much clearer, much more functional, and above all, we'll be able to code agents that work both with Claude and with ChatGPT 5.5. So the idea for the test we're going to do is: 1) to help you understand one point, the function of "memory." Why do we need memory? What's different with AI systems today? As I explained, the weakness of all AIs today is that they lose consistency during a conversation because of performance drops. What happens is that the longer the workflow (ChatGPT 5.5 can work for 4 to 5 hours on cases with a very high score), the more it needs to remember what it has done, what remains to be done, and how to execute its work. And the problem is that the longer the conversation gets, the more its performance degrades. So, at some point, it's necessary to save the data, to make memory a reference point for how the AI works. And this is possible. So, I'm going to show you this method, which will allow you to study and understand the entire mechanics of AI agents in a single video. Initially, what I wanted to do was study the ChatGPT documentation. The first thing I discovered was that if I asked it to use the official documentation, what it did was connect to all internet searches. That's not at all what I want to do. So, we need to define the scope of the tools. Let me show you something. 1) Here, I created a framework. So I told it: "A fetch function, not a web search."
I gave it the three links. Then, I required it to use an "image view" function for handling PDFs, because I have a PDF inside. That means that in my system, I'm going to ask it to use a tool, a "system tool," for the PDF part. So, we're using the same logic we discussed earlier. I'll go over it again so it's clear. What data are you retrieving? Which tool do you want to use? What methods are you going to use to verify that you've done your work correctly? Okay, now it's: how am I going to optimize my system? I have three files. So what can I do? Well, I'm going to tell it: "In the analysis step, you're going to launch three independent systems to search for three elements separately." Since the model is well-trained, I was able to write fairly basic instructions this way. And I'll talk about the /mnt/ directory right after. What I've noticed is that if I ask it to launch agents, here's what happens. It's able to use both the memory (you have it right here on the right of the screen) and simultaneously launch agents that use tools at the same time. The goal is to save time. The key point is: the more efficient and quick I am in my work, the faster I finish my tasks and can move on to the next, and the more I increase my productivity level. So today, the concept of architecture is about being able to visualize where I can multiply the number of agents working simultaneously while keeping them organized. But I think the area for improvement is the instruction section, because the way OpenAI, if you will, shows us how to write agents is quite basic. When you start looking at the official documentation, however, you realize it's completely code-like. So what I propose is that I'll use Sonnet 4.6. We'll start with the prompt I wrote earlier, but we'll use the prompt structuring methods used for AI agents for Claude. So what the system needs is: 1) the "agent" template.
We'll modify it slightly, because there are a few elements to add to the system's operation. So, we'll use this type of template here. The model should be able to model it. We need to add three sub-agents for it to work. So, it'll be doable. What we're going to add here are the "system tools": read, grep, and glob. And we'll add the fetch functions. Then, we have what's called a "deny tool," meaning it's not allowed to use a "web search" function. We'll remove the model, because we'll use the default model from the pop-up window, and we'll add the "parallel sub-agent" function to code the elements. So, we'll code a "workflow" section. In the "tools" section, there's the fetch function, and here we'll tell it to use the sub-agents in parallel, which is called a "fan-out." Now, what's interesting is that since the models are quite well-trained, we'll see if it understands what I mean and doesn't mix the two systems; we'll take our time. We're doing this live together, you know, to show you how these templates will actually speed up the system. Instead of spending hours coding instructions, we'll be able to use template systems by giving them the general guidelines for how we operate, and improve the stability of the model's operation. So, I'm going to ask this to both Claude and Sonnet, and I'm curious to see how ChatGPT 5.5 will react to the same request. The only thing I'm going to add is "respond in a code window in Markdown format." That's the little detail I should have added; that way, it will already be in the correct format. So, what does the system tell us? It tells us the name. So it gives it a name. It's a little long; normally it's a 64-character "slug." Description: analysis, read, write, view_image, web_search and deny, allow_link, memory storage, data file, memory.md, pedagogical analysis, data extraction, operating conditions, general object, fetching of authorized sources.
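Put together, the template fields just described might look something like the YAML sketch below. The keys are illustrative, not an exact Claude or OpenAI schema; adapt them to the platform you target:

```yaml
# Sketch of a YAML-style agent template (keys are illustrative).
name: pdf-analysis-agent
description: Extract and summarize data from three source files, block by block.
tools:
  allow: [read, grep, glob, fetch, view_image]
  deny: [web_search]          # scope: authorized sources only, no open web
memory: memory.md             # persistent store for intermediate results
workflow:
  - step: Fetch the three authorized source links.
  - step: Fan out three parallel sub-agents, one per file.
  - step: Use view_image for any PDF or image content.
  - step: Append each extracted block to memory.md, then verify against constraints.
```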
So we have an active "fan-out" function, parallelization, mission, sub-agent architecture, visual and pedagogical analysis. So did it do this only for the PDF? For sub-agent 3, I want a specialization only for the PDF. Ah yes, that's good, it did it. "Table and PDF," so it's OK. So it's integrated. Next, extraction format, paragraph N, block, local memory, the "write" function adds the data sequentially. It understood everything. So here, we've just coded an agent system. There you go, you have it in front of you. I'll put it below in the description, in the lessons. We've just coded it; you have the template, which, for me, is good. Okay, that's good. So we're going to test our AI agent, and we'll specify one thing: don't launch the skill function, because this type of structure could be associated with a skill function. So we'll send it the request telling it, "Skip the skill, only execute the workflow part." Then we'll see how it behaves. First, it will start creating the "DEM Data Memory." It will record each of its actions, what it does, and also, to some extent, its operating mode. This helps stabilize the model's operation within a larger context. Then, as we mentioned earlier, since it needs to analyze the images, it will perform what's called "parsing." It's currently dividing the PDF into blocks. Here's what it does: "I analyze the PDF and extract the PDF blocks using chunk functions." This type of system allows it to extract both the images and the graphs, and to retrieve the information. It can't do this all at once; it can only process the data in blocks, because of a 10,000-token limit in the tools. Beyond 10,000 tokens, it has to parse the data. So I didn't need to tell it, because it already knows. That's information it already has. However, what the model didn't have (and this is important) is that I clearly specified that as soon as there are images and PDFs, it should launch the `view_image` function.
That's the crucial point that changes the model's behavior. If you don't define the tools and the architectural logic, you can't let it work that way. So what's happening now is that we've started it, we've launched it into `run` mode, and now it will execute its work to completion. And I'll also show you (look at what's going to happen) what actually updates the memory: all the data, block by block. So we're no longer working in the classic context window; we're working in reusable, exportable memory blocks. The further you progress, the more it indicates: "Okay, I've started working on this information. Here's the data for the topic, its operation, its architecture, and the elements indicated within the topic." It extracts all the data block by block, document by document, and updates its memory. If you finish your discussion, what you retrieve is this memory, and you can open this memory with another agent in another discussion, or retrieve it directly on your computer. This is the strength of agentic systems: your work can be scalable, and you are therefore able to implement it. The more you converse, the more you see that the model continues to integrate information, and what it will do is launch parallel agent systems to save time. So, if you really want to save time, you should work in Codex. In Codex, you will be able to much better appreciate the power of the Cerebras chips in ChatGPT 5.5. On the chat interface, it's still much slower, although the model is much faster than before, to be honest. But you can see that it continues to write, analyze, and extract data. So, here we go. We can leave it for half an hour, 45 minutes, an hour. I've shown you many videos; I'll link the videos I mentioned previously to help you understand what I was telling you about a few months ago. It's happening: we have memory-based agentic systems with fast inference. The next step is up to you: apply what I've shown you. Take the time to understand it, then code your agents.
Take the agentic structure. For me, the cleanest is Claude's, to which we'll add parallel functions. It's here. So we can launch parallel systems. That's the whole point. The goal is to save you time. So the model is designed for this: delegating tasks to different agents, retrieving data, and systematically creating dataset systems. Your AI needs to work with structured datasets; it's the best way to obtain verifiable data. And I think what's holding companies back today is that they're still using workflows designed for the general public. You need to switch to the methods I'm showing you. This is what we use to develop agentic systems. But once you understand the system, you'll see it's not that complicated. All the information is in the description, along with the lessons for you to practice. If you haven't already, subscribe, share, like, and if you enjoy this content, feel free to recommend it to your friends and colleagues. See you soon!
