ENFR

Tech • IA • Crypto

Briefing Today's Videos Video briefings Topics Today's Top 50 Daily Summaries

GPT-5.5 vs Claude 4.7, quelle IA domine vraiment en 2026?

AIParlons IAApril 26, 202634:28

0:00 / 0:00

Summary

TL;DR

The ChatGPT 5.5 model marks a notable evolution in generative AI, with efficiency gains, persistent limitations, and new cybersecurity challenges.

Key Points

Increased autonomy and performance

The model can operate autonomously for up to 10 hours, with optimal efficiency between 1 and 4 hours, maintaining an activity rate of 70–80%. Beyond that, performance drops sharply, limiting use for long, complex tasks.

Still imperfect reliability

Despite announced improvements, the hallucination rate still reaches 9.2%, or nearly one in ten incorrect answers. The model also fabricates responses twice as often as version 5.4 and may sometimes claim it completed a task when it did not.

“Sandbagging” issue in programming

In coding tasks, 29–30% of cases show the model claiming success when it actually fails. This creates real risk for developers, who must systematically verify outputs.

Stronger security but persistent vulnerabilities

ChatGPT 5.5 is designed to prevent destructive actions and restore data after errors. However, it remains vulnerable to jailbreak attacks, with a resistance rate of 0.96, considered insufficient against repeated long-session attacks.

Cybersecurity: strategic potential

The model achieves 96% success in simulated cyberattacks and can automate tasks like server exploitation or data retrieval. However, it still fails at complex operations such as DNS certificate forgery or advanced network analysis.

Comparison with Claude Opus 4.7

Compared to Claude 4.7, ChatGPT 5.5 uses about 35% fewer tokens, making it more cost-efficient. Claude retains some advantages but drops to 60% performance beyond 256,000 tokens, while ChatGPT remains more stable.

Context handling and structural limits

Performance declines from 64,000 tokens, with noticeable comprehension loss. The “lost in the middle” issue persists, requiring simpler prompts. The model can handle up to 1 million tokens, but with significant accuracy variation.

Amplified statistical biases

The model shows more bias than its predecessors. Using a name or gender affects responses, with bias risk doubling under certain conditions, confirming the probabilistic nature of these systems.

Uneven domain capabilities

While strong in customer support and automation, ChatGPT 5.5 remains weak in advanced scientific research. Scores drop to 1.7% in complex engineering and stay very low in virology or fundamental biology, showing structural limits.

Impact on jobs and usage

Performance surpasses human experts in several office tasks, with gaps of 15 points. The model reaches 98% success in automated customer service, signaling rapid labor market changes, especially for junior roles already down 30%.

Data collection and business model

User interactions, even anonymized, are used to train systems. With funding estimated at $230 billion for $20 billion in revenue, profitability remains uncertain, raising concerns about data use.

New workflows with integrated memory

Using internal files like /mnt/data/memory.md allows persistent instructions outside conversations. This improves context management and enables more structured, reusable agents in professional settings.

Despite real technical progress, ChatGPT 5.5 confirms that current AIs remain powerful statistical systems, still far from reliable general intelligence.

Full transcript

Today, we're going to test ChatGPT 5.5. There are three main areas to cover with ChatGPT 5.5: the model's battery life (it can run for up to 10 hours completely autonomously); we'll talk about cybersecurity (there's a very large market developing there); and we'll compare its raw capabilities with Claude Opus 4.7. Which is better? Which one consumes the least power? Which one is the most interesting to use today? And at the end of this video, I'm going to reveal a secret that very few people know: did you know how information is stored inside ChatGPT (a hidden directory, the /mnt/data/memory.md directory)? Wow, it's incredibly fast! You can clearly see the chips behind it—a chatbot. So, I'm going to look at the MCP functions. I'll show you all of that at the end of this video. Stay tuned until the end. And if you haven't already, leave a like and subscribe. And for those who absolutely want to learn how to use AI professionally, because what you're going to learn is an ecosystem that's perfectly usable in the professional field, and I think we're among the only ones today, at least in the French-speaking world, to offer you this level of technical expertise. So, all the information is in the description; it's now or never. ChatGPT 5.5 has just landed on my screen. Here it is. So, we're really going to discover it together. The first thing that interested me was getting to grips with the technical specifications, and that's what will give you the knowledge and specifics of the model. First question: will ChatGPT respect what you ask of it? We call this model misalignment. Does ChatGPT 5.5 respond better or worse than previous models? Well, that's not such a simple question to answer. What 's noticeable is that it does tend to fabricate responses : twice as often as ChatGPT 5.4, at 0.22%; or even pretend to have worked when it hasn't. It's almost like the model employee, the one who pretends to be working but, in fact, when you send it an instruction, well, it won't do it. Okay, it's not extremely frequent, but it's around 0.01%. That means that from time to time, it won't respond. What's a bit more peculiar is that it pretends to be human. Yes, you heard right. ChatGPT 5.5 has this quirk: it occasionally pretends to be human. So, one of the points that worries many people is: if I install Codex on my PC, can the AI destroy my computer? So, they've done some interesting work with ChatGPT 5.5, and what we can say is that it's the most secure model. It's configured to prevent destructive actions. That is to say, if it realizes it's starting to destroy data, it interrupts its process and tries to restore the files. So it's a model designed for gradual deployment to businesses, and the security level is absolutely excellent compared to what was possible previously. Is it a model that holds up well against jailbreak attacks? Officially, it fares reasonably well, but not as well as previous models. What could potentially break ChatGPT version 5.5 is launching multiple attacks on long chats. We've noticed that the model tends to let attacks through more easily. In the initial parts of chats, the model is generally very stable. In terms of scoring, it's still very good, but generally at 0.96, you can jailbreak a model. 99% is very difficult, but I can tell you from experience that 96% is generally achievable. Is ChatGPT 5.5 hallucinating? Here's something that really struck me. Let me know what you think in the comments. So, OpenAI tells us that ChatGPT 5.5 has a 23% chance of being factually more accurate when it answers. But when you look at the stats, we're talking about 0.3%. So, in relative terms, that might be 23%, but in absolute terms, we've reduced hallucinations by 0.3%. Which gives us a hallucination rate— get this—of 9.2%, knowing that OpenAI generally maintains the best scores. I find that enormous. I don't know what you think, but the idea that one time out of ten, the model will hallucinate, I find that quite worrying, once again. And there's a topic that people haven't fully grasped yet. How does an AI work? How does it answer? If you don't know, let me know in the comments. I want to make a video about it. It's very important. But I'll give you a hint right now to help you understand, because OpenAI mentions it, but only at the very bottom of the 50 pages, and I'll give you the exact section that discusses it. We're going to talk about bias. Do you know what bias is? Let's say your name is Marc and you ask ChatGPT a question. You say, "Hello ChatGPT, my name is Marc," and ChatGPT replies, "Hi Marc, how can I help you today?" And then you ask it a question. The simple fact that you provided a first and last name means the model will incorporate what's called a harmful stereotype, meaning it will behave differently depending on the person's name or the information (whether it's a man or a woman who starts the conversation). And that's precisely the key point, the one that makes you understand something: in many tutorials, you see that they try to make you believe that AI is a machine that understands and is almost humanized in a certain way. And then you realize something: AIs are statistical machines, and bias is statistical. So what we realize is that depending on the first name (Brian, Ashley), whether it's a man or a woman, the model is twice as likely to have a bias throughout the rest of the conversation as the previous models. So that shows one thing: this model has much more bias than the previous one. Twice as much is 50%. Relatively speaking, it's 0.01, but I still find that enormous. The simple fact that we can use a first name or a gender that will influence the response model reminds you of something extremely important to keep in mind. AIs are probability machines , and prompt engineering involves structuring the neural network's architecture so it understands data. However, the words you use can introduce behavioral biases due to the training data. If needed, I'll include links in the description to videos I made on prompt engineering that you absolutely must watch to understand why the prompt structure can produce this type of result. Is ChatGPT capable of making you think it's working when it is n't? This is called "sandbagging," and it's another problem with AIs. What's quite amazing is that the model doesn't behave the same way depending on what you ask it to do. So, in terms of cheating, in 99% of cases, GPT doesn't cheat. But there's one crucial point, and all developers should be aware of it, as discussed in the Apollo study, specifically regarding the relevant code sections. In 29% of cases, when it was unable to program or write the requested code, and in 30 % of cases, ChatGPT 5.5 lied by claiming to have completed the task. This highlights another issue: a very high rate of concealed actions by ChatGPT 5.5, which claims to have completed coding tasks when they were incomplete. Not possible. So, developers be careful if you're using this model; it has a significant tendency towards sandbagging. In short , it's not working, it's doing something else. On the developer side, there's something really interesting: if you run ChatGPT 5.5, it has an efficiency range of between 1 and 4 hours. Efficiency refers to its ability to work autonomously between 70% and 80% of the time. This means it will be able to test the code and run various tests. Beyond 4 hours, we see a very significant performance drop. This means that if you have workflows longer than 4 hours, I advise you to stop them and not exceed 4 hours because we see a really big drop in performance, and it 's not cost-effective. So, this is a model that's also designed to work in a complementary way with developers. It's capable of debugging software, it can handle a very large volume of code, and it manages up to 1 million tokens. And we need to talk about the context window issue. So, here's the official data. This also gives me the opportunity to talk about ChatGPT Image. I gave ChatGPT Image the official data. I said, "Give me a graph with the logos of each element." The result? Well, ChatGPT Image 2.0 didn't understand the graph at all. This leads me to one conclusion: lots of announcements from OpenAI, but in practice, the model struggles with charts, meaning complex graphs. That's the conclusion. I asked them to create the same graph with Sonnet 4.6: perfectly executed. And now we can talk about the context problem. So, what is context? This is a crucial point to understand, one I've discussed in other videos. I 'll put it in the description below. It's truly one of the key concepts every professional needs to grasp when typing prompts and working with AI. Models don't have a linear capacity to understand instructions. And the "needles" represent the number of variables, meaning the number of interconnected elements we're asking for within the conversation. Let's say you ask for the person's first and last name, city, email address, age—in short, a set of variables in your text. What happens is that the further you progress in the conversation, the harder it is for the model to find the information. This is what we call "lost in the middle." And then, at a certain point, we reach the tipping point, which is the "rot context." This means it's an area where the model really struggles. We have to ask it to do far fewer things at a time. And that's exactly what I explain in the videos linked below in the description. Here , we realize they've achieved a very impressive performance, and there are two things to note immediately. First, there's a significant drop starting at 64,000 tokens. This is to tell you that 64,000 tokens is nothing. It's really nothing. So, with ChatGPT 5.5 and the prompt system, you're already at 32,000 tokens. So, between 32,000 and 64,000, you've just sent 40 pages of PDFs and asked one question, and you're already at 64,000 with a 10% drop. Then, we have a re-drop of 8% at 128,000. After that, we see a stabilization up to 256,000, so that's positive. So the majority of accounts today are stuck at 256,000, and we can therefore go up to a million. And there we go back up to around 76%. And now we need to talk about something that's not very important. Well, Claude 4.7 has been released, and if you remember the scores of 78% with Opus 4.6 and Claude 4.7... I don't know if you 've noticed, the official Opus documentation actually... usually they put all the information on the page indicating the performance of the The model was working, but surprisingly, one variable was missing: the MR VRC2 benchmark. I thought, "That's odd, I can't find it anywhere, why?" Because Opus 4.7's score is very poor in long discussions. A word of caution to all developers: be very wary of Opus 4.7. Above 256,000 tokens, the accuracy drops to 60%. So, we 've lost over 20% stability in understanding contexts. This is a crucial point to address, which, in my opinion, is why the model is only good below 128,000 tokens. I'm talking about Opus 4.7, Claude 4.7. 60% is a bad score. Personally, I wouldn't use this AI under those conditions. So, let me know what you think in the comments. What bothers me is that with Claude, you very quickly, very quickly, reach 150,000 tokens: with just two questions, you're there because it's a model that does a lot of thinking. Comparatively, we can say something about ChatGPT, and I can tell you this right now: its token consumption is significantly better than Claude's. Now, in practice, we'll see if it's better or not, but I can tell you that, in any case, it consumes much less and I can do a lot more. So, am I going to say that we should abandon Claude? Claude allows you to do things that ChatGPT can't . But that means one very clear thing: when you send a complete piece of software to be debugged, well, today ChatGPT 5.5 is better in terms of consistency than Claude Opus 4.7 will ever be. So let's talk about the capabilities of autonomous work, particularly in two areas. Cybersecurity, so if that's an area that interests you, I think now is really the time to focus on it because that's where the new sectors, new niches, and new skills to develop will be concentrated. Today, in attack simulations, we're talking about 16 deployments. The model has a 96% success rate. So, in terms of autonomy on this type of configuration, it's absolutely incredible. That is to say, it's better than most developers at handling a certain number of attacks. Then, they conducted a benchmark where they're going to test ChatGPT 5.5 in terms of vulnerability identification. Now, it's not good in every area. There are things it does very well with excellent scores. We can clearly see that Claude Mythos and Claude 4.7 with Opus are going to invest in this sector. This means they will deploy AIs that will assist developers and implement testing protocols. So, there are things the models can do, for example, exploit a web server to download code, to test and extract data . These are things the model can already do autonomously. What they can't do: forge a DNS certificate. This has n't been successful yet. Discovering network rules to infiltrate or block a network—that hasn't been successful either. However , the model is capable of using a web application to switch to your environment. So, the ChatGPT 5.5 model could very well be misused for attacks. It can retrieve credentials from cloud interfaces, which are crucial for financial testing. So, these are things the model can do. So if these are areas that you're passionate about, then dedicate time to training in them. For me, this is the next niche that's going to explode in the first six months. The first six months of this year are the next niches in which, if there are people who need training, do it because I have good news. The first piece of good news is that the model, contrary to what we've been told ("that's it, there are models that self-improve, there are...") "Models that would retain memory")... No, that's not the case. Tests show that in all tests, all tests today, ChatGPT 5.5, at least, is not capable of self-improvement, not in terms of human-like behavior. That 's the first point, and it's important. Another point: on complex coding tasks lasting 10 hours, the model doesn't exceed 36%. So, as soon as we have complex tasks, it's imperative, and that's a good message for developers. Well, you're not replaceable. Every time you see a video telling you, "Developers are finished, the developer profession is finished," that's completely false. Developers are the masters of the game. Currently, those who sell you training courses saying, "Build your own SaaS without learning to code," are risky in my opinion. So, I don't know what you think; let me know in the comments. But for me, it's risky. From my point of view, between vulnerabilities, the backend, updates, optimization, and code refactoring for maintenance, I find it incredibly complex. It's a very technical field, and we take far too many risks doing it . We can do it for ourselves, but not to the point of setting up SaaS. That's my opinion. Tell me what you think. But that doesn't change the fact that the market is showing a 30% decline in junior developer hiring. So they're the ones impacted, not the senior developers. There you have it. And another positive point is: is the current model capable of performing work in certain areas? Well, let me tell you about the areas in which it isn't capable. In advanced engineering fields , I'll let you look at the score: 1.7%. The capabilities of AI, even if you sometimes see scores of 98 or 97, is actually not a homogeneous field. Let me give you some examples. In the field of virology—viruses, laboratories, virus culture management— the model is very poor. It can't handle that at all. So, the good news is, anything related to fundamental research, the model can't handle that either. Tacit knowledge and troubleshooting, however, the model is very good. So, anything involving troubleshooting, the model can help you find solutions, propose solutions. Negative protein binding prediction. So what is that? For those who don't know, it's the ability to model the interactions of atoms, whether DNA or proteins, in spatial configurations. So, we're talking about crystallography, we're talking about protein deployment in space, and that's something the model can't do. So, again, in the field of research, you realize that there is a There's a very, very big gap, and it also shows one thing: even if these are predictive models based on statistical patterns, well, it's not really a reasoning model yet, you see. The documentation and posts I've read from Sam Altman in the last 24 hours are all about "we're close to AGI, we're close to AGI." No, no, we need to stop with that. However, I'll tell you what we're close to. We're actually working with a model—they tell us this very clearly—it's a model that has undergone a lot of RLHF (Repetitive Learning and Handling). That means that, to make it easier for people to provide less detail, they've trained the model extensively to understand the steps. So, be careful, be cautious, you know the principle. Just because you tell an AI, "What are the best strategies for my company?" doesn't mean the AI is going to set up a whole mechanism for analyzing your positioning and your competition and do all that automatically. No, That only exists in influencer videos. Okay? We need to forget about that. So the key point is that overall the model has had a Substantial improvement in its decision-making capacity. As I 've explained in several videos (which I always link to in the description), AI is essentially decision tree models. I have a problem. What do I choose? I choose a tool. Claude explained the system of agentic systems very well. When you send a prompt, whether it's ChatGPT 5.5 or Claude's system, it's an agentic system, it's an agent. What it does is retrieve your question along with the context and what it needs to do: make a decision. And there are several ways to make this decision. Either it has received training, so the training tells it how to handle this type of problem. For example, you create a medical chatbot; if it's an emergency, the chatbot should tell you "call emergency services" rather than "would you like an appointment?". The goal of a decision-making system is for it to be able to behave like an assistant. This is what we call "take action." So, the decision-making process within the loop actually depends on the training data. Therefore, we have two solutions. For the models shown in ChatGPT 5.5 , the system knows what to do. But when it doesn't know, you have no choice. You have to code it. And this is often the case for businesses. And that's where the problem arises. When you ask the AI to work for your company, with your data, it hasn't received specific training for what you do. So that's where you come in, where you type the instructions and you also have to type the result loops. So, the goal of a model when it receives a prompt is to retrieve contextual elements: which document to retrieve, what data, and then the action. But the action means: should I launch a search? Should I run some code? Should I launch a web search tool? So it has several possibilities. Should I respond directly? Before responding, it needs a complementary loop, which is perhaps the most difficult to implement: what criteria does it use to determine if it acted correctly? So the model needs a representation, a "map," of the actions it's allowed to perform and how to perform them. So, again, this doesn't happen automatically. This is part of the learning process, the RLHF (Real-Time Handling Function), which shows that today we have roughly an intelligence score of 60. So we've made good progress, and what 's really interesting, however, is the token consumption. That is to say, we manage to reduce the number of tokens by 35% while increasing the score. Now, I'm going to be a little harsh on Claude here, but I want to tell you the point is that Opus 4.7 consumes 35% more resources. So for anyone using it for token consumption, it's a nightmare. The positive point is that they've modified several parameters. So, how did they do that? They implemented optimized learning. So there's a huge amount of groundwork in training the model. Secondly, they changed the chips. We're no longer using A100 chips, but the new 200 and 300 series from Nvidia. So we have a very fast inference time. So , comparatively, it calculates less, but better, on extremely fast, latest-generation processors . And there's something really interesting: look at the token consumption. Why? Because the number of tokens also depends on your window size. Remember, we talked about the 2500 tokens for space. If a model consistently consumes 60,000 tokens like Claude's, well, in three questions, you've filled your entire window, and that's very expensive. So, ChatGPT 5.5, by default, doesn't consume that much. It's incredibly capable and has a very good level of decision-making. The benchmark terminal is a system where it plays the role of a telephone customer service department. So, it manages customer files, Q&A requests, sends emails, and it's capable of doing this for almost 7 or 8 hours. So it's truly exceptional in terms of autonomy, and it's clear that some jobs will be impacted. We need to be aware of that, obviously. The closer we get to this level of autonomy with these systems, the more it means they've had a huge amount of training data. But I should add a caveat: it's not for all fields, okay? As we showed earlier in biology, in the field of research, the model isn't capable of handling that. However, customer service models, yes, that's one of them. So, there's one point that caught my attention, but we'll see how it works in the coming days: it's capable of detecting problems upstream. That means a lot. It would mean that the model is capable of exploring different possibilities and examining the cascade of consequences of the tokens. That's what it means. This is one of the most complicated points, and it's also the limitation of all AIs today, which aren't true intelligences. It's a probability distribution that means they don't understand the consequences of actions. An AI doesn't have a concept of consequence. So when you ask it for advice, the only thing it does is identify whether the words used are allowed or not. If they're not allowed, boom, it will send the information back to the API. And I'm going to show you that in a bit because in the training data, there are some things that are quite... well, you know. I prefer to show you that later because you need to be aware of what's happening behind the scenes. Let's put it that way . So here we're looking at a model that's capable of anticipating. We 'll see how that works. The key point is the improved use of tools. As we said earlier, it's about decision-making. And what's a very strong signal for the job market is that, compared to experts, the median line is the level of experts. There isn't a single model that's just at the level of experts. They're all 15 points above the experts. That means that in many areas of office automation, AI is better today. That's what it means. So expect a major upheaval in the job market in the coming weeks of 2026. If you're still using ChatGPT as a chatbot, it might be time to wake up. I'm telling you this, but let me know what you think. All the signs point in this direction, and the country, the government, isn't preparing us for it. This is n't meant to be alarmist, it's to be realistic. What will make the difference today is whether you're capable of being an AI supervisor. That's what's happening here in the telecom sector. We manage telecom exchanges. We're no longer dealing with 14,000 tokens. We're managing with 5,000 tokens and have a 98% success rate. That is to say, if you configure it by typing your ChatGPT 5.5 scripts, it's capable of handling customer service with a 98% accuracy rate. That's what it means. If you create an agent, it can manage it. Before, we were limited to around 87.9%. Cheaper , more efficient. There you go. In the "real" world, it's exactly the same . So there, we give it real tasks, and we're at 80%. So we're getting closer and closer to highly trained models. So why are they investing in the telecom sector? Well, because I think they have A lot of training data. You see, they have a lot of training data extracted from the recordings. So they can train the models, and that's what you get. What we don't realize is that the countries training the models today will create famine and poverty tomorrow in the countries we employ, because we know very well that this isn't being done in Europe. So, as I was saying earlier, and this shows it again regarding analytical capabilities in quantitative biology. Look, in terms of scores, they set a ceiling, but we're not even at 22%, okay? And we're talking about very poor capabilities. So we're not dealing with reasoning models at all . We're not dealing with AI models at all, not in the human sense. And so, if you have to do something, research in the field of medical biology remains a very good area for artificial intelligence, which today is not able to replace researchers. That's a good sign. But it's not uniform. For example, in the field of bioinformatics, we have benchmarks at 80%. So, as I said, it's not uniform; there are sectors where data is used to train models, and sometimes it isn't. Now, let's talk about the extremely troubling issue, which deals specifically with cybersecurity, but also with how OpenAI uses AI against you. ChatGPT 5.5 very clearly used user data, even though they claim it's anonymized, but also the prompts we use to see what we type and to train the model. So, all free accounts, except business accounts, all free accounts, all premium accounts, train the model. This allows it to detect flags today, both for injections, for malicious code, and for cybersecurity, with a very high score rate. So, I'll say it again, everything you write actually goes back to OpenAI. Let me show you a bit of the sentence that's actually in the test document: "our continuous asynchronous monitoring system of internal deployments, of anonymized ChatGPT conversations that users of our models have reported as containing factual errors." So, throughout the document, the more you read it, the more you realize that they're essentially cataloging our conversations. And that's how, in my opinion, OpenAI inevitably trains some of its models. And that's also how OpenAI—don't forget, I believe they raised $230 billion in funding and only have $20 billion in revenue . So, what's going to happen is that OpenAI has a business model that is absolutely not profitable. So, to be honest with you, I don't know how long they can last before the investors pull out. They'll have to go public to raise funds, but it's not a profitable company. So, our data is most likely being used behind the scenes to run systems that will offer us products to sell. That's already the direction they've taken. And you can see that they have systems where human review is used for what are considered "serious" cases. Otherwise, they have automated systems . I'm going to talk about one last technical point. Are the models capable of consciously lying when they're working? That's what we call thought chain control. So, on the current model, you actually have ChatGPT 5.5, it's in blue. Here again, the answer is nuanced, meaning that there are areas where the model is capable of lying, of being unreliable, and There are areas where it performs well. In the healthcare field, let's say it's adequate. Regarding alignment, as we've discussed, the model's ability to follow instructions, even long ones, is good. However, there are areas where we've noticed that, particularly with coding , it can pretend to have completed the task when it hasn't . There are also areas where it's less effective. Specifically, in mathematics, it tends to lie, claiming to have found a solution when it hasn't. The same applies to physics and impossible tasks. It has a real tendency to... a genuine tendency to pretend it's overperforming, to act like it's good. It wants to make others believe it's good. Well, that's a point I think we need to consider. So, we'll have to see to what extent, but on a number of points, we realize there's been a step backward compared to ChatGPT 5.4. So, can it manipulate you for long? In other words, if you give it an instruction, can it hide its intentions, or bad intentions, for a long time? This curve will study that over the course of the reasoning system. What it tells us is that the longer the task, the longer the work, the more tokens the reasoning system develops, and the more the model tends to align itself with learning behaviors. So that means that if it has to lie to you, it will do so at the beginning of a conversation rather than later on . That's the positive point, I would say, and it's perhaps the most aligned model today compared to its predecessors. So if you code instructions, it should respect them in the long run. A word of caution: ChatGPT 5.5 has a 29% error rate in its code tests. It's essential to read the documentation carefully and verify the code to avoid any unpleasant surprises. I'm going to run a live test with you using the "Thinking" model. I don't know if you're aware, but since ChatGPT 5.0, we have what are called "MNT" directories in the structures. So, I'm going to check with you if we still have the MNT directories. "Create a memory.md file in the /mnt/data directory." If it's possible, I'll create a tutorial on it. I don't know if you're familiar with this feature. I'll explain the principle. In any case, I'll see how it works with you; this is the first prompt I'm sending out. I'm doing it live so we can test it together. This is just a preliminary step... we'll just look at the thought process. Well, it did create it for me. So, let me explain how it works. It was incredibly fast compared to before. You'll see that if I make the same request on an older model... ah, it reduced the processing time by at least five times. Okay, I'm switching back to ChatGPT 5.4, I'll ask it the same question again. Let's see how long it takes. I'll explain what it's for. If you're not familiar with this kind of "box" that allows you to store information while you're working. The problem with AI is that when you interact with it, you lose the context. We talked about that. No, it was super fast too. I think they did an update. I don't know if you noticed that ChatGPT 5.4 has changed in the last two weeks. It was behaving differently. So, I'll explain how the memory.md system works. When you're working with AI, as we've discussed, it's the problem of context loss. So, let's say you give it a lot of instructions, and where does everything end up? In the main discussion. So, if I want to run my model while maintaining a prompt for alignment instead of keeping it in the main discussion... The discussion I'm trying to... Keep it as clean as possible. "Save a system prompt in hierarchically structured Markdown format to the file memory.md. System prompt: chatbot, telephone customer service. Access Gmail, use MCP, MCP function read, send, encode the instructions in a Markdown window in a code window, wait for user approval before saving to the file memory.md." So we'll see what it gives us. We'll look at the instructions. I'm interested, you know. We're doing everything live, right? So we'll set up a telephone agent system. We'll try to see what it does by default in terms of encoding system instructions. The goal isn't to fix it, it's to see if it integrates the MCP functions. Uh, it's going very fast. Wow , it's super fast. We can clearly see the bullet points behind it. So, context agent, a chatbot to help, process, organize. So I'll look at the MCP functions... MCP read. OK. Read the customer email. Okay. MCP send function, recipient, content, professionalism, and so on. OK, everything is validated. Uh, approve, I'll just put "approve" and I'll show you what will happen. So it will transfer this to permanent memory, which is the MNT data memory. And we'll check that together. So it prepares the tools and we'll check, then write. And here's the work that's done. So now my system is configured and it's transferred to the MNT data memory. So this type of system allows, in fact, when you're working with a model, to have the system prompt or a workflow that is completely integrated into the model's memory system, which isn't on the main discussion. Which means that I can come with a file and tell it : "Without you having read the file, you store it in the system memory and you use your system prompt or your workflow inside." So it becomes a working storage area. So /mnt/data/memory, you put the information in there. And what we can do, look, I'm going to download it for example, so I'm downloading the "file," we can start a new conversation. This has been available since ChatGPT 5.0. I also learn this in training courses. I learn how to optimize interfaces, how to work with AI in a professional context. So you have the description of all the training courses. "Store the memory.md file in the /mnt/data directory." So we're going to look for the file. Now be careful, always put it in reasoning mode so that you can correctly activate the system. There you go, that was super quick. So now, it's actually loading the agent system. So if you ask it: "Who are you, what is your role?", it will tell you: "I am a telephone customer service chatbot . I am here to take customer requests, organize follow-up, prepare the response, and only use Gmail when required for operational customer service." So we stored it in memory , and as I explained, this system allows you to store data in the workspaces. You can put a lot of data and files in there so you can work within your interface and free up your context window. The underlying work is constantly optimizing this window. To address the main issue: the capacity loss of the context window is optimized. We'll talk about this in other videos, what we call the KV matrices of AI. That is, when you send queries to the system and the Transformer system understands the data, one of the points that will cause problems in a discussion is that we'll end up with concatenations of KV values. And so the model will string together scaled dot products of attention from the discussion, and that's what can unfortunately cause the response model to deviate. And to stabilize its How it works is by placing elements that serve as reference points for its functioning within its memory, allowing it to maintain a certain pattern of behavior throughout the discussion. So there you have it. Feel free to tell me if you're familiar with these working methods; leave a comment below. And of course, you have the lessons to help you study. Don't forget to subscribe, like, and share this video. See you soon!

More from AI