ENFR
8news

Tech • IA • Crypto

TodayMy briefingVideosTop articles 24hArchivesFavoritesMy topics

Reflecting on a year of Claude Code

AnthropicClaudeJune 8, 2026 at 04:31 PM18:06
Audio player
0:00 / 0:00

TL;DR

AI coding agents are rapidly evolving from simple assistants into autonomous, self-improving systems that are reshaping how software teams operate and collaborate.

KEY POINTS

From novelty to autonomous systems

Early versions of AI coding tools handled only simple engineering tasks and drew limited internal excitement. Within roughly a year, they have evolved into systems capable of orchestrating “trees” of thousands of agents working simultaneously. Developers now routinely deploy multiple agents that generate, refine, and execute code with minimal supervision, marking a sharp shift from single-assistant workflows.

Self-improving agents through persistent learning

A key breakthrough lies in teaching agents to improve themselves rather than correcting them manually. Instead of issuing repeated instructions, developers encode fixes into reusable “skills” or documentation, allowing the system to adapt over time. This enables agents to run continuously with compounding improvements, reducing the need for repeated human intervention.

Verification redefined beyond tests

Traditional checks like unit tests, linting, and type validation are no longer sufficient. Modern verification focuses on whether an agent can successfully execute tasks in real environments, such as running applications or interacting with interfaces. Systems now simulate real usage—across desktop, mobile, and command-line environments—where agents test, debug, and redeploy code autonomously.

Rise of automated routines and background agents

Engineers are increasingly deploying persistent routines that monitor workflows and act proactively. Examples include agents that scan bug reports, generate fixes, and submit pull requests without human prompting. In some cases, issues are resolved before the original developer can respond, demonstrating a shift toward continuous, automated maintenance.

Auto mode replaces manual oversight

New “auto mode” systems eliminate the need for constant human approval of agent actions. Instead of reviewing every command, a secondary model evaluates safety and blocks suspicious operations. This approach has proven more effective than manual review, as humans tend to approve the vast majority of requests and miss subtle risks.

Security through adversarial testing

Robust security now depends on extensive internal testing, including simulated attacks and prompt injection attempts. Thousands of agent interaction logs are analyzed to train systems to detect unsafe behavior. Dedicated red-teaming and adversarial evaluations help ensure agents can resist both known and novel attack strategies.

Blurring of roles across organizations

AI coding tools are dissolving traditional boundaries between engineers, designers, product managers, and even finance teams. Non-engineers increasingly write and modify code directly, while engineers take on broader product responsibilities. This convergence reflects a shift toward end-to-end ownership enabled by automation.

AI at the center of business processes

Companies seeing the greatest gains are restructuring workflows around AI rather than treating it as an add-on. Instead of layering AI onto existing processes, organizations are placing it at the core of operations—from coding and reviews to onboarding and internal knowledge access—mirroring the historical shift from paper-based systems to computers.

Minimalist approach to prompts and context

As models improve, heavy prompt and context engineering are becoming less necessary. Developers are moving toward minimal instructions, allowing models to retrieve relevant context themselves. Excessive guidance is increasingly viewed as counterproductive, potentially limiting model performance.

Shift to multi-agent and mobile workflows

Engineers now manage fleets of agents from a single interface, often controlling them remotely via mobile devices. Tasks can be initiated, monitored, and adjusted from anywhere, reflecting a move toward asynchronous, distributed development environments where human oversight is intermittent rather than constant.

CONCLUSION

AI coding agents are transitioning from tools into autonomous collaborators, driving a fundamental reorganization of software development and accelerating the convergence of technical and non-technical roles.

Full transcript

When we first released Quad Code, it was like a little video and I remember posting it to Slack and there was like two people that gave like the reaction like people were like excited. I >> thought it was really cool, especially for my very easy engineering tasks. It was quite good at it. >> That's like a really nice way to say that it wasn't really good. I can't believe it's only been a year since we first launched Quad Code. >> It's hard to remember what what that was like. Like it is it's so different than what we're doing today. Like now I just have like armies of agents that are doing stuff like I'm prompting one agent or I have like an agent that's like prompting agents that's prompting agents and it's like a tree of like thousands of agents. But is I think it's just like the most important idea when working on this stuff is like every single time Quad makes a mistake, I don't tell Quad to do it differently. I tell it to write it to the quadmd or to like make a skill or or something to do it differently. And if you can do this, then quad can just like run forever. And I I think the other thing that we kind of realized is the verification is is really important. Like we didn't realize that. >> I hear this come up a lot with developers and enterprises that we meet with. Um what are your tips for making a really good making quad code really good at verification? I sort of feel like this is this thing that just like everyone misunderstands because whenever we talk about verification, people are thinking like unit test or they're thinking like lint or like type check. These are the things that are obviously really easy to automate and these are the things that were already automated. But actually when we talk about verification for agents, it's something slightly different. It's like can the agent run the thing? It takes a little bit of uh mental work to figure out how exactly do you do this cuz it's often not straightforward. And I think that's like that that's one of the challenges. I remember with uh with Opus 4, Claude tested itself and we we we just like hooked it up to Opus 4 and I was like, "Claude, build a feature and then test yourself in like bash and it opened a little claude CLI and tested its own feature and I was just like whoa, it's crazy." Like now now we're so used to it. Like now, you know, now now we have these loops going for, you know, like the iOS simulator and the Android simulator and like computers for desktop. like it's it's not surprising, but back then that was crazy. How how are like how how are you doing it? >> So, I've been mainly hacking on the desktop app these days and one of the engineers on the team actually added this desktop development skill that teaches Claude how to run the local desktop app and I've been having it use it and it still runs into issues or like bugs with the staging environment sometimes. And so what I have it do is in those cases I have it read slack and understand hey is is staging down right now or is there has someone else already hit this? Um and then when it debugs the whole issue I tell it to update the desktop development skill. What the skill does is cloud actually spins up a local desktop app and it uses computer use to quick around it. And so when I add a new UX it quicks around to invoke the new UX. It also tests edge cases and when there's an issue it fix it. This says it and rechecks. >> This is like honestly one of my favorite things about this team is everyone codes. I I I have never been on a team where like like the my my PM would code and it's like crazy and like your code is like really good. Like >> here's your noise. But I I also just feel like it's it's also just becoming easier because it's like essentially like Claude writes the code. And so what matters a little more is like what what's the idea that you have and I I feel like if you're a person that has like the product context and the business context and you're thinking about the design and the user, you're just going to come up with better ideas. >> It's kind of like all the roles are merging. >> I remember seeing Megan or designers PRs and I was just horrified at the beginning. I was like, "Oh my god, why is Megan putting up PRs?" And then she was like, "Yeah, yeah, I'm just like I'm fixing the button." And I was like, "Okay, all right. Well, the code looks good, so maybe it's maybe it's fine." And I I feel like now it's just like it's totally normal. >> Yeah. We see this across all the enterprises we talk with. Like it's the engineers adopt cloud code first >> and then the the adjacent roles look over their shoulder and they're like, "Wa, this thing is very powerful. Let me try it out." And we found, it's crazy. We found that like our designers are more productive making prototypes and making changes directly in the app instead of paying an engineer. PMS are making changes in the app. Like our finance team runs in cloud code. They do their projections there. >> Um data science um like if you talk with our data scientists, it's so cool. It's just like everyone just has cloud codes up on their screens. Yeah. >> Um I feel like it's it's remarkably versatile for for different roles. What do you feel like nowadays are like the use cases that are that are pushing the limit? >> One that I'm super excited about is routines. There's one engineer on our team who launched voice mode across all of our products and um he has this routine set up that just listens for every ticket that comes every GitHub issue, every bug report about voice mode and his claw just picks it up proactively puts up a fix and then pings the PR to him. And when he got that working for voice mode, he thought, "Okay, we're getting a lot of other feedback that isn't being responded to." So, uh, he also set up a routine to listen for that. So, I shipped this, uh, small feature and there was like an edge case in it that I didn't see. And so, someone filed a bug for it and I was going to get it get to the bug that night. And as my quad was working, it said, "Wait a second, another quad has already fixed this." And I was like, "How's this possible?" Like, I've never talked to him about this feature before. And so I pinged him and I was like, "How did you fix this so quickly?" And he said he has another routine that just looks for bug reports that haven't been responded to in 5 hours and puts up a fix and he merges the ones that are easy to verify. >> Quad tells me this like all the time now >> that someone else has already fixed it. >> There's always like another person's quad that's working on it. It's like >> Yeah, that's been one of the changes. I I feel like we're um a while ago we were trying to figure out like how to use routines. And I feel like just like the agent SDK was this first idea that we could use quad code programmatically, but I feel like at the beginning it just wasn't obvious how do we use it? What do we use it for? And I I think routines are the first really obvious application. And um I don't know like it it just does like all the code review. It it babysits like every PR. You know, remember back in the day you used to actually have to like respond to code review comments. You used to have to like fix CI. You used to have to rebase. >> Yeah. >> Like I I haven't done that in a long time. >> Yeah. When you're in the CLI and you're synchronously working with Quad, what are the your go-to features? >> Okay. What they used to be is plan mode. I don't use that anymore. I use >> What do you use instead? >> Auto mode. >> Auto mode. >> It's the best. >> Instead of plan mode. >> Instead of plan mode. Yeah. cuz I the newer models they don't actually need like a planning step anymore. I think this was really important for like Opus 4 through maybe 4.5 then I think starting with 46 and definitely with 47 it just doesn't need that planning step. I think some people still use it. They like to have that artifact. I I don't use it and I just do auto mode for everything because then I I start my quad, it starts to work and then I just like move on to the next cloud and I don't have to sit there and watch it. But from the very early stage we had this like permission prompt model for quad code, right? like it runs a tool and then it asks you like hey are you okay running this tool and you had to say yes or no and at the time that was kind of the best we had a year and a half ago cuz we didn't have you know classifiers the model was not as well aligned as it is today. So auto mode was just such a it was such a big step up cuz actually you don't want to read most of these requests just routing it to a different model and having it check for security works so much better. >> Yeah. And if a thing like is a little sus or you know this isn't a command that you think you want to run or it's not safe the model will just deny it and then you can go back and you can allow it later. I think this has been one of those like step changes. We we just there's no way we could have done this a year and a half ago. >> It's just human nature when you accept 99% of requests that your eyes just glaze over when you read it. And so actually we feel that auto mode is more safe than reading every single permission prompt because it means that you're only paying attention to the most important thing and not like being spammed a bunch of things that are just 99% yes. I think security is one of these things like you can talk about it and then it's a totally different thing to actually do it correctly because it just doesn't always look the way that you think it's going to look and it's just all about like always red teaming always pentesting always looking you know always having a threat model and then using that to figure out you know how is this thing going to get attacked how are people going to get prompt injected >> exactly >> and I I just feel like like the team is just like obsessed with this and it it's so important because as a result I just trust the agent to run and I can move on and I can just have like a second agent. And if I didn't trust it, then I just wouldn't have been able to do that. >> And internally, um, to actually get auto mode out to our users, we needed to really trust it first. And so what we did was we collected thousands of transcripts of like an entire agent trajectory and a permission prompt and had automood classify whether or not it was safe. And it was extremely good at this. So then we got red teamers and we asked them to try to prompt inject and try to hack uh the codebase and we used this to create evals and made sure that all of these were denied and then we had our own internal teams try to prompt inject and hack cloud code uh cloud codes auto mode and then we improved auto mode to make sure that we caught all of these. So, it's not only just protecting you against the vulnerabilities that are out there in the wild today, but the the most intelligent attacks that we can construct. >> Yeah. I mean, it's like it's honestly like a weird approach. I I I feel like there's like all these features the last year where the first time someone pitched it, I was like, "Ah, no, no way. That's not going to work." And I feel like over time I just learned like I'm actually wrong like so often now because like building on the model is so weird. Yeah, >> it's just like all this like engineering stuff that I've learned over the years, like so much of it I just have to like throw out. And this is just like part of what the job is now. Like we're building on a new thing and we just have to relearn it. And I automotive was definitely one of these. I was like the first time I heard it, I was like route the prompt to a model. No way. That's not going to work. And then it actually turns out empirically it works really really well. >> I heard you also love Loop. >> Um I Yeah, I I love Loop. >> How do you use it? I think for Lube there's this transition that we went through like a year and a half ago where we were like all right there's source code but actually the thing an engineer should interact with maybe it's not the source code maybe it's the agent and so we made this leap of like I don't write the source code I talk to an agent and the agent writes the source code for me and I think right now what's happening is we're making the next leap I don't talk to an agent anymore I talk to a loop or I talk to a routine and it prompts quad for me and it's just it's crazy. I mean, it's been like it's a year and a half and this is like two big leaves. >> If you take like a step back, how are you seeing entire engineering orgs change? >> I'm going to put I'm going to put on my business cat hat. I I I have this like favorite case study. This is like a Harvard Business Review from the '90s and they were talking about like computers are here. Why are we not seeing the productivity benefits? And it's just this like amazing snapshot into like what it actually felt like at the time because like you know people used to use mainframes at some point companies switched to personal computers. It was sort of a new thing and companies were trying to figure out how to how to use it the same way they're trying to figure out how to use AI right now. And it turned out that to get the productivity benefits from computers. What you had to do isn't like you have your paper filing cabinet and your like paper and pen business process and then there's like a computer on the side that does something. Actually, what you have to do is you throw out the filing cabinet. You have to throw out all your paper and all your pens and then you put a computer in the center and everything has to run through the computer. It has to be at the center of every business process. And I I feel like at anthropic we do this thing where when you on board, you don't ask people questions. Like no one askked me questions when they on board. You probably have the same thing. They ask Quad. And this is kind of weird. Like this is the first company I've been at like that. And I feel like for us, Quad is just at the center of everything. Whenever I have a question, I ask Quad. Whenever I write code, I use Quad. Whenever I need a code review, Quad does it. Uh, whenever I need a security review, Quad does it. Whenever I need to, you know, fill out a form or something, Co-Work does it. So, it's just like Quad is at the center of everything. And I I feel like the companies that are really figuring it out, and there's a bunch of them now. They're just putting Quad at the center of it. >> But I think for computers, the transition took 10 to 15 years. But actually for AI because so much of our work is already ready digitized and Claude can use a computer and it can write code and run code this transition is happening a lot faster. >> I think it's just like really it's just really exciting. Like I feel like now I don't have to bug people anymore. And when I interact with people it's because it's like fun and I get to collaborate with them on stuff and we get to create something together. It's not that like I need them. I need something, you know, from them cuz like Quad can actually do a lot of that stuff now. And I also feel like as an engineer, I've just never had this much fun doing engineering cuz the like the tedious part I don't have to do. Like I'm just coming up with ideas. I'm talking to customers >> and every idea I like I don't have a to-do list anymore. Like cloud just builds everything. And so my job is to come up with these ideas and it's just so fun. Okay, so here's a question. Is the future product or engineering? Like is everyone going to be a PM or is everyone going to be an engineer? Everyone's going to be both. I I feel pretty strongly that these roles are merging. Like when we look at our team, our product team all writes code, our devro team all writes code, our design team all writes code and then we look at our engineers and a lot of them ship products end to end. They have an idea for what to build. They build it. They work with legal and marketing to figure out how we communicate this to the world and make sure it's safe and with security too. And a lot of times they just see through this whole process end to end. So I think right now AI really benefits people who have a lot of curiosity, have a lot of product taste, who love to have this like endto-end ownership and now a lot of people are running like hundreds of agents. What are the products that you think people should be adopting as they transition from single to multiple to hundreds? Until recently, the way that I wrote code was I had like six terminal tabs with six git checkouts of the same repo and then I would just like tab between them. Now it's pretty different. I have like one tab. I use the new agent view that we just shipped. It's like so good. And I'm so glad that we took a while to iterate on it to make that really good. And I also use the desktop app because I don't have to fiddle with checkouts that way. It just like, you know, it it does the work tree cloning like it it creates the work trees for me. And the thing that I would not have expected 6 months ago is probably half my engineering now. I do on my phone. So I just have like I have so many agents running that I just start from my phone. I use a remote control which is like amazing now. And like I'll start something on my computer and then I'll just remote control in from my phone and I'll just like walk around. I'll like get coffee and then I'll check in on my agents and maybe I'll start another agent. And sometimes I'm like talking to someone and we come up with a new idea. I'll just start an agent on the spot. I'll like talk to it with voice mode and just have it build something and I don't even have to go back to my computer anymore. >> I remember when you started doing this because you would actually leave work, have your computer on your desk open, plugged in, screen locked, and I just thought you would like come back to the office at some point to get your computer, but then we like pretty late. And I was like, hm, maybe you just like left it here by accident. And then it happened again the next day. happened again the next day and I was like, "Wait, this is so weird because you're landing PR so your computer is right next to me and I remember you responding and you're like, "Yeah, I'm coding from my couch." >> Yeah, that was the week that remote control got really good. >> Yeah. So, another thing that users are asking about all the time is how do you do context engineering, especially in a large enterprise? >> This is a thing, you know, people used to talk about prompt engineering, they used to talk about context engineering. This is sort of matching where the model was at the time. Back in the days of Sonnet 3.5, you had to prompt engineer. Back in the days of Opus 4, you had to context engineer. But with the models of today, you don't do any of this. You give it the minimal possible system prompt, the minimal possible tools, and then you let the model figure it out. Like you just have to give the model some way to pull in the context. I think that's the most important thing. How do you think about it? >> I see things very similarly. I'm a context minimalist. So my general philosophy is tell the model only what it needs to know and let it figure out the rest of it. Um I think when you give the model too much context, it's kind of like you're micromanaging it. And sometimes the model knows a better way to get to the same outcome. And I personally prefer to give the model that freedom to do that. Um and then in general, we're also making our harness more lean so that you have more room for your own prompts. Um, and so that follows your promise better. >> There's all these different ways to use Quad now. But I feel like in a year it's going to be a totally new set of things and it's going to be so surprising if it's still these same things cuz I I think like we're seeing these giant trends happening right now. Agents are running for longer. They're more autonomous. Very rarely am I running one agent at a time. It's usually like a few agents or dozens or hundreds or thousands. And so like the form factor for that, it's going to be really different than what came before. And I don't know what it's going to be. And I I think in a large part it's going to be up to the team to figure it out. And this is um this is why I'm like so happy we run the team that the way that we do where everyone just comes up with ideas and everyone is able to think about the product. Everyone talks to users all the time because I don't think these ideas are going to come from us. It's going to come from the team. >> Totally. And from everyone in our community building with us.

More from Anthropic