
Google's latest AI push centers on faster, more capable "agentic" models such as Gemini 3.5 Flash, reshaping software development, enterprise workflows, and human-machine interaction.
Google presented Gemini 3.5 Flash as its most capable model to date, outperforming earlier versions such as 3.1 Pro on many benchmarks. It combines high intelligence with speed, enabling fast iteration and strong performance on coding, finance, and productivity tasks such as building slide decks. It is already integrated into developer tools, the API, and consumer applications, signaling a unified rollout strategy.
AI usage is shifting quickly from simple chat interfaces to persistent, autonomous agents. These agents can run continuously, executing complex multi-step workflows over long periods. Demonstrations include systems that required up to 15,000 model invocations, a jump in both endurance and complexity.
"Flash" now means both high intelligence and high throughput. At speeds above 200 tokens per second, these models support real-time applications while still handling deep reasoning. That balance matters for enterprise use cases, where latency and accuracy directly affect user experience and revenue.
The rebranding of Vertex AI as Gemini Enterprise Agent Platform reflects an industry shift. Rather than embedding models in simple interfaces, organizations are building entire products around AI agents. These systems combine tools, data access, and autonomous decision-making to handle customer support, compliance, or internal operations.
Enterprises are adopting AI along three main patterns: real-time decision systems (sub-second responses), long-running autonomous agents for complex tasks, and large-scale monitoring systems. Each pattern forces trade-offs among speed, intelligence, and cost, which reinforces the need for a varied model ecosystem.
Software development has become a key domain for testing AI because it is verifiable and scales in difficulty. Models now handle tasks ranging from simple scripts to complete systems such as operating systems. Real-world code, however, requires interactive collaboration, especially when requirements are ambiguous or incomplete.
Two approaches are emerging. "Vibe coding" lets non-experts build quickly with few constraints, while "agentic engineering" targets robust production systems with structured workflows and reliability. Each requires distinct model behaviors, training data, and evaluation methods.
Traditional benchmarks fall short on long, multi-step tasks. Newer methods simulate real environments, including elapsed time and economic outcomes. Qualitative feedback on how models communicate and collaborate is becoming just as important.
As intelligence improves, the limits move to data access and data management. Many workflows depend on sensitive or distributed real-time information. Secure access, credentials, and system integration become major challenges, particularly in regulated sectors such as banking.
As coding is automated, the constraints move to problem selection, product design, and distribution. Identifying relevant user needs and scaling become central. Easy creation risks an oversupply of low-value products.
Interfaces will incorporate voice, gestures, images, and other modalities. Preferences will vary by user, from conversational voice to visual manipulation. Personalized, flexible interfaces will play a key role in adoption.
Advances in models such as Gemini 3.5 Flash are accelerating the move from tools to autonomous agents, shifting the challenges of software and the enterprise toward design, data access, and human interaction.
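The three-way trade-off among speed, intelligence, and cost can be illustrated with a small routing sketch. Everything here is an assumption made for illustration: the tier names, latencies, and relative costs are invented and do not correspond to real model IDs, published figures, or any Google API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    name: str                # hypothetical label, not a real model ID
    typical_latency_ms: int  # illustrative number only
    relative_cost: float     # cost per call relative to the cheapest tier
    capability: int          # coarse ranking; higher means more capable


# Invented values, chosen only to make the trade-off visible.
TIERS = [
    Tier("lite", 100, 1.0, 1),
    Tier("flash", 250, 4.0, 2),
    Tier("pro", 5000, 20.0, 3),
]


def pick_tier(latency_budget_ms: int, max_relative_cost: float) -> Tier:
    """Return the most capable tier that fits both the latency and cost budgets."""
    candidates = [
        t for t in TIERS
        if t.typical_latency_ms <= latency_budget_ms
        and t.relative_cost <= max_relative_cost
    ]
    if not candidates:
        raise ValueError("no tier satisfies the given budgets")
    return max(candidates, key=lambda t: t.capability)


print(pick_tier(300, 10.0).name)      # real-time decision system
print(pick_tier(60_000, 100.0).name)  # long-running autonomous agent
print(pick_tier(1_000, 1.0).name)     # large-scale monitoring on a tight per-call budget
```

With these made-up numbers the three calls select `flash`, `pro`, and `lite` respectively, mirroring the three patterns described above.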
LOGAN KILPATRICK: I was looking forward to this conversation to talk about the whole slew of updates and actually hear from folks across different parts of Google. Tulsee Doshi leads our model team inside of DeepMind and is a co-conspirator in launching all these models. Michael leads the Gemini Enterprise Agent Platform team, formerly known as Vertex AI, and is really focused on bringing models to our enterprise customers across the board. And Varun leads the Antigravity team. And lots of exciting updates. Antigravity was sort of-- Tulsee, we were talking earlier today-- another conduit to bring together the Google ecosystem beyond just the Gemini model. So lots of fun, lots of things to talk about. But maybe, Tulsee, we can actually start with the model and Gemini 3 Flash, our best model ever. TULSEE DOSHI: 3.5 Flash. LOGAN KILPATRICK: 3.5 Flash, the most capable model ever that we've shipped. You want to give us the highlight and your reactions now sort of 24 hours after getting the model out the door? TULSEE DOSHI: Yeah, so I don't know if any of you have had a chance yet to try 3.5 Flash. But as we announced yesterday and as Logan just said, we're really excited about this model because it is our best model yet. If you look at it compared to 3.1 Pro, it really actually does better on most of our benchmarks. But I think where we're really excited about it is actually how much better this model has gotten when you look at tool use, when you look at some of our long-running coding tasks, when you actually just look at a bunch of, actually, more real-world finance tasks, or as a PM, even things like slide deck creation and kind of productivity tasks. This model is really good. And I think it has this really nice combination of performance and speed, which actually comes to bear, I think, when you're actually trying to iterate quickly with the model. And so we're really excited about it. It's in Antigravity. 
So you can try it there, obviously, in the API and in AI Studio. What's also been really awesome about this model is it's not just amazing from the developer standpoint, which we'll talk more about. But also, it's now in the Gemini app and it's in AI Mode. And so it's actually powering a lot of consumer experiences also through its amazing scaffolding and capabilities there as well. And I think that's been really awesome to see too. LOGAN KILPATRICK: Yeah, I love that. I feel like we sort of-- something that you said this morning when we were talking, Tulsee, really resonated. And, Varun, maybe I'm excited to hear your reaction to this. There was sort of a comment about the tension of building a Flash model as we continue to push the frontier of intelligence. And what sort of Flash means today is actually different than what it meant a year ago, different than even what it meant six months ago. And a lot of this is actually grounded in the use cases in which people-- how people are using the model. The way that they used to use the model was like a chat UI, and they were just simply asking questions. And now it's long-running agentic use cases. And people have a 24/7 agent with Gemini Spark running behind the scenes. And, Varun, I'm super curious from your perspective, A, your reaction to that, but also how you think about finding the right balance in-- and, actually, I think software development's an interesting use case because developers are doing so many different things and it's difficult to encapsulate software development in a single action that a developer takes. VARUN MOHAN: Yeah, I think it's interesting. When we started taking a look at some of the Flash models, they were much more, as you said, AI Overviews, AI Mode-style chat use cases, not a ton of thinking, but then able to maybe take a bunch of search results and then create a good answer. I think what we're seeing now is the Flash models are also able to go for a very long time, many, many steps. 
I think one of the things that I talked about during the keynote was building an operating system. This was 15,000 model invocations that the model went through. I don't think that was the case before. And I think this is a testament to our research team to be able to push these models. In some ways, actually, I've seen cases where the model actually will think for 30, 40 seconds on a Flash model. And in the past, that was just not the case. So I think the level of intelligence, agentic capability, that the model can actually explore the environment over a very long time horizon, it's kind of changed the way we perceive things. And I think one of the exciting things that we wanted to bring to folks externally that we were seeing internally is just the speed, especially on Antigravity. I think Flash is already really fast. I think we published some number that it's like over 200 tokens a second, much faster than the other frontier models. But we wanted to give people an even faster experience. And we think this kind of counteracts the fact that, hey, the model is thinking for a while. The model is calling a lot of tools and all this stuff. So it's kind of an exciting time to see both of these things come together. LOGAN KILPATRICK: Yeah, I love that. Michael, something that I feel like the Cloud team sort of grounding this moment-- obviously, Vertex sort of renaming to Gemini Enterprise Agent Platform is a very intentional decision. I think it sort of represents this change of how people are building with the models. I think, historically, it was take a model, stick it into some product, do some chat-style UI. I think the announcements from Cloud Next-- and I think now coming through I/O-- are very focused on, how do you actually build a product around this new era of models? And I'm curious what your sense is as the future of software development feels like you're going to need agents in order to make that happen. And I'm curious how you're thinking about it. 
MICHAEL GERSTENHABER: Yeah, that's a good question. And on the topic of the models, what I'm finding is that the use case drives a lot of how we think about these models. It's not a spectrum of intelligence. It's not a spectrum of latency. But it's actually an easy to describe, intentional reason to use a lot of the models. Something like Pro is completely intelligence bound. Something like the modern Flash that we recently released is state of the art on intelligence. When a software engineer writes code, they frequently just assign a Jira ticket to it. It's less and less frequent interacting with your IDE. So you can set it. You can forget it. You get the code back. And in that interaction pattern, the chat UI is really a matter of information retrieval. The agent is going along its merry way, trying to solve a problem. It realizes it needs data that isn't written down, so it has to pop up and ask a human to unblock it by providing that information. And in the enterprise, a lot of information isn't written down. I think in our lives, obviously, we don't store our birthday in a Postgres database. You know what I mean? If it needs to know our birthday, it might just ask us. And so in that interaction pattern, in the long-running pattern, it's more important that chat is used as information retrieval. Whereas, in something like traditional use of Flash, if you're Verizon-- or it's very common to transact a return for merchants using AI these days. That looks like policy application. And that's very dry policy application, sounds like it can take as long as you want. But in reality, you have somebody on the phone trying to figure out if you can transact that return. It needs to happen very quickly. You really have a time budget. So in that, you need to be as smart as possible in 250 milliseconds. You have to be very good at determining whether this is in policy or not within 250 milliseconds; otherwise, you lose the customer no matter what. 
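The 250-millisecond time budget described here can be sketched as a deadline around a model call. This is a minimal illustration under stated assumptions, not how any production system is built: `model_policy_check` is a hypothetical stand-in for a model call, the 30-day return window is an invented policy, and falling back to a human on timeout is one possible choice that the panel does not prescribe.

```python
import asyncio


async def model_policy_check(request: dict, delay_s: float) -> str:
    """Hypothetical stand-in for a model call; delay_s simulates inference latency."""
    await asyncio.sleep(delay_s)
    # Invented policy for illustration: returns allowed within 30 days.
    return "approve" if request["days_since_purchase"] <= 30 else "deny"


async def decide_within_budget(request: dict, delay_s: float,
                               budget_s: float = 0.25) -> str:
    """Return the model's decision, or a fallback if the time budget is exceeded."""
    try:
        return await asyncio.wait_for(
            model_policy_check(request, delay_s), timeout=budget_s
        )
    except asyncio.TimeoutError:
        # Assumed fallback: hand off rather than keep the caller waiting.
        return "escalate_to_human"


async def main() -> tuple:
    request = {"days_since_purchase": 12}
    fast = await decide_within_budget(request, delay_s=0.01)  # answers inside the budget
    slow = await decide_within_budget(request, delay_s=1.0)   # misses the budget
    return fast, slow


print(asyncio.run(main()))
```

The fast call returns the policy decision and the slow call returns the fallback, which makes the point that within a fixed budget the useful question is how capable the model can be in the time available.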
So in that agentic use case, you might have subagents looking up what the policies are, but you're using a Flash in order to transact the return, and you're just willing to take the risk of applying the policy incorrectly. And then on the Flash-Lite side, we have a lot of customers monitoring the entire internet. You can think about-- well, actually, I'm not sure I'm allowed to name them, but the biggest companies in the world, who have internet-scale products that have toxic users, need to moderate those things. So for that or AI Overviews on search, you don't know how many customers you're going to have. And so as a business, you have to restrict your budget, even if the budget is very large and you're thinking about infinite scale. So that's, how intelligent can I be over infinite scale? All three of those are agentic in nature, and all three imply different interaction patterns, but we have models for all three. Does that make sense? That's what I'm saying. LOGAN KILPATRICK: Yeah, no, I love this. And, actually, it's a perfect segue to-- I think even in the future of, quote, unquote, software development, there's a spectrum of things that we're seeing. And I think on one end of the spectrum, you have-- and, actually, Andrej Karpathy-- I'm stealing Andrej's words. He had the best framing of this that I heard. On one end of the spectrum, you have this vibe coding use case. And the people who weren't building software before are now building software. And the constraints and the requirements for that audience are actually different than the person who's using these tools to build production systems. It's much more agentic engineering. And, again, the set of constraints, the systems that you need for that use case is different. I think on the model side, it's actually different too. And I'm curious, as you think about this, Tulsee, of how to balance. 
And whether or not there's trade-offs between those two things, I'm not sure, sort of the vibe coding use case versus the agentic engineering use case. TULSEE DOSHI: Yeah, I think-- so there's a set of foundations that I think are true across all use cases. You need the model to have a certain level of reasoning capability. You need the model to be able to call tools effectively. You need the model to be able to understand, for example, multimodal inputs or be able to generate high-quality correct code. Those are all things that are true, regardless of whether you're talking about the vibe coding use case or whether you're talking about a legacy code base. But I think across these use cases, you do want the model to see these different kinds of use cases when you're thinking about training the model. And you also want to evaluate the model's quality across these types of use cases. Because it is very possible that you could build a model that's amazing for vibe coding web applications because it has a certain level of reasoning. It has a certain level of aesthetic polish in that it can generate. But it actually really falls over when you give it a legacy system because its understanding of that type of context isn't that strong. And so you actually need to build the model to actually have that sort of variety because you actually need different things. So, for example, when you're talking about a code base that is like thousands and thousands and thousands of lines of code, your ability to actually understand that context and use it effectively becomes really important, where that maybe is less important when you're talking about simple prompt to web app. And so I think it's really, for us, about really breaking down these different types of use cases and then actually walking through, OK, what does the model need to be able to make these use cases effective? 
And then how do we actually take our own experiences from the users who are using Antigravity internally and externally, from users who are using AI Studio, from users who are using Canvas in the Gemini app? And how do we actually bring all of that feedback back as we're training the model? LOGAN KILPATRICK: Yeah, Cora and I talked yesterday a bunch about this-- even, Varun, you made some comment to me when we were chatting, which is something to the effect of, it would be hard to actually build a great coding model without this product flywheel. It really is an integral part of-- and we were talking with Josh this morning about this sort of product model-- TULSEE DOSHI: Harness. LOGAN KILPATRICK: --harness symbiosis, that sort of exists. And I think Antigravity is sort of playing at almost all three of those different levels. And I'm curious your reaction to that symbiosis, if you agree, but also maybe the challenges, actually, of trying to get each stage of those things right. And I think you need all three if you want the flywheel to spin. VARUN MOHAN: Yeah, I think when we started a while back, I think the harnesses were pretty simple in a lot of ways. They were simple in that they-- for the most part, they were bash, maybe a couple file system calls, and then maybe requesting feedback from the user. But they've started to get more complicated. Right now with Antigravity, the model natively can ask to create subagents. Those can run asynchronously. You can run asynchronous tool calls. This is very important when you run things that are long running. Let's say a researcher spins up a training job. You can't just have it be the case where the agent is blocked until the training job completes and can't really do anything in the meantime. That, and maybe while something is running, the user wants to send a message and then override what the model is actually doing. So these are all interesting details that are very specific. 
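The harness behavior described here, where a long-running tool call does not block the agent and the user can still send a message that overrides it, can be sketched with `asyncio`. This is an illustrative sketch only, not Antigravity's implementation: `long_running_tool`, `agent_loop`, and the "stop" message convention are all hypothetical names and choices.

```python
import asyncio


async def long_running_tool(name: str, seconds: float) -> str:
    """Hypothetical stand-in for a slow tool call such as a training job."""
    await asyncio.sleep(seconds)
    return f"{name} finished"


async def agent_loop(user_messages: asyncio.Queue) -> list[str]:
    """Run a tool call in the background while staying responsive to the user."""
    log: list[str] = []
    job = asyncio.create_task(long_running_tool("training_job", 5.0))
    log.append("started training_job in background")
    while not job.done():
        get_msg = asyncio.create_task(user_messages.get())
        # Wake up on whichever happens first: the job ends or the user speaks.
        done, _ = await asyncio.wait(
            {job, get_msg}, return_when=asyncio.FIRST_COMPLETED
        )
        if get_msg in done:
            msg = get_msg.result()
            log.append(f"user said: {msg}")
            if msg == "stop":  # assumed convention for a user override
                job.cancel()
                break
        else:
            get_msg.cancel()  # job finished first; stop waiting for input
    try:
        log.append(await job)
    except asyncio.CancelledError:
        log.append("training_job cancelled by user")
    return log


async def main() -> list[str]:
    inbox: asyncio.Queue = asyncio.Queue()

    async def user() -> None:
        await asyncio.sleep(0.05)
        await inbox.put("how is it going?")
        await asyncio.sleep(0.05)
        await inbox.put("stop")

    user_task = asyncio.create_task(user())
    log = await agent_loop(inbox)
    await user_task
    return log


print(asyncio.run(main()))
```

In this run the agent logs the first user message without interrupting the job, then cancels the job when the second message arrives, so the loop never sits blocked on the five-second call.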
And naturally speaking, if all you're doing is pretraining or post-training on lots of data that looks like SWE tasks, like SWE-bench-style tasks that are just GitHub PRs, you're not going to get that level of detail. And it's very important that we actually train the model to be capable of that and then, also, the people that are building the models to see how the current versions of the models perform with different harnesses. So that symbiosis is very much felt at this point. So the exact product that people are consuming outside is something that researchers inside are benefiting from, struggling with, and they're actively trying to optimize. So it's both-- I guess on the research side, we're actually doing reinforcement learning on the harness. But secondarily, also, people are able to feel the performance. So now we never get into a state where, let's say, someone on research says, we have an awesome model. The benchmarks are awesome. But then we can put it in the product and just immediately say, oh, this is not very good. We thought that this behavior was very good. But actually, in reality, when we play around with it, it's much worse. So that feedback loop is ridiculously important, both on the vibe side, as you would say, and then the real harness, training loop side. TULSEE DOSHI: Yeah, there's definitely an empathy-building exercise that I think has been really effective here. LOGAN KILPATRICK: Yeah, one of the interesting threads-- and, Michael, I'm curious, I think some of the tension of this moment that we're in is that, every six months, the models have shifted so much that the way you actually have to go about building products and building with the products and using the products, has to change. And one of the things we were talking about earlier this morning-- and I'm curious your reaction to this, Michael-- is just about the model tends to keep eating certain things. And the scaffolding layer, the harness layer, is maybe one example of this. 
In the future, it might do a lot of these things. It might be more baked into the model itself. And I'm curious if you just-- actually, we have probably a bunch of folks who are building with our models. If you have advice or suggestions how to stay on that frontier, how to make sure that sort of you're making progress with the model and not trying to work against it as progress inevitably continues. MICHAEL GERSTENHABER: Yeah, there's obviously a tension between what the models are capable of at any one point and trying to predict where they're going to go and building a product that won't deprecate because you do usually want to build a product for yourself or others now. It's hard to wait for that. But at the same time, like I say, there is this natural tension. So the conversation over the past-- call it since last February-- has been all about agent building. And then more and more, what the labs are doing and what we're doing is we're providing an agent API, an agent in the CLI. We are providing the agent, and we are offering everybody else to provide it with tools, and context, and a goal. And by training the harness behavior into the model, it's probably better than what most people can accomplish if they build their own agent. And so we did-- so the conversation has changed dramatically, even in the last month or two. It went from, help me build an agent to, help me use your agent for my purpose. So both of those exist, and I think they will continue to exist. And we'll have to deal with that. I do think that a lot of the metaphors about treating the model or model's harness together as a coworker is pretty valid. And so thinking through, how do I build something that fails today because the models aren't smart enough? Instead of, how can I build something that fails today because I built it wrong? That's an enormous difference. 
And if you're trying to interact with the model, and the model is sort of behaving like a poorly performing employee that you hired incorrectly, that's one thing. But we should be thinking in terms of that future term, where we can pair with the model, whether it's on Slack, whether we call the model, we meet with the model, whether we email with the model. These are just interactions. It should be the same harness. It should be the same model underneath, even if it gets replicated. And then we should think through where-- how do we want to be using it? And assuming that you three will get us there with a better model. LOGAN KILPATRICK: I love that. We can always have better models. MICHAEL GERSTENHABER: And that's going to happen. I feel like we've been all doing this for quite some time. And we're seeing the models get better faster than ever. You know what I mean? I think we can safely bet on that happening. So we should take it as axiomatic and build for that world and fail until the model lets us succeed. LOGAN KILPATRICK: Yeah, one of the questions actually, Tulsee, just for this-- part of the tension, I think, of the software engineering story-- and I think, Varun, you mentioned something to this point as well-- is it becomes increasingly difficult for the actual evals and the benchmarks that exist in the ecosystem to actually approximate what users are doing for these really, really complicated use cases. And I'm curious, just in the arc of where we are for coding and software development in general, how you think that is going to end up playing out. Is it just like the way that we measure how great models are at doing all these tasks is going to require an actual product in order to do that? Or you can-- I feel like a lot of benchmarks these days are almost approximating products themselves. You have to build a lot of infrastructure in order to do a SWE-bench these days or one of these other benchmarks. I'm curious. 
TULSEE DOSHI: Yeah, I do think-- you're already seeing that, to your point, in how our evals are changing. Our evals are approximating more product experiences. They need to be inherently more complex if they want to replicate inherently complex tasks. A fun external benchmark of this, I guess, would be something like Vending-Bench, which is literally simulating the ability for a model to run a functional store and earn profit. And that's the kind of example of a very complex benchmark that actually measures how the model is able to do this over the span of days. But I also think that one thing that we're realizing-- and this goes to Varun's point about building empathy for the vibes and really actually understanding real-world engineering use cases-- is we're also leaning a lot more into even just live experiments and actual side-by-side feedback. And I think you're going to need both of these things. I don't think we can just rely on one or the other. We're going to need our benchmarks to get better and more complex, and then we're going to need them to represent a diversity of use cases. But we're also going to need to just keep putting the model in the hands of users because, actually, one of the things that I think really has come out to me from a lot of the feedback we're getting from our developers, internally and externally, is that a lot of the feedback is actually a lot more qualitative and subjective, even in the coding space. I think we often think about coding as, oh, it's very verifiable. You can just-- it's either right or it's wrong. And either you're getting something correct at the end or you're not. But, actually, a lot of the feedback we get-- and Varun should speak more about this too-- is about the behavior of the model, how the model actually talks to you while you're coding. LOGAN KILPATRICK: Yeah. TULSEE DOSHI: The efficiency of the model. 
And so the laziness of the model or the overeagerness of the model, and these are all things that you can suss out from benchmarks. But the actual impact they have on a regular developer's experience, I think you really only see in the live experiment and from the feedback. And I think that has become a really, really important part of the flywheel too. MICHAEL GERSTENHABER: One thing that I'm seeing, anyway, is that the way people use it-- and I assume that the benchmarks are looking-- is not multiturn prefill, decode, prefill, decode. It's actually like the model will have to write its own software, run that software, get a return value, and then interleave that with sampling, so that it gets more language out and continues to iterate with a workstation, with a computer, in the cloud. And so it's not really-- it's not even-- TULSEE DOSHI: It's not as linear. MICHAEL GERSTENHABER: It's not as linear anymore. That's right. LOGAN KILPATRICK: Varun, one of the interesting things is I feel like agentic coding is probably the first use case, where people really spend a very long time with the model interacting. I think it's actually an interesting wedge to figure out a bunch of the other things about the model from a personality standpoint, how eager it is. How does it boast? Is it underappreciating the hard work that it's doing? And I'm curious if you've sort of felt that as we've done all these LEs and Antigravity, and as the product continues to develop. Also, how to get people familiar with that because I feel like it's not a muscle that we have. So many AI use cases today, you ask something. You get an answer back. You go on with the rest of your day. You ask for an image. You get an image. You go on with the rest of your day. It's not deeply multiturn. And I feel like agentic coding is probably the most notable exception to that. VARUN MOHAN: Yeah, so I guess the reason why coding as a whole was picked was because of what Tulsee said in the beginning. 
It's very verifiable. From one perspective, for a lot of other fields of knowledge work that are in front of computers, it's not that easy to verify things in the same way. And also, it's very long. So you can actually-- it's a great problem if you want to test the capability of your model. You can go arbitrarily long. You can go from, I guess, writing Fibonacci to then building an operating system to building a browser to-- I don't know-- building Google, technically. There's a scale of what you could possibly do. I think the reason why people see this is because this is the classic thing. I think for software engineers, actually, they do three things. The first thing they do is they decide what to build. They figure out how to build it, and then they build it. And just the process of deciding what to build and how to build it is a very sort of interactive process and something you'll need to work with the model to do, right? And I think at this point, we can all say, if you have a perfect vision of what you need to do, these models have all gotten to the point where they can just perfectly execute on it. If you say, hey, I want you to execute this unit test in this place with this capability, we can do that amazingly. And the complexity comes from when there are details that are blurred and not filled in. And that requires some interaction between the user and the model. So you get a feel of, hey, what does it mean for a model to be well behaved with a user? And this is why this field has become such a hot area to improve behavior and stuff like that. LOGAN KILPATRICK: Yeah, I love that. One of the interesting things to see play out-- and actually, Tulsee, I think you mentioned this. Or actually, no, Michael, you mentioned this. The direction of travel is the models are getting so good at coding. I think it's a reasonable bet to assume the trend line continues. Assuming you keep working nights and weekends, Tulsee, you and Varun will get better models. 
They'll keep getting better. 18 months from now, when we've gotten to the point that the models are actually incredibly good, coding is mostly solved, as much as it can be, what's your sense of, where does the bottleneck move? Because the bottleneck moves somewhere else. Obviously, if the actual-- Varun, to your point, the three stages of what people building software go through today, if the last stage of actually executing on the vision and the idea and how you want to solve the problem is solved, I'm curious for all three of your perspectives about, where do the bottlenecks end up in that world, as the ability to create software happens very fast? TULSEE DOSHI: I think you're basically going to push the bottlenecks out in both directions of the actual building. So I think on one, you can almost say, you're limited, then, by your ideas. You're limited by the part about, what do I want to build? And, actually, I think, then, the interesting opportunity for everyone in this room and for all of us is like, it's almost like taste. And, actually, really identifying, what are the problems that are really important to solve? Because when building is so easy, it's very easy to then build for problems that maybe even aren't real problems. And I liked how Josh said this earlier today, really find the pain points for users. LOGAN KILPATRICK: Yeah. TULSEE DOSHI: And I think your limitation then is like, are you finding a real pain point? And are you actually able to solve for that? But then you also have the problem on the other side. And you're going to have bottlenecks on that side too, which is, just because you've built a product, do you have the capacity to scale that product? And do you have the ability to scale that to the set of users you want to scale? How do you reach those users? How do you design the product in a way that it is understandable, and grokkable and easy to use? And I think you're going to see bottlenecks on both of those. 
And so it's going to be more and more important that we start building that kind of muscle of not only, how do you design an amazing product? But then how do you actually deploy that product? LOGAN KILPATRICK: I love that. MICHAEL GERSTENHABER: I think there's a very clear bottleneck today in information retrieval. And in the future, software development will get easier and easier and easier and easier. And at one point, maybe that won't even matter. But any process bogs down. Any blocking node in any process is usually because there is some sort of authority that needs to be given, or information that needs to be retrieved in order to continue the process. And that's definitely true. Less true in getting a product to production, but I don't think it's that controversial to say it's much easier to build a coding agent than an SRE agent. An SRE agent has to continually query the state of an etcd cluster, and make a next step and a plan based on the most recent state of the etcd cluster, of a Kubernetes cluster, rather. But in software, usually, all information that you need is already written down, and you're given a task, and it can run for a long time without hitting a blocker like that. So there's plenty in software development that we're going to get over by solving these information retrieval bottlenecks. And there's probably plenty in the SDLC, if not in the authorship of code that, already, the models are intelligent enough for. And that IR problem is already the bottleneck, like testing software, like deploying software and keeping it up and keeping it secure. And you might need access to a database in order to do a unit test with real data, stuff like that. So I think there's this very real IR problem that the agent harness contemplates that is different than intelligence, which the model contemplates. And by solving both together, we can get a very long way. LOGAN KILPATRICK: Yeah, just to double click on this, I'm curious. 
Obviously, developers have been doing RAG and sort of some amount of information retrieval-- MICHAEL GERSTENHABER: But that's all comparable data, right? Again, that's not interacting with a customer to make sure that you can transact a return in a real situation or a bank's KYC process. There are two banks that I know that are running their Know Your Customer process with the models, which means that not only do I have to look up a very wealthy individual to determine whether or not they sit on a board with somebody from a banned country, and they might be a money laundering risk, but I actually know who that person is. I've run a background check. The bank has run a background check on them. And that's very sensitive data. Nobody's-- not nobody. Actually, these two banks are credentialing their agents to that very sensitive data. But they're only doing it now because they know that data won't be exfiltrated. Before that, the model was intelligent enough. One bank did this in May, the other, the next October. So quite a bit of ways. That's not because the model got smart enough. That's because the bank figured out how to solve the IR problem with the right level of credentialing to run the process. You see what I mean? LOGAN KILPATRICK: Yeah, no, that makes sense. I feel like that is where a lot of the frontier challenge is actually moving to, is how to-- MICHAEL GERSTENHABER: And the models can write good SQL. They can even plan and say, I need information I don't have instead of going ahead and hallucinating. But that's a software problem and not a model problem in many ways. LOGAN KILPATRICK: Yeah. TULSEE DOSHI: It's also-- it's a software problem. It's also, in many of these cases, like a safety and security. There's a number of these other layers to this that are a part of this problem. MICHAEL GERSTENHABER: 100%. So those are bottlenecks that I see that are coming up. LOGAN KILPATRICK: Varun? 
VARUN MOHAN: Yeah, I was just going to say, I think people somehow believe, with AI, we're just going to get products that already exist or going to get way more-- there's going to be way more stuff in them. I actually believe that it's very hard to even decide what to build. If I take any product that is good today, I can't just add a hundred buttons and be like, oh, this button does X, this button does Y, and expect my users to get the sum of all of those capabilities of a button. I know a lot of products that look like this-- don't want to call them out-- some enterprise products that look like this. But I think that's just a tough problem to solve. So you're going to have-- I think the people that are going to shine are the people that are going to be incredibly high agency. They can make hard calls. These are very tough, tough to make. Users cannot take in hundreds of things. And that's going to be what's in very short supply. And that's what, I think, builders should really focus on. And that's going to be the hardest skill that we're all going to be trying to build. What should we be building is the hard problem. And I think that's always been the hard problem. It's never been-- I think a lot of crazy stuff has been built at Google. I think the hard question is not, how does this piece of infrastructure, this crazy piece of infrastructure, work? We have Spanner. We have all this crazy stuff in Google Cloud. It's like, what should I actually be building is the hard problem. LOGAN KILPATRICK: Yeah, it's a great question. What should people be building? I want to-- I have a couple more questions that I'll ask you. And I want to make sure we get time for audience Q&A. So if you have questions for our lovely panelists, please queue them up. And then somebody somewhere will have a microphone for you to ask the question. 
One of the other ones that's very top of mind is sort of, as the way we interact with agents continues to change, I'm curious, this interface layer matters a lot. Varun, you showed sort of talking to the model, which we had this with Google Docs yesterday and being able to talk to the model. It feels like audio is obviously one of those paradigms. Other thing-- maybe it's just audio, but other ideas or comments, just sort of about how that interface layer is actually changing and how it is actually tied, in a lot of sense, to not only the model capability powering it, but how you can rewrite the interface because-- TULSEE DOSHI: How you redesign. LOGAN KILPATRICK: Redesign for this new era, yeah. MICHAEL GERSTENHABER: And I think it contemplates both quite carefully. If we redesign for the model to interrupt us to unblock it, it has to have a sense of savoir-faire. It has to know, if it pings us and we were too busy to respond, should it be persistent? Should it ping us three more times? Or should it wait an hour and then ask us the same question? Or should it just-- this is not important to this person. Do you know what I mean? And that interaction is going to be very tricky to get right because nobody wants-- everybody's gotten called a thousand times from a very persistent salesperson or something that's very annoying. And we don't want-- if we move into a fully agentic world, where our agents are coworkers and we expect them to have agency and good judgments, then when should they ask us for help, I think, is a very careful question that we'll have to contemplate. LOGAN KILPATRICK: Yeah, that's a good push. VARUN MOHAN: I guess I really believe audio is going to be very big. It's clear we haven't completely cracked it. Obviously, I do use live voice transcription very quickly. It's in Antigravity. It's something I use. I also use it on other apps as well, but it feels like it could be so much more powerful. 
Imagine a world in which you have a hundred agents kind of running. I don't even know which agent is which. I want to be able to talk to my machine and for it to tell me, hey, here's what's going on across all of your agents that are running. We're not quite there yet, but I'm sure that's in the works. We'll figure that out. TULSEE DOSHI: I think it's also interesting. This goes to a debate we were having this morning about whether we think that the paradigm of building products is going to lead to thousands and thousands of products, or whether it's going to lean down to one product. And I think the question that I think is going to be really interesting in all of this is, do we-- in these two examples, there's this implicit assumption that we'll get to a world where there will be almost an interface-- maybe you talk to it with voice. Maybe you do something else-- that then can spawn off a bunch of agents that can actually go do a number of things for you and come back or proactively reach out. And so there's this kind of interesting shift, then, that would happen from a world where you have 20 different apps to do 20 different things for you, where you know you're going to app A for X and then app B for Y, to a world where maybe you actually have a much simpler view of the world, where you're actually saying, OK, I trust this interface. I'm going to ask this interface a bunch of different things that I need to get done, and I'm going to trust that it can actually find the right product solution for me. And that is a pretty big shift in design principle and a pretty big shift in how you would approach product making, I think. MICHAEL GERSTENHABER: And not to lean too heavily on the metaphor, but, again, this is how humans work. A human is a person, whether they're calling you-- the same person, I should say-- whether they're choosing to call you and their interface is a phone.
Or they want to chat you, and they're being very precise, and you can read every letter. Or they want to email you or-- there are many ways that we interact with each other, and we have good judgment for when to do it. A call is always interruptive. I'm not going to call Sundar on the telephone and expect him to pick up, right? So, again, the agent is the person, is the product. The interface should be chosen by the agent to be appropriate for the interaction. LOGAN KILPATRICK: Yeah, all right, my last question, then we'll go to audience Q&A. If you could wave the magic wand and sort of have a new sort of model capability or a product capability, anything on y'all's sort of personal wish list? And I'd actually love to hear the answer to this from the audience as well, maybe after the session. But, yeah, anything for you all that you would love to wave the magic wand and be able to fix or have? TULSEE DOSHI: I can tell you one that is very random and probably not that helpful to a wide range of people. But I'm a dancer, and one of the things that I've been trying to do for a long time is actually leverage our models to actually help me with choreography. And it's actually been harder than I thought, even the parts that I thought would actually be easy to automate. Part of that is something very simple, which is that, in order to upload a video and have the model be able to do useful things with it, the model has to be able to consume a very fast set of frames per second because dancing moves very, very quickly. And so there's literally this actual physical constraint on what the model is actually able to even understand and consume to then be able to actually even give you actionable insights. And so it's one of those things that actually is not just a model problem. It's also a system problem. And then there's actually a model problem. And creativity in that format of something like choreography is also very hard. 
How do you help the model understand lyricality, for example, which is a very specific thing that you would want to build? And so it's one of these interesting things that I've been thinking about in the context of omni and in the context of multimodal understanding, both of which are things that Gemini is very good at, but actually is difficult in something where you get to one of these more creative spaces. LOGAN KILPATRICK: Two quick plugs, we have Dynamic FPS in the Gemini API. So if you want to toggle the FPS for things like sports or whatever, you're analyzing your golf swing, analyzing sort of dancing, it does-- TULSEE DOSHI: It is very effective. LOGAN KILPATRICK: It does exist. It's very effective. You should also build choreography bench. I like this. It's a good-- the new-- TULSEE DOSHI: New plan. LOGAN KILPATRICK: The new hottest AI benchmark, choreography bench. So I'm excited to see it, Tulsee. Michael? MICHAEL GERSTENHABER: Mine is so much more boring than that. I feel like-- but thematically, similar in some ways. I would like-- I talked earlier about the difficulty of building an SRE agent. I worked in my-- a couple jobs ago at a company called Datadog. And I've been thinking a lot about our customer who gets woken up-- and I'm sure everybody here has had this experience-- at 3 o'clock in the morning. And they're not the only ones in their house that wake up. Their spouse wakes up. Their dog wakes up. Their baby starts crying. Their spouse hates them because they're a lawyer, and they have a big client meeting in the morning. Everything goes to mess. And they do this for long enough, they get divorced. You know what I mean? I think about this software engineer who's on call all the time. And I would very much like an agent who can staunch most problems most of the time, who can maintain a distributed system of arbitrary scale, something the size of a Capital One or an Airbnb. You know what I mean?
And I think some of the limitations to that is this ability to fill the context window in real-time at high frames per second, using a metaphor, is to push information as it streams in about a ludicrously complicated system and spend most of your time reasoning about that, not writing queries to get the data necessary. It's less of a query in response and more of a streaming mechanism, I think? TULSEE DOSHI: Yeah. MICHAEL GERSTENHABER: But either way, I'm very excited for that world because it's entirely possible, and we're getting there, and a lot of people are working on it. So-- LOGAN KILPATRICK: I love it. Varun? VARUN MOHAN: Yeah, nothing that complicated. I just wanted a personal trainer. TULSEE DOSHI: Yeah. VARUN MOHAN: I feel like I'm kind of lazy, so just that kind of watches what I'm doing and tells me, you shouldn't eat that or you should go run outside. TULSEE DOSHI: See, that's dangerous. VARUN MOHAN: Wait. TULSEE DOSHI: I'd rather not know. LOGAN KILPATRICK: Well, maybe we'll have that with glasses in the fold. VARUN MOHAN: Yeah, with glasses and-- LOGAN KILPATRICK: Powered by-- VARUN MOHAN: --Fitbit. LOGAN KILPATRICK: --Gemini Live. And Fitbit, exactly. Very exciting. Awesome. VARUN MOHAN: Almost there. LOGAN KILPATRICK: I would love to hear some questions from the audience. I have no idea where the microphone is, if it already exists and is in somebody's hand. SPEAKER 1: Yeah, we have two microphones on both sides of the audience. You can line up back here, and we have about eight minutes for questions. LOGAN KILPATRICK: I love it. If you have questions, please feel free to go to the microphone. You have to-- we're going to make you do the old-fashioned walk to the microphone. MICHAEL GERSTENHABER: And please queue up. Feel free to-- LOGAN KILPATRICK: We'll start over here. AUDIENCE: My question's for you, Varun. 
So you were saying that, in the future, you really-- how you have audio interactions with these AI agents and stuff, what do you think, in the future, developer-AI agent interaction is going to look like? Is it going to be more visual? Or is it going to be more audio based? What's your intuition on that in the upcoming years? VARUN MOHAN: Yeah, I think we've been talking a lot about that on the team. It's weird. What is the best way to input information? I think voice is fast. Technically, you can do some crazy stuff. You could do voice, your hands, and your feet. So you could start tapping with your feet, technically, just to get the fastest amount of input in. I don't know if people would want to do that, but-- probably not. LOGAN KILPATRICK: Sign me up. Sign me up. That sounds fun. VARUN MOHAN: Maybe soon. Maybe soon. TULSEE DOSHI: Like a dance activity. LOGAN KILPATRICK: Yeah. VARUN MOHAN: So I think voice in sounds the most compelling from a data perspective. I think a lot of people are lazy to read, but reading is way faster than listening. So in some ways, I want-- the reason why I brought up the thing of, hey, you can spin up as many agents as possible, it's very clear that's where things are going. The agents are going for longer and longer. You're probably going to have many agents running for you in the background. And being able to quickly talk to it regardless of surface seems like the right play. I think there's maybe a little bit of awkwardness that, if you work in an office, you don't want everyone just talking and yelling at the same time. So we need to figure that part out. And then maybe consuming it, I don't exactly know what the best way to consume it is. Maybe it actually is text. I don't exactly know. One crazy thing about the way you could input stuff is-- I think we were talking in the office. We were saying, people that are "Smash" players, people that play "Smash," they're really fast at moving things. 
I think they're potentially as fast as you talk. So maybe there's also a new form factor to start inputting data into these machines. I don't really know. I think someone should do some analysis on this. AUDIENCE: Yeah, thank you. Second thing was-- I wanted to ask is that-- do you think gesture's going to play like a big role in spinning up agents or orchestrating how you have your Google Watch, let's say, the Pixel Watch or the Apple Watch? You just spin up an agent on your Apple Watch or Google Watch with just hand gestures. Is it going to be possible in the future? VARUN MOHAN: That would be cool. AUDIENCE: Yeah. VARUN MOHAN: Yeah, that sounds awesome. You should do that. TULSEE DOSHI: I know. That sounds awesome. AUDIENCE: Thank you so much. AUDIENCE: Hey, so I have a question. So we have the text prompt and voice prompt. But what about the people that are more visual, though, or have the lack of communication skills? So how the AI will overcome those barriers that people-- with other people, we can use hand gestures and everything else, that we could communicate. But our text or our voice, we don't have that kind of skills. So how the AI will overcome those barriers? LOGAN KILPATRICK: Good question. I think-- I feel like video and image is an interesting-- I think one of the cool things is if you could draw something out, even if you could express it physically, back to this dance example, I do think the model being able to natively understand image and audio-- and, actually, we were talking earlier today about subtly picking up on the intonation of your voice as you try to express something, even if it's not in the most perfect way, are all things that the model can actually do natively, which I think is super interesting. And, yeah, maybe we need new form factors. And maybe it's gesture or-- MICHAEL GERSTENHABER: Even with our existing form factors. I was working with somebody at work the other day. 
And no matter how well articulated you are, it's much harder to describe where on the screen to move your cursor than to just fucking point at it. You know what I mean? And I think a lot of input will look much more like tapping the screen or lassoing something or just gesturing at what you mean. And that'll be the kind of interaction. TULSEE DOSHI: It also probably depends, depending on the use case, and also your personal style too. MICHAEL GERSTENHABER: Yeah. TULSEE DOSHI: For me, I'm someone who very much thinks while talking. And I also find myself to be a very verbal communicator, generally. So I prefer that form factor. I prefer audio, right? But for someone like my sister, I actually think she would do a lot better with a form factor, where she had the time to write something out or draw it or move things. And she's a much more visual communicator and a visual learner. And so that would be a much better form factor for her. And so I think, actually, part of what we need to build and design is interfaces that can be flexible to who you are and how you communicate and how you engage. And then there's a lot more of that. There's been a lot of talk for a long time about personalized learning, but I think there's also probably personalized creating. And there's both of those that are effective, I think. LOGAN KILPATRICK: Yeah, my fun fact is my new 20% project at Google is I actually ghostwrite Tulsee's tweets now. And so she dictates to me, and I write them down because she's very audio and I can do the writing. TULSEE DOSHI: So, everyone, just stay tuned. It's going to be great. LOGAN KILPATRICK: And so sometimes, actually-- VARUN MOHAN: All of your tweets are just Gemini now. TULSEE DOSHI: Yeah. LOGAN KILPATRICK: The tongue-in-cheek comment is like, actually, sometimes, it's another human in the loop. 
AI systems, obviously, are super helpful, but sometimes you actually can get a friend or somebody that you work with to go and-- TULSEE DOSHI: How do you come by Logan? LOGAN KILPATRICK: --and help. Thank you for the question. AUDIENCE: Thank you. SPEAKER 1: I think we have time for about one to two more questions. AUDIENCE: All right, so one of my biggest bottlenecks is speed, right? But the penalty for writing bad code is sometimes really big. So, generally, you want to run the best model. Kind of, where do you feel we are on that kind of scale for the-- we have a really fast model that's good enough for 99.9% of the time, and then we just optimize for speed, and then we just have one big model to just reality check things at the end. VARUN MOHAN: I think we genuinely believe in this idea. Honestly, one of the things that we have in Antigravity that's pretty awesome is we have subagents. But, actually, the model, the main agent, is able to actually pick the model for the subagent. So I think we could totally see a world, especially given we're in a world where we don't have unbounded number of chips, where a big model is able to delegate to smaller models. I think right now, we're not at the stage right now where we're purely optimizing for, how do we save everything rather than capability? Because it's kind of like what you said. Probably, in a lot of workloads, you don't just only want to run the smallest model possible. You would rather have it be the case that the hit rate is a lot higher than have a small model look at it. And then every once in a while, you're like, what's going on? But 100%, this is the direction of travel. I think we will see a world in which bigger models are delegating to smaller models to do a vast majority of work. AUDIENCE: Thank you. LOGAN KILPATRICK: Yeah, it'll save a lot of tokens too. And our last question. AUDIENCE: Hey, guys, so my name is Karthik. 
So regarding the tests and how we develop our software, is there any framework or steps that you guys recommend for using-- to avoid AI slop? So, for example, if I build an entirely new product and then if I want to add a new feature, I don't want to break the existing features. Is there any way to avoid all of that? Is it just unit tests and integration tests? Or is there more that I'm missing in the whole picture? VARUN MOHAN: I think right now, we actually think about this pretty deeply when we're building Antigravity. How do we make Antigravity build itself? And I think you need to be a little mindful at the beginning of the project. So, for instance, let's say you're building an application. I would actually make it so that, from the very beginning of the application, the agent itself can spin up the application, click buttons on the apps, and you actually have these kinds of smooth integration tests. And you can imagine-- SPEAKER 2: [INAUDIBLE] on the dialogue stage. VARUN MOHAN: That's fun. SPEAKER 2: [INAUDIBLE] on craft and creativity. [LAUGHTER] VARUN MOHAN: Sounds like I'm in the restroom of an airport. [LAUGHTER] No, so I think the idea is try to make it so that the agent itself is testing, testing your app. And you can convert this into a pretty high-quality test. Imagine doing that, generating a Playwright script of exactly what it clicked and then afterwards, offline, doing that, right? So you can convert these agentic tests into more robust tests afterwards. So I think there is a new paradigm now that agents can test things very quickly. You should be leveraging that maximally. AUDIENCE: What about existing products? What about existing products? Say if I have-- oh, sorry. VARUN MOHAN: Yeah, I think you'll need to rearchitect pieces of existing products. That takes a lot of work, I would say. It's better when it's 0 to 1.
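[Editor's sketch] The record-then-replay idea Varun describes, where an agent explores the app and its clicks are later turned into a deterministic regression test, might look roughly like the following. This is a minimal illustration under stated assumptions, not how Antigravity works: `CounterApp`, `Recorder`, `explore`, and `replay` are hypothetical names invented here, the "agent" is a fixed list of clicks, and a toy in-memory app stands in for a real UI. In a real project the recorded trace would presumably be emitted as a browser-automation script (the speaker mentions Playwright) rather than replayed in-process.

```python
from dataclasses import dataclass, field


@dataclass
class CounterApp:
    """Toy stand-in for the application under test (hypothetical)."""
    count: int = 0
    label: str = "Count: 0"

    def click(self, target: str) -> None:
        if target == "increment":
            self.count += 1
        elif target == "reset":
            self.count = 0
        else:
            raise ValueError(f"unknown button: {target}")
        self.label = f"Count: {self.count}"


@dataclass
class Recorder:
    """Wraps the app and logs each action plus the label observed after it."""
    app: CounterApp
    trace: list = field(default_factory=list)

    def click(self, target: str) -> None:
        self.app.click(target)
        self.trace.append(
            {"action": "click", "target": target, "expect_label": self.app.label}
        )


def explore(recorder: Recorder) -> None:
    """Stand-in for the agent's exploratory session (assumption: fixed clicks)."""
    for target in ["increment", "increment", "reset", "increment"]:
        recorder.click(target)


def replay(trace: list, app) -> list:
    """Re-run a recorded trace on a fresh app and collect any mismatches."""
    failures = []
    for index, step in enumerate(trace):
        app.click(step["target"])
        if app.label != step["expect_label"]:
            failures.append((index, step["expect_label"], app.label))
    return failures


# Record once while the agent explores, then replay offline as a regression test.
recorder = Recorder(CounterApp())
explore(recorder)
regressions = replay(recorder.trace, CounterApp())
```

The design choice worth noting is that the expensive, non-deterministic step (the agent deciding what to click) runs once, while the replay is cheap and repeatable, so it can run on every change to catch a new feature breaking an existing one.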
LOGAN KILPATRICK: I think-- and, actually, just to echo one quick point, I think this goes back to the conversation from before about this agentic engineering discipline being different than vibe coding. And I think the framework that you have for agentic engineering looks different than I think if you're just like yoloing your own personal project, where there's not users and you don't need to worry about anything breaking. I think the frameworks actually look different. I think we as an ecosystem need to actually have that materialize in how we build and talk about this stuff. So thank you for that question. Varun, Michael, Tulsee, thank you for sitting down and doing this. This was an awesome conversation. And please join me in giving them a round of applause. And thank you. Thank everybody. TULSEE DOSHI: Yeah, thanks, everyone. [MUSIC PLAYING]