
Google Cloud's Race Condition is an open-source, large-scale agentic simulation framework that demonstrates advanced multi-agent AI capabilities through a marathon-planning simulation set in a 3D recreation of Las Vegas, combining new AI platform services with gaming-inspired design patterns.
Race Condition Simulation Framework
Race Condition is a large-scale, agentic simulation framework designed to showcase Google Cloud’s Gemini Enterprise Agent Platform capabilities. It features a 3D recreation of Las Vegas, where agents collaboratively plan a marathon considering traffic, city regulations, and other factors. This project was demonstrated during the Google Cloud Next developer keynote and is open-source, enabling developers to explore agent communication, AI orchestration, and simulation at scale.
Multi-Agent Architecture with LLM-Driven Agents
The system integrates numerous agents powered by large language models (LLMs), using the Agent Development Kit (ADK). Each agent combines a model, instructions, and tools (APIs/integrations) to perform specific tasks. For example, planner agents strategize race logistics, while simulator agents run the event simulation. The system is designed to enable agents to discover, coordinate, and communicate with each other in a decentralized, scalable manner.
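The model-plus-instructions-plus-tools composition described above can be sketched in a few lines. This is an illustrative Python sketch, not the actual ADK API; the `Agent` class, field names, and the `estimate_route_distance` tool are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical sketch of the three-part agent structure: a model,
# instructions, and tools. Names are illustrative, not the ADK's own.

@dataclass
class Agent:
    name: str
    model: str                       # identifier of the underlying LLM
    instructions: str                # role and guidelines for the model
    tools: dict[str, Callable] = field(default_factory=dict)

    def call_tool(self, tool_name: str, *args):
        # Tools are plain callables the model is allowed to invoke.
        return self.tools[tool_name](*args)

def estimate_route_distance(km_per_leg: float, legs: int) -> float:
    return km_per_leg * legs

planner = Agent(
    name="planner",
    model="gemini",
    instructions="Plan marathon logistics for Las Vegas.",
    tools={"estimate_route_distance": estimate_route_distance},
)

print(planner.call_tool("estimate_route_distance", 10.55, 4))  # a full marathon in km
```

In a real agent framework the model decides when to call a tool; here the call is made directly just to show how the three parts fit together.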
Back-End Technologies and Communication
The agents communicate via the Agent-to-Agent (A2A) protocol, which allows agents to advertise skills and negotiate tasks. Communication is handled over Google Cloud Pub/Sub as a message bus to support low-latency, high-throughput messaging, with Google Cloud Memorystore (a managed Redis service) supporting system performance. The back end runs on Google Kubernetes Engine (GKE), auto-scaling Gemma 4 open-weight models served via the vLLM framework for inference.
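The message-bus pattern at the heart of this design can be sketched with a minimal in-process publish/subscribe class. In the real system this role is played by Cloud Pub/Sub; the class, topic name, and message shape below are illustrative only.

```python
from collections import defaultdict

# Minimal in-process sketch of the publish/subscribe message-bus pattern.
# Cloud Pub/Sub plays this role in the actual system.

class MessageBus:
    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, handler):
        # Register a callback for all future messages on this topic.
        self._subscribers[topic].append(handler)

    def publish(self, topic, message):
        # Fan the message out to every subscriber of the topic.
        for handler in self._subscribers[topic]:
            handler(message)

bus = MessageBus()
received = []
bus.subscribe("runner-state", received.append)
bus.publish("runner-state", {"runner_id": 7, "hydration": 0.8})
print(received)
```

The point of the pattern is decoupling: agents publish state changes without knowing who consumes them, which is what lets the front end and other agents listen in independently.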
Gaming-Inspired Architecture
The project draws on gaming design patterns, such as placing the authoritative simulation state on the server side akin to multiplayer games. The front end acts mostly as a “dumb client” displaying state streamed from the server, with limited local event processing (e.g., collision detection for hydration stations). The continuous simulation loop mirrors a game loop, taking discrete time “ticks” to update and sample agent states.
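The tick-based loop described above can be sketched as follows. The tick length, runner fields, and pace values are illustrative assumptions, not the project's actual parameters.

```python
# Sketch of a game-style simulation loop: advance authoritative state in
# discrete ticks and sample it each tick for streaming to clients.

TICK_SECONDS = 5  # assumed tick length, for illustration

def run_simulation(runners, total_seconds):
    ticks = total_seconds // TICK_SECONDS
    samples = []
    for _ in range(ticks):
        for runner in runners:
            runner["distance_m"] += runner["pace_m_per_s"] * TICK_SECONDS
        # Sample the authoritative state each tick; in the real system
        # this snapshot would be streamed to the front end.
        samples.append([dict(r) for r in runners])
    return samples

runners = [{"id": 1, "distance_m": 0.0, "pace_m_per_s": 3.0}]
samples = run_simulation(runners, 60)
print(len(samples), samples[-1][0]["distance_m"])  # 12 ticks, 180.0 m
```

Keeping the loop and the state on the server, as in multiplayer games, means clients can only ever render what the server says is true.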
Balancing Model Use and Performance at Scale
To achieve sub-second response times for thousands of simulated runners, the system uses a hybrid approach: most runner agents run deterministic logic without LLM inference to reduce latency, while a smaller set of runners powered by Gemma 4 models have internal thoughts that add variability. About 100 runner sessions share a single GPU-backed vLLM instance serving Gemma 4, managed dynamically by GKE autoscaling to balance throughput and cost.
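The hybrid routing can be sketched as a per-runner dispatch between a cheap deterministic step and an LLM-backed step. The LLM call is stubbed out here, and the one-in-ten ratio is an illustrative assumption, not the project's actual split.

```python
import random

# Sketch of the hybrid approach: most runners take a deterministic step,
# a small fraction are routed to an (stubbed) LLM for "internal thoughts".

LLM_FRACTION = 0.1  # illustrative ratio, not the project's value

def deterministic_step(state):
    state["distance_m"] += state["pace_m_per_s"] * 5
    return state

def llm_step(state):
    # Placeholder for a Gemma 4 inference call that would add
    # non-deterministic behaviour (slowing down, drinking water, etc.).
    state = deterministic_step(state)
    state["thought"] = "feeling strong"  # would come from the model
    return state

def step_runner(state, rng):
    if rng.random() < LLM_FRACTION:
        return llm_step(state)
    return deterministic_step(state)

rng = random.Random(42)
runners = [{"id": i, "distance_m": 0.0, "pace_m_per_s": 3.0} for i in range(100)]
runners = [step_runner(r, rng) for r in runners]
print(sum("thought" in r for r in runners), "of 100 runners routed to the LLM")
```

Both paths update position identically; only the LLM path adds variability, which is what keeps latency bounded while preserving some non-determinism.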
Agent Identity and Security
The platform incorporates a shared-responsibility security model with agent-level identity to control permissions. Using agent gateways and the Model Context Protocol (MCP), administrators can tightly restrict which tools and APIs agents can access. This layered security reduces risks from issues like hallucination and prompt injection attacks, which is critical because agents can interact with real enterprise data and services.
Token and Context Window Management
Managing LLM token consumption at scale is a core challenge. Techniques include limiting context size by avoiding unnecessary history in specific sub-agents, compacting conversation history, and using task-specific agents to reduce token overhead. These context management strategies enable real-time responsiveness critical for simulation and interactive experiences.
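One of the history-compaction techniques mentioned above can be sketched as a token-budgeted sliding window: keep the system prompt plus only the most recent turns that fit. The four-characters-per-token estimate is a rough heuristic, and all names here are illustrative.

```python
# Sketch of context compaction: keep the system prompt plus the newest
# turns that fit within a token budget. Token counting is approximated.

def estimate_tokens(text: str) -> int:
    # Rough heuristic: ~4 characters per token.
    return max(1, len(text) // 4)

def compact_history(system_prompt, turns, budget_tokens):
    kept = []
    used = estimate_tokens(system_prompt)
    for turn in reversed(turns):          # walk newest-first
        cost = estimate_tokens(turn)
        if used + cost > budget_tokens:
            break                         # older turns are dropped
        kept.append(turn)
        used += cost
    return [system_prompt] + list(reversed(kept))

turns = [f"turn {i}: " + "x" * 100 for i in range(50)]
context = compact_history("You are a runner agent.", turns, budget_tokens=200)
print(len(context))  # system prompt + the most recent turns that fit
```

Real systems often summarize the dropped turns rather than discarding them outright; the window above is the simplest form of the idea.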
Developer Engagement and Open Source
Google encourages developers to fork the Race Condition codebase and experiment with agent configurations, especially enhancing runner behaviors with Gemma 4 models. The project includes multiple demos and shows how to build with the A2A protocol and the agent registry. Free Google Cloud credits accompany the open-source release to promote hands-on development.
Creative Process and Evolution
The concept evolved from considering cities like Paris to ultimately choosing Las Vegas for the simulation, leveraging its iconic Strip landmarks. Initial ideas ranged from a virtual alien space station to a city-based marathon simulation. The team drew inspiration from prior Google virtual conference projects and classic games like SimCity, with elements reminiscent of Minecraft’s client-server authoritative game state design.
Interesting Anecdotes
A surprising local law prohibits camels on the road in Las Vegas. The team considered adding camels as runner characters but dropped the idea due to time constraints. The episode illustrates how the team blended realistic constraints with playful ideas.
User Experience and Visualization
The interactive front end provides live updates from the simulation using direct Pub/Sub streams, giving users a real-time view of agent states and events every tick rather than waiting for simulation completion. The 3D environment is built with custom models and visual elements, not AI-generated, to ensure quality and realism.
Insights from Developers’ Backgrounds
The collaboration blended a games-focused front-end developer with a back-end distributed systems engineer. Experience with game loops, real-time state updates, and multiplayer state authority strongly influenced architecture decisions. The game developer emphasized how AI can unlock new gameplay mechanics beyond procedural content generation.
Future Prospects and Expansion
The team envisions increasing realism by adding behavioral complexity such as runners cheating or “tripping” each other, modeling business impact along the marathon route, and handling public safety constraints dynamically. Expanding agents’ capabilities and inter-agent dynamics presents vast opportunities for research and demonstration.
Industry and Product Implications
Race Condition serves as a tangible example of how enterprise solutions can evolve from microservices into interconnected reusable agents with shared protocols. It highlights how gaming design principles and modern AI pipelines converge in real-world systems requiring scale, security, and usability.
Personal Notes
The developers shared personal gaming and climbing interests influencing their perspectives. One highlighted the joy of challenging game design (from their title Duet) and the immersive potential of combining sound and interaction. Both stressed the importance of partnership among back-end, front-end, and domain experts to deliver compelling AI-powered demos.
Google’s Race Condition project thus acts as a pioneering showcase of deploying scalable autonomous agent ecosystems in cloud environments, with lessons for development, security, design patterns, and human-computer interaction. Its open-source availability empowers developers to build next-generation AI-driven simulations and applications.
[music] >> Hi everyone. We're here for a Q&A session about Race Condition, which is a high-scale agentic simulation framework for planning a marathon in Las Vegas that we've built for the developer keynote at Cloud Next. I'm here with Casey West. And I'm Tom Greenaway. Yeah, so we worked together to build the simulation software itself, which is open source, and we have also produced the developer solution. But we thought it might be interesting to have a little bit of a discussion about how it got built, why we built it, some of the considerations that we had to put into it. Yeah. Yeah, exactly. I oversaw the front end development of the project, and Casey took care of the back end and oversaw that. Maybe this is a good opportunity to introduce ourselves a little bit. Tom, what's your background and what put you in a good position to work on the front end? Yeah, so before Google, I was an independent game developer. I made many mobile games and I've made many games in general in my life. And at Google, I was the web games lead for the Chrome developer relations team. So I've always focused on games in my career, which is great for a 3D kind of experience like Race Condition. And my career is mostly about back end software development, large scale distributed systems and architectures. So as we were thinking about building a reference architecture for a large scale distributed system, and especially an agentic one, my background was good for working on the back end. So it's always nice to be able to come together with engineers that have different backgrounds, different levels of expertise. Yeah, absolutely. Um, yeah, we have some prepared questions. >> We both have Qs and we both have As, hopefully. Yeah, exactly. So I'll ask you, Casey, tell everyone about Race Condition.
Yeah, so we mentioned a little bit, but Race Condition is our name for the reference architecture that we put together that runs the very real simulation that hopefully you've seen in the developer keynote at Next. It's on YouTube. We can put a link in the description of this video. But the idea was to build a large scale agentic system, a real one, that could demonstrate a lot of the capabilities of some of our new platform offerings. So Google Cloud, of course, is launching a bunch of products at Next. By the time you see this video, that will have happened. And a large number of those products that we're really focusing on are around what we're calling Gemini Enterprise Agent Platform, which is a collection of platform services, but also open source standards, frameworks, and protocols that we put together to help you build large scale agentic systems. So we thought let's build one and show it to you. Yeah, and you're actually going to be at Cloud Next in the developer keynote with a bunch of other Googlers showing off the demo. And it's pretty cool because, you know, on the front end, like on the client, it's this 3D recreation of Las Vegas, with the monuments of the Las Vegas Strip. And it connects to this back end agentic server system that's communicating with all these different agents. Maybe do you want to tell us a bit more about how the back end communication works? And actually I have a question here, which is: what is an agent exactly? >> [laughter] >> Yeah, great question. I'll answer that question first because that sets the stage quite well. At its most basic level, an agent I think is comprised of three components. So we have a model; in our world these days, that's usually a large language model for producing some sort of generated text. That text could be anything from code to narrative.
Um, you have instructions for the model to influence it and tell it what type of job it has, what role it has, what the guidelines are. And then most importantly, I think you have tools, which is the integrations. If you imagine, you know, anyone doing a job, the best way to do that job is to have the right tools for the job. And so the model needs to have good instructions and good tools. You put all those things together and you have an agent. Um, in our case, we wanted to envision what it would look like to build a large scale architecture and what sort of patterns would emerge when you take something like simulations, which is a pretty tried and true piece of software architecture. Like, building simulations is something we've done for decades. But what happens when it's agentic? And in our case, what we learned is that we need ways for the agents to collaborate with one another, discover each other and what their capabilities are, so we can coordinate the work. And when we have a lot of agents, when we have a lot of sessions (and in our case, we're simulating a marathon with potentially thousands of runners), managing those communication pathways, especially to the front end, is really difficult. So we had to come up with some new patterns that tend to model enterprise solutions pretty well. Yeah, because, you know, developers will be able to explore the code and the developer solution we've created as well, which showcases how we've got these different demos that use different configurations of agents. And in those demos, they can load it up and in the 3D front end, they can see Las Vegas, but they can ask an agent back end, like a kind of planner agent, to help them plan a marathon in the city of Las Vegas taking into account all the different stuff that one would need to take into account when planning a marathon in the city. And that's kind of what we wanted to recreate or simulate.
Um, so yeah, it's come together really well. I think hopefully it's going to be exciting at the actual show. So I have another question for you. Although maybe, well, I'll do my question and you can ask me one. Sure. What are some of the AI design patterns that exist that enterprise developers can extract from the simulation? Yeah, I think one of the big things is how do we get agents to communicate with one another? And one of the things we're working on at Google and in Google Cloud is trying to have standards around those communication patterns. So we have the Agent Development Kit, which is our primary framework for building agents, but then you still need the agents to communicate with one another. We're focused on Agent-to-Agent as a protocol, which is essentially a standard that allows agents to advertise their skills and abilities and what they're up to. We use what we call an agent card for that. Um, and then you can use anything from HTTP and HTTP streaming to communicate, but also in our case, we actually used essentially a message bus, which is again a tried and true enterprise solution. We use a message bus to pass large numbers of messages with low latency within the system. Is that with Pub/Sub? Yes, in this case we used Pub/Sub and we used Google Cloud Memorystore, which is a managed, high-availability Redis solution that we have at Google. Nice. Okay. Want to ask me a question? >> Yeah, well, so thinking of design patterns actually, and your focus on gaming in your career, I'm curious what sort of gaming design patterns you would want to highlight that we integrated into the system, or that we thought about using. Yeah, that's a good question.
I mean, there's just a lot of stuff in the user experience that we tried to have, but there's also, in the client and back end kind of design, the decision of where the authority of the state of the game was held. So that's always a tricky one with multiplayer games. So in our case, we put the game loop, or the majority of the game loop, and that authority of the state on the server. Obviously it's not a multiplayer game, but it's a simulation with many agents, so there's some parallels there. So the client is, you know, we could say a dumb client; it's really just listening to what's happening on the server. However, there are certain events which we did decide made more sense in the client. For example, when it's a bit more precise and related to the 3D positions of the objects: the runners run along the marathon track and we have these water stations which use a sort of 3D sphere to indicate that a runner is in this zone. When they collide with that, then it triggers the water replenishment event, which, you know, goes back up to the server, and that effectively communicates to an agent that is representing or connecting with the runners, and it just gets a message like "I have drunk some water," right? That's right. Yeah, so in the back end in this case we're using ADK, an ADK agent that we call the simulator, that runs the simulation. And ADK is our framework where you can have LLM powered agents, like I described the core of an agent: you've got models, tools, and instructions. But you can also have deterministic workflows as part of that agent architecture. So we use a looping agent to loop through a number of sampling events. So when you think about doing a simulation, you want to take samples of data as the simulation occurs.
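The client-side trigger-zone check described here can be sketched roughly as a point-in-sphere test. The coordinates, radius, and event shape below are illustrative, not taken from the project's code.

```python
import math

# Sketch of the hydration-station trigger zone: a runner "collides" with
# a station when its position enters the station's bounding sphere.

def in_sphere(runner_pos, center, radius):
    dx, dy, dz = (r - c for r, c in zip(runner_pos, center))
    return math.sqrt(dx * dx + dy * dy + dz * dz) <= radius

station = {"center": (100.0, 0.0, 50.0), "radius": 3.0}  # illustrative values

def check_hydration(runner_pos):
    if in_sphere(runner_pos, station["center"], station["radius"]):
        # In the real system this event is published back to the server,
        # which relays "I have drunk some water" to the runner's agent.
        return {"event": "water_replenished"}
    return None

print(check_hydration((101.0, 0.0, 51.0)))  # inside the zone
print(check_hydration((120.0, 0.0, 50.0)))  # outside the zone
```

Doing this geometric test on the client keeps the precise 3D work where the positions already live, while the server stays authoritative over the resulting state change.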
But that models very closely to what we call a game loop in games, right? Um, so the server is essentially running the simulation and advertising state changes constantly to the front end. But what's nice about the agentic workflow, and the idea that you have multiple agents that can all advertise themselves and you can communicate with them independently, is that the front end can also send events. Um, what do you call those in gaming? Oh, like an interrupt or something like that. Yeah, I mean, if it's coming from a different client to the main state authority. Um, but the other interesting thing to highlight, I guess, and this was something I didn't realize at the beginning when we started to collaborate on this project and the vision for how the whole agentic back end would work, is that usually in projects you define some very clear API endpoints, right? For the client to be calling to the server. But actually the way we designed this was to demonstrate how everything can really just be almost like a chat message for the agents to interpret, right? That's right. Yeah, the idea with agents, especially powered by LLMs, is that you don't have to be as rigid as you would be in typical software systems where you have to connect APIs. The APIs are fixed. They are reference points, and we still want to have well-defined APIs and software architectures, but with agents, what we found is, you know, you can ask the planner to run the simulation using natural language. You could also ask it by sending a JSON payload that essentially represents the same thing. What's nice is that either way the agent will interpret it and generally do the right thing if it has the right instructions.
It's funny actually, I just looked at my next question and it's basically what we were just saying, but it adds adds on to it how steep is the learning curve to start building with the A2A protocol and the agent registry. Yeah, um it's not too bad. So, uh one of the nice things on Google Cloud Platform and especially with Gemini Enterprise Agent Platform is that when you deploy your agents to Agent Runtime, which is where um you'll typically want to deploy and run your agents from, they get automatically uh integrated into the agent registry and they get an agent card generated for them. Um that essentially is the combination of the A2A protocol plus the agent runtime working together. And so, the agent registry allows us to have um a list of agents that are available within our environment and any software system could query that agent registry to understand what agents are available, including other agents. Um I mean, I haven't actually dived into the agent registry stuff that much, but like I'm also working on another project at the moment where I'm building some anti-gravity skills and like the skills kind of need to explain as well like what they're good at or what where they should be used. So, I I feel like there's a similarity there. Am I right or Yeah, I think um agent cards actually uh can expose the agent's skills as part of the standard. So, the A2A protocol has that concept, but skills are um basically or skills are definitely a new uh system that everyone is adopting across the board within agentic applications. Um they're very popular and it means a lot of different things. This is always the thing in in software is that we use one term and it can mean a lot of different things. So, in your case, you're building skills for a an AI or an agentic engineering harness like uh anti-gravity. Cloud Code uses skills, Gemini CLI uses skills. 
And so, as a developer when you're trying to build something with AI or AI assistance, your skills can help guide the model to know how to build the software the way you want it, right? Is that how you're using it? Yeah, yeah. Um that's uh that project will actually be revealed a bit later in the year, so But but then on the on the on the cloud side or on the back end system side, we're also integrating skills into running agents and autonomous agents. Yeah. So, those agents can have skills and that skill is a way to bundle both instructions and tools that can be reused and repurposed across multiple agents. Cool. Yeah. Um so, we were speaking of of games quite a bit Yeah. and I think something that that's really fascinating to me. I'm always looking for the connections or the the patterns that get repeated. So, you know, we have this client and server system within gaming and I'm not a game developer and and I don't game a huge amount, but I see a lot of parallels to business applications that we've been building for the last few years. We have microservices, which are distributed systems. Those are evolving into agents, which are distributed systems because we have multiple agents that can be deployed independently and integrated with one another. And then client systems, we have things like data bindings and Angular as a framework um that can speak to the server and when it gets state updates, it'll automatically refresh the UI. It feels like it's a lot of parallels to the way games work. The game that I'm most familiar with, the one that I I actually do play is Minecraft and I know I'm not alone. Very popular game. Um but it's it's the same thing. There's a client and a server. There's a a tick system or like an update system on the server side. It sounds very similar. What do you think? Do you think that I'm on the right track? Yeah, yeah. 
Funnily enough, I have not played that much Minecraft. I mean, I played when it first started to become popular, and I bought, I think, the Java version at the beginning. Is that Minecraft Classic now? Is that what it's called? I don't know what it's called, but I know that that's what I play. I play the Java version. Um, yeah. Yeah, but I don't actually know a lot about its architecture, to be honest, but I guess you do run up your own server, right? And then your friends can connect to it. So, yeah. So probably the server is the authoritative state, right? And I mean, how many users can it handle usually? Uh, that depends. So there are a lot of plugins and it's a rich ecosystem for Minecraft. So it depends on how much you optimize your server, but you could potentially have hundreds or thousands of connections. Um, there's a lot of nuance there that we probably don't need to get into in this chat. Um, but I do think that it's always interesting to know how some of the things that we do for fun can influence the decisions we make when we're building software at work. And, you know, so for me, I was able to take some inspiration in the back end architecture from my somewhat limited understanding of how Minecraft works. But how about for you? What sort of games do you play that might have influenced how the simulation played out? Um, I don't play that many games nowadays myself, but, you know, I played lots of games when I was young, and I've got some very specific favorites. SimCity obviously was a game I played back in the '90s, and, you know, when we came up with the idea for this project, there was a bit of an evolution, actually, because, funnily enough, I think we said earlier that you're from cloud dev rel.
Um I'm actually from a team called Builders dev rel um and now we're working together as as teams um and I had worked on actually at Google um something called Adventure, which was a virtual conference technology we built during um the the pandemic and we used it for multiple Google I/Os and other um Google events uh and that had like full multiplayer with characters, you know, um representing people um recreating virtual uh recreating of the physical events, but virtually. Um and that's how we sort of got uh got involved at um working together was because we were actually looking at maybe reusing some of Adventure to do the agentic simulation experience. But then as we got kind of closer to working together, it sort of made more sense to make something new and bespoke. Um and I remember like there was some initial ideation around like oh, maybe it could be like a SimCity type experience like cuz you know, we understood that simulation was going to be important um and so then we had the city idea. Then I think we changed to an alien space station uh if I remember correctly. Um and then uh we went back to a city and someone else uh added this idea of like what if we had to plan a marathon in a city? And then we weren't sure um like which city to use and at at one point I I don't know if you know this, but like someone was asking me a lot of questions about like how's the uh marathon in like Paris work? And we were talking about maybe picking Paris cuz there's obviously quite a few nice monuments there. Um but then we just realized like wait a second, we're holding the event in Las Vegas. Um Las Vegas has, you know, the whole Las Vegas Strip with all those monuments. That's a great kind of place to try and do it. So, um it was just like it's really fun I think with uh projects that have this like creative angle that you just sort of like find your way with them. So. 
Yeah, I think the creative process of putting together like uh a relatively ambitious, realistic uh reference architecture and and piece of software especially being able to open source it is um it's a lot of work and we go through a lot of iterations. So, as you mentioned um we probably had half a dozen ideas that we essentially tried to build out as much as possible until we landed on this simulation, but um even within this simulation there are some interesting ideas that some of them made it in and some of them didn't. It's true. One of our demos in the developer keynote is around adding memory and the ability to look up especially unstructured information within your your agent architectures. So we were integrating the Las Vegas rules and regulations Yeah. in order to understand what you're allowed to do and what you're not allowed to do in Las Vegas because that can have an impact on how we plan a marathon whether it be the route or how it might affect traffic or business impact. Um There's an interesting specific fact about camels that came up. Yeah, I never actually looked into the legislation or and that thing but like I know that at some point we started to talk about putting like a a 3D camel into the world as like one of the runners. Yeah, I can give you a little insight which is one of the laws on the books in Las Vegas is that you are not allowed to have a camel on the road. Okay. Like specifically a camel for some reason. And so of course that got us very interested in camels and we really wanted to put a camel on the road. Okay. Okay. Yeah. And then we were like maybe it could just be a camel emoji at one point. Yeah. Yeah. So we we we yeah. So what happened with the camel? I don't remember exactly. I think that part of the demo is just ended up like we ran out of time, right? Like we had to re-prioritize which is sad because you know RIP camel. Um okay. Well, let me see what else I've got here. Oh yeah. Which parts of the game are truly agentic? 
Like what parts have agentic brains? Yeah, so almost all of the agents in the system have an LLM or a model as a brain, but that we had to make some compromises because a number of reasons. One of them is that doing demonstrations during a 1-hour developer keynote is a bit of a contrived experience in that you have a very limited amount of time. So many of us have probably seen cooking shows where you prepare a meal to be baked or to be put in the in an oven and then it's going to take hours potentially in the oven. So you know, Julia Child was famous for this. She'd put something in the oven and then say this has to bake for an hour, but here I'll pull this one out that I've just completed out of another oven and here you go here it is, right? And so we do the same thing a lot in in the demos, but essentially time gets really sped up. So simulations historically could potentially run for hours or days or or even longer because you want as much information as possible or because you're modeling what would be called like an NP-hard problem or problem that essentially is an optimization problem where at some point you have to cut off the amount of time you compute and you just have to take the best information you have at that time to make a decision. I say all of that to say we have planner agents that are planning a marathon and planning its impact on the city understanding traffic assessments, that sort of thing. We have simulator agents. Those are all those also have LLM brains, although some pieces of the simulation are essentially very deterministic. For example, we talked about the game loop. So when we actually execute the simulation, we essentially want to to control time and take a number of samplings over an amount of time that represents the total duration of a marathon which is usually around 6 hours for most marathons. Sometimes it can be more, sometimes less, but about 6 hours. And so that's what we model as well. 
But I think the real interesting thing is where the scale and complexity starts to play in with a simulation like this. The runners themselves we wanted to be agentic. So the runners are agents in that we use ADK, the Agent Development Kit. They speak the A2A protocol just like all the other agents in the system. However, our first implementation of the runners, in order to get the scale we wanted, actually removed the LLM from the equation. So they were deterministic. And you can think of that sort of like your classic API. They took a well-structured piece of JSON input, they manipulated their state, and then they reported their new state back out. That allowed us to get sub-second response times out of the agents, and then we could have thousands of them running. So when we have a tick rate of every 5 seconds we take a sample, or every 10 seconds we take a sample, we can have a thousand or even 10,000 runners responding within that amount of time so that we can collect our data effectively. Because, as you know from game development, and as I have learned, if the tick rate starts to slow down, that's what we would call lag. And those of you who play Minecraft know how problematic lag is in that game. And so we didn't want lag. But we also have a version of the runners that is powered by a model. This is something that I think is work that's close to you as well, which is: we wanted a model that we could do a large number of calls on, but the runners aren't necessarily the most intelligent models out there. They basically have to turn their brains off for the most part and make very minimal decisions, right? But we still want to represent or understand what a little bit of chaos and non-determinism would look like in a simulation environment. So we want the runners to be able to make decisions we wouldn't necessarily expect. So we wanted to use a model. Yeah. And so we ended up with an open-source, open-weight model.
Which I think you you have some experience. >> Yeah, yeah, yeah. So and my understanding is like we're running that like the scale part of that is we're using Google Kubernetes engine and we're basically scaling up these Gemma 4 open-weight models. And I have a question as well for that, but I'll come back to it I guess. Um Like and I you're right. I have been playing around with Gemma quite a bit lately. That was another project I was working on which is almost like a spiritual successor I guess to adventure. I've called it AI venture, but it's actually like a little single-player kind of vibe coding puzzle game journey experience. But I wanted to demonstrate how Gemma 4 is able to do like really good vibe coding kind of like code generation locally in that demo. And also there's actually a little puzzle. I don't know if I don't know if you saw this one, but there's like one that's more like an agentic robot that actually like you give it some instructions and it tries to like execute over and over again whatever you tell it. So yeah, I have been playing with Gemma 4 lately. Um But yeah, I was curious like with the the simulation in in race condition, what's sort of like the ratio of runners to Gemma 4 model like instances? Yeah, so a lot of that is not super easy to calculate because we run Gemma 4 on an auto-scaling cluster, a Kubernetes cluster or GKE. And we're running it using vLLM which would be the serving mechanism for a model these days especially on you know, GPUs or TPUs. So we use vLLM and really the scaling for the model itself is about how much concurrency and token throughput you can get. So it's not so much about how many agents you're running or agent sessions you're running. It's more about how many tokens do you need to get out and how many simultaneous calls do you need to make? Yeah. So that calculation is honestly a little bit challenging. 
So in our case it's around 100 runner sessions to one vLLM and Gemma 4 instance. >> And this might be a silly question, but I guess, depending on what's happening in the simulation, how many tokens it needs would go up and down. So is the idea that GKE can dynamically scale that appropriately? >> That's right. GKE has really good auto-scaling characteristics, and you can configure that at the infrastructure level based on parameters you choose. So it doesn't scale indefinitely; it has limits that you define. And that's really helpful. But Gemma 4 is actually an ideal model for this scenario, because while we have Gemini, which is a large language model fully served and managed by Google with access to the world's information, we don't actually need all of that. We don't need all that power and we don't need all that knowledge for the runners to make decisions about whether they drink water or how motivated they are when the crowd cheers for them. >> Yeah. >> So Gemma 4 has the right amount of inference capability, the right amount of knowledge baked into it. And it's also the case that we're going to make thousands of requests to the model within a minute or two to run the simulation. So the real question becomes: how can we get that throughput in a cost-effective manner? And while GPUs can be a bit pricey, getting guaranteed throughput on a fully powered model like Gemini is also pricey. So we wanted to make sure we get the best of both worlds, or find the compromise that we need. >> Yeah, and it's not just around price. Different companies and different enterprises may have reasons that they need to use Gemma versus Gemini. >> Another big thing is fine-tuning, and so we can also fine-tune Gemma to be essentially an expert at running marathons, right?
So that it can make smart decisions, and that's another way to use private inference to your advantage. >> Next year we'll have actual robots running in Las Vegas. We'll go full simulation. >> Yeah. Every year has to be a little bit bigger and more ambitious than the last. >> [laughter] >> So, it says here I can make the front end look great. Thank you. But how do you sleep at night, Casey, knowing that these autonomous agents have access to enterprise tools and databases? What happens if they hallucinate or get hit with a prompt injection? >> Right. So the security and autonomy questions will always be there. And as someone who builds agentic systems for a living and helps other teams build agents at large-scale companies, I think this is an ever-present concern, and a realistic one. Agents have access to only what you give them access to. And that means a couple of things. In Google Cloud we like to consider this a shared responsibility model around security. We're trying to build a platform that allows you to secure your agents well, but you also have to write secure agents. So it's a bit of a give and take. In our case, we have built within Gemini Enterprise Agent Platform a series of capabilities that I think are very important. One of them is agent identity. Those of you who build a lot on Google Cloud will know that everything has an identity, and identity-aware network connections are an important part of the design patterns for building on Google Cloud. So we built that into our agent platform as well. Every agent gets a unique identity. And that means you can understand who or what, which principal, which service, is making requests. So then you can protect the systems the agents are interacting with using Agent Gateway.
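The identity-plus-gateway pattern described here can be sketched in a few lines. This is a hypothetical illustration of the pattern, not the Agent Gateway API; the agent names, tool names, and policy structure are all invented:

```python
# Hypothetical sketch of identity-based tool authorization at a gateway:
# every request carries an agent identity, and a policy decides which
# tools that identity may invoke, independent of which tools the
# MCP server happens to expose.

MCP_SERVER_TOOLS = {"read_financials", "modify_budget"}

# Gateway policy: the planner agent may read, but never modify, even
# though the MCP server itself exposes a budget-modification tool.
POLICY: dict[str, set[str]] = {
    "planner-agent": {"read_financials"},
    "finance-agent": {"read_financials", "modify_budget"},
}

def authorize(agent_identity: str, tool: str) -> bool:
    """Allow the call only if the tool exists AND policy permits it."""
    return tool in MCP_SERVER_TOOLS and tool in POLICY.get(agent_identity, set())

print(authorize("planner-agent", "read_financials"))  # True
print(authorize("planner-agent", "modify_budget"))    # False: tool exists, policy denies
print(authorize("unknown-agent", "read_financials"))  # False: no identity, no access
```

The key design point is that this check sits at the network layer, outside the agent's own prompt, so a hallucinating or prompt-injected agent still cannot reach tools its identity was never granted.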
So this is a network-layer security protocol that allows you to define which agents or services can make requests to your API, and to protect it. In our case you have things like MCP servers, Model Context Protocol, which is a way to bundle tools and prompts and skills into a network-attached service, so distributed systems like we've been talking about. And you can provide security for that. You can say our planner agent only has access to read financial information, and even though our MCP server has tools to modify our budget, the planner agent isn't actually allowed to modify the budget. So you can control that at the network layer. >> Yeah. And one of the demos focused on that. >> That's right. So you have control as the agent creator to determine what tools your agent has access to, but as we have more of a mesh system, with agent registry and tools registry in our agent platform, we need to make sure that we're also integrating security policies at the network layer to keep things safe. >> Okay, I'm going to change track and ask you: what was the first software product you made? >> Oh my. So my career started in large part in open source. My first job was at an internet service provider, back when dial-up was a thing. You probably don't remember that. >> Actually, I worked at an internet service provider when I was younger, and they still had dial-up. That's in regional Australia. No, I'm not from regional Australia myself, but I'm from Australia, and in regional Australia they still needed dial-up. >> Absolutely. Yeah. And I grew up in western Pennsylvania in the United States, which was still somewhat remote in the late '90s and early 2000s. So I worked at an internet service provider, and at that time the language that was really building the interactive web was Perl. So I got involved in the Perl open source community very early in my career.
And so a lot of my first software that I actually published, that I'd be able to really talk about, were what we called CPAN modules, which are like the libraries of Perl. >> Okay. Is there a lot of your Perl code running on systems in the world? >> Yeah, this is my fun tiny flex: Perl is distributed with many operating systems automatically, and I have made very small contributions to the Perl core. So I'm a Perl maintainer. >> Oh. >> What that means is that, for example, every Apple computer that gets distributed has my name and email address on it, because it includes some of my software. >> Okay. Scary. >> [laughter] >> It's by no means award-winning work, though. And you have some of that. >> Yeah, I've made some games that have won game design awards and that kind of thing. Those were mostly before Google. Duet was the most popular game that I made, primarily for the iPhone and Android app stores. >> Cool. So if I wanted to play Duet, what would be the scenario where I'd get really into it? Would I be on an airplane trying to pass the time, or is it a truly immersive experience, or both? >> It's kind of a bit of both. That's a really good question. Duet has very bite-sized little levels. It is a brutally difficult game, and that was intentional. We wanted it to be something where you fail quickly. It's very clear: there's no health system. You hit one of the obstacles, you fail, it automatically rewinds and just immediately starts again. So it builds up this frustration, but that means when you do get through a level, you get a lot of euphoria. And we worked with a musician who had never worked on a game before. His name is Tim Shiel, and he did the soundtrack.
So I actually do recommend people play with headphones, because it's really great, beautiful, sort of minimalist music. Another little claim to fame is that the Queensland Symphony Orchestra of Australia performed it, with Tim as conductor, when we did this big show of Duet back in my hometown of Melbourne. >> That's very cool. >> Yeah, it's cool. >> Okay, so I want to ask a question about that experience. Game design is very intentional, and trying to invoke an emotion or an experience for the user is very intentional in game design. So I'm curious how that has translated to you being a developer advocate at Google: what do you do to get developers excited about the products that you're working on or the things that you're involved with? >> Yeah, interesting. It's kind of funny, because sometimes I don't know how to explain what I do to people that I meet, especially non-technical people. But I approach it as trying to make inspirational demos, and often my demos end up having some gaming element to them, or they're gamified in some way. I think what you want to do is take the technology and see what kinds of things it can unlock in a field, say in games, that weren't possible before. And lately, in the last couple of years, I've been really focused on AI in games in particular. Obviously everyone's focusing on AI. But as a game developer, and especially as someone who's really handcrafted games in the past, I don't want it to just be about how we can now generate all the art or all the music and audio of the games. Maybe there are ways that could be used in an interesting way, if it's using some of the input from the user, for example, so that you're doing something that wasn't really possible before.
But what I actually think is the most interesting part about these new AI technologies in games is thinking about game mechanics and gameplay that weren't possible before, that now AI can unlock. Like the NPC that you bump into in the AI Venture demo that I've done recently: you give it a prompt, and then it basically keeps executing that prompt to complete the task. And so that demonstrates the tool calling, but in games that's a really interesting new space. Some of it could sort of be done in the past with AI in games, but I just think there's a lot of new emergent gameplay that can be explored. So yeah, I think there's still tons of space for crafting unique artistic ideas and experiences in games, coupled with all this new technology. >> I think that's great. One of the hard parts of being on the back-end systems and the cloud systems, which is where I tend to live, is getting people to understand the solutions that we make. They're difficult to visualize. They're difficult to see and understand. But building the simulation to expose agentic solutions that are powered by Google Cloud Platform made it real. It makes it a visceral experience. It feels like a game, even though I think technically we wouldn't call it a game, right? >> It's game-like, yeah. It's sort of a sandbox experience. >> Yeah. But I think it's really good to partner up, and that's something that I think we did really well with the team of people we had working with us: taking experts at front-end systems, visualizations, and user interaction in order to expose and demonstrate back-end systems in a way that felt visceral, I suppose. >> Yeah, and like you said, we worked with a lot of other people.
You know, there were 3D modelers that did the cityscape of Las Vegas and so on. So it's not like we used AI to generate the city, for example. Oh, this is actually an important thing to mention: if a developer forks the Race Condition repository today, we're giving away some free Google Cloud credits with it. What is the very first thing you hope they try to modify or break when they do that? >> I think the runners are very interesting, especially with Gemma 4 as a brain. Again, they're not big agents. They don't do a lot. They don't think a lot. But one of the things we added with Gemma 4 to power the runners is the ability for them to have internal thoughts. And I don't know how many of you out there have run a marathon. I've not tried, because I'm terrified; it sounds very hard. >> There are a couple of people on the Waze team, though, that have. They've told us some experiences. >> And when they talk about the inner thoughts that they have on a race, some of them we can't expose in the developer keynote. [laughter] >> Dodgy, yeah. >> I think it would be really fascinating to see what else people could model on that scale. So I think when it comes to a simulation like this, the runners are kind of like end users or testers for the simulation itself. The plan for the simulation is generated using AI, and we want to test how good that plan is by executing the simulation. The real test is whether or not the runners enjoyed themselves and had a good experience. But I think it'd be really interesting to continue to expand that, to maybe model local businesses. How are they impacted? If you have a drive-thru fast food chain and your road is closed because of the marathon going by, that may have a dramatically different impact than a coffee shop that you can just walk into, for example.
>> And I think it'd be interesting to add additional agents into the mix that might be affected by the existence of a large-scale event like that in the system. >> Yeah, another one that pops into my mind: I remember early on we were thinking about making it so the runners could actually just run off the track if they wanted. We imagined, what if we allowed the runner to have their own thought of, "Hey, I'm going to try and cheat," and go down a side street or something? >> I was really excited about that, too. Continuing down this theme of expanding the runners' capabilities, like their internal thoughts: you can give a runner a persona, maybe a goal. Maybe some runners think of the marathon as a zero-sum game where they need to win and everyone else needs to lose, whatever it takes. So that cheating would be interesting. >> A new function tool call for tripping up other people. >> Yeah, we did talk about having tripping and people bumping into each other. And we do try to anticipate things like traffic concerns, but I think it'd also be interesting to see what happens when that goes off the rails. Public safety goes hand in hand with traffic issues. So what happens if you don't have enough public safety officers? Maybe cars try to sneak through the marathon route. That could disrupt the runners, maybe catastrophically. It could go wrong. Thankfully, it's just a simulation. >> Yeah, for now. >> [laughter] >> Okay, wait, I have another one. Traditional apps hit infrastructure bottlenecks, but with agents we hit token bottlenecks. >> We do. >> When context windows overflow. How do you manage the token scale? >> Yeah, everyone is rightly concerned about the amount of money you might spend on the number of tokens you'd consume.
And that's one of the reasons why we powered the LLM-based runners with private inference: we can hit that as much as possible. We can saturate the infrastructure that we can afford to spin up, without adding additional cost beyond the infrastructure. So that is an issue, but within agent design there are a couple of techniques that are really important. I would say they all fall under the umbrella of context management. The amount of context you send to a model determines how expensive it's going to be to read it all and then ultimately provide a response. Typical agentic solutions build or accumulate context over time. We know this when we interact with Gemini and we have a long conversation. What you might not know is that every time you hit enter on your response back to the model, what's happening in the environment is that it's bundling your entire session history into one large context and sending that every time. Every interaction with the API or the large language model is stateless. So in order for Gemini to know what you've been talking about, that history has to be sent to Gemini every time. That accumulates over and over again, and every message gets a little bit more expensive. So within context management there are a couple of things we can do. One of them is that we can have sub-agents, which is why this is a multi-agent system. A sub-agent might have a very specific task; it only needs a small amount of information to achieve that task. For example, the runner doesn't have to know anything about planning a marathon. It only needs to know about its current environment in the marathon at the time we invoke it. So we control context by eliminating history and not adding any unneeded information. That keeps it to a very small set of tokens, which is very inexpensive. The other thing we can do is compact the history.
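The three ideas here (history re-sent on every stateless call, narrowly scoped sub-agents, and history compaction) can be sketched as follows. This is an illustrative toy, not ADK code, and the word-count "tokenizer" is a deliberate simplification:

```python
# Illustrative sketch of context management: (1) a chat session re-sends
# its whole history on every turn because the model API is stateless, so
# cost grows per message; (2) a scoped sub-agent call sends only the
# minimal context it needs; (3) older history can be compacted.

def tokens(text: str) -> int:
    return len(text.split())  # crude stand-in for a real tokenizer

class ChatSession:
    def __init__(self):
        self.history: list[str] = []

    def send(self, message: str) -> int:
        """The stateless model sees the ENTIRE history on every turn."""
        self.history.append(message)
        return sum(tokens(m) for m in self.history)  # cost of this call

    def compact(self, keep_last: int = 1):
        """Trade fidelity for cost: replace older turns with a summary."""
        dropped = self.history[:-keep_last]
        summary = f"[summary of {len(dropped)} earlier turns]"
        self.history = [summary] + self.history[-keep_last:]

def runner_subagent_call(environment: str) -> int:
    """A sub-agent gets only its current environment, no history at all."""
    return tokens(environment)

chat = ChatSession()
print(chat.send("plan the marathon route through the Strip"))  # cost grows...
print(chat.send("now add hydration stations"))                 # ...every turn
print(runner_subagent_call("mile 3, crowd cheering, water ahead"))  # stays tiny
```

The runner sub-agent stays cheap no matter how long the overall simulation runs, because its token cost is bounded by its environment snapshot rather than by accumulated history.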
So that does mean you might lose some fidelity, some details that you've accumulated over time, but it also means you cut down on the overall token size. So I think those are important things for people who might not build agents every day to recognize. And we're all actually using those techniques all the time when we interface with, say, gemini.google.com. >> Yeah. No, it's really cool. I mean, the other thing with this whole deployable architecture, as we're calling it, is it gives developers, especially enterprise developers, all these different AI-driven design patterns that they can borrow ideas from or learn from. So it sounds like dealing with that kind of token context is an important design pattern. >> Yeah, and the more context you accumulate, the longer it takes for an agent to respond. That can be quite frustrating as an end user, and also when we're building an interactive in-game loop. >> Like this simulation. Users don't want to wait around for several minutes to find out what happened in the simulation. >> So we try to have very real-time information being presented. One of the other design patterns we incorporated, which we lifted from classic enterprise software design patterns, is a message bus. We have this pub/sub system where we can emit messages from all of the agents to Pub/Sub as a broadcast, and the client can actually read that. So the visualization that you see when you look at the simulation, or that we showed in the developer keynote, is getting real-time information about the inner workings of all of the agents. And we designed that so that we can give you that visibility. >> Yeah, because it has its own connection to Pub/Sub, directly from the client. >> That's right. Yeah.
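The broadcast pattern described here can be shown with an in-process sketch. The real system uses Google Cloud Pub/Sub over the network; this toy bus only illustrates the shape of the pattern, and the topic and event fields are invented:

```python
# In-process sketch of the message-bus pattern: agents publish life-cycle
# events to a topic, and the visualization client subscribes to the same
# firehose to render state in real time, instead of polling for a final
# result when the simulation ends.

from collections import defaultdict
from typing import Callable

class MessageBus:
    def __init__(self):
        self.subscribers: dict[str, list[Callable[[dict], None]]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Callable[[dict], None]):
        self.subscribers[topic].append(handler)

    def publish(self, topic: str, event: dict):
        # Broadcast: every subscriber on the topic receives every event.
        for handler in self.subscribers[topic]:
            handler(event)

bus = MessageBus()
frames: list[dict] = []  # what the "dumb client" would render
bus.subscribe("simulation.ticks", frames.append)

# Agents emit an update each tick; the client sees them immediately.
for tick_no in range(3):
    bus.publish("simulation.ticks",
                {"tick": tick_no, "runner": "r-42", "position_m": 14 * tick_no})

print(len(frames))  # -> 3: one rendered frame per tick, no waiting
```

Decoupling publishers from subscribers this way is also what lets the front end stay a thin display layer: it renders whatever arrives on the topic without knowing which agent produced it.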
Yeah, there's a gateway system in place, but basically it's a direct firehose of information from all of the agents about their life cycle as they're dealing with things. So you don't just have to wait two minutes for a simulation to finish. You can actually see what's happening every tick. You can see what happens every time a runner gets an update. And I think that's a really fascinating pattern as well. >> Cool. All right, one last question for me. Are you planning to go bouldering in Paris? >> Yeah, so I love rock climbing. It's one of my primary activities, and I always travel with climbing shoes and a chalk bag. It's a very lightweight set of things to be able to go. My preference, if I had more time, would be to go to Fontainebleau and go bouldering in the forest, but I'm here for a short amount of time, because we're very busy leading up to Google Cloud Next. So in lieu of that, Paris has many really excellent indoor climbing gyms, and I will visit one of those. >> Yeah, some of them have beer as well, I think. >> Oh, yeah. Climbing is a whole community activity. It's very collaborative, and that's something I love about it. I've been all over the world to climbing gyms, and even when I don't speak the same language as the people around me, we're always rooting each other on. And in Paris we'll be saying allez, allez. >> [laughter] >> But you're going to be on stage at Cloud Next, so stay safe. You know, we don't want the understage team to have to jump in. Although if they did, they would be amazing. >> Yeah, of course. >> But Casey, thanks a lot for this chat. >> Yeah, I hope people explore the code and the developer solution we've made. Watch the developer solution video, explore the demos that we've got. The whole developer keynote itself will be put online as well, so they can watch that too.
And yeah, thanks again for coming and having this chat. Yeah, thank you. I think one of the beautiful things is taking everything from Angular and game development all the way to low-level back-end system architectures and Kubernetes clusters and putting it all together in one great example. Yeah, it's been great. Cool.