Build Agents with Gemini API

Google • Google for Developers • May 21, 2026 at 11:52 PM • 38:13

TL;DR

Google unveiled new Gemini APIs enabling real-time multimodal agents, unified model interaction, and managed sandboxed environments for building autonomous applications.

KEY POINTS

Gemini 3.1 Flash Live powers real-time multimodal agents

The Gemini 3.1 Flash Live model introduces low-latency, real-time interaction across voice, text, images, and video. It operates through a stateful WebSocket connection, allowing continuous streaming of inputs and outputs. As a native audio model it works speech-to-speech across 90 languages (currently in preview), so it can switch languages mid-conversation and handle live interruptions.

End-to-end multimodal capabilities

Gemini’s strength lies in its ability to process and generate multiple formats. It can interpret visual scenes, written text, and audio simultaneously, while producing outputs such as speech, images, and video. Demonstrations included generating music on demand using Lyria 3, highlighting integration between conversational agents and generative media tools.

Function calling and tool integration

The platform enables agents to execute actions via tool use and function calls, including built-in integrations like Google Search grounding and custom APIs. This allows agents to retrieve real-time data or trigger external services, moving beyond static responses into actionable workflows.
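
The tool-use loop described here can be sketched without any network access. This is a minimal illustration, not the Gemini SDK: the registry, the `register_tool` decorator, the `get_weather` tool, and the shape of the `function_call` dictionary are all hypothetical names chosen for the example. It shows the client-side half of function calling: declare a tool with a JSON-schema-style description, receive a call request, validate it, run it, and package the result.

```python
import json
from typing import Any, Callable

# Hypothetical tool registry; the names and schema shape are illustrative only.
TOOLS: dict[str, dict[str, Any]] = {}


def register_tool(name: str, description: str, parameters: dict[str, Any]):
    """Decorator that stores a function next to its JSON-schema-style declaration."""
    def wrap(fn: Callable[..., Any]) -> Callable[..., Any]:
        TOOLS[name] = {
            "fn": fn,
            "declaration": {
                "name": name,
                "description": description,
                "parameters": parameters,
            },
        }
        return fn
    return wrap


@register_tool(
    "get_weather",
    "Return a canned forecast for a city.",
    {"type": "object", "properties": {"city": {"type": "string"}}, "required": ["city"]},
)
def get_weather(city: str) -> dict[str, Any]:
    # Stand-in for a real external service call.
    return {"city": city, "forecast": "sunny"}


def dispatch(function_call: dict[str, Any]) -> dict[str, Any]:
    """Run the requested tool and wrap the result (or an error) for the next request."""
    name = function_call["name"]
    args = function_call.get("arguments", {})
    if name not in TOOLS:
        return {"name": name, "error": f"unknown tool: {name}"}
    required = TOOLS[name]["declaration"]["parameters"].get("required", [])
    missing = [key for key in required if key not in args]
    if missing:
        return {"name": name, "error": f"missing arguments: {missing}"}
    return {"name": name, "result": TOOLS[name]["fn"](**args)}


# A function call as a model might emit it (hand-written here, not real model output).
call = {"name": "get_weather", "arguments": {"city": "London"}}
print(json.dumps(dispatch(call)))
```

Returning errors as data rather than raising lets the result be fed back to the model, which can then retry with corrected arguments.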

Direct client-side access with ephemeral tokens

Developers can connect directly to the Live API from client applications using short-lived ephemeral tokens, reducing backend complexity. This approach supports responsive applications such as UI manipulation, live coding assistance, and interactive debugging with immediate feedback.
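
The general idea of a short-lived token can be illustrated with a signed payload that carries its own expiry. This sketch is an assumption-laden stand-in, not how the Live API issues or validates its ephemeral tokens: `SERVER_SECRET`, `issue_token`, and `verify_token` are hypothetical, and the real service manages tokens on its own side. The point is only that the long-lived secret stays on the server while the client holds something that stops working after a short time.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional

# Hypothetical long-lived secret; it exists only on the server and never reaches the client.
SERVER_SECRET = b"demo-secret-never-ship-this"


def issue_token(session_id: str, ttl_seconds: int, now: Optional[float] = None) -> str:
    """Server side: sign a payload that carries its own expiry timestamp."""
    now = time.time() if now is None else now
    payload = json.dumps({"sid": session_id, "exp": now + ttl_seconds}).encode()
    signature = hmac.new(SERVER_SECRET, payload, hashlib.sha256).digest()
    return (
        base64.urlsafe_b64encode(payload).decode()
        + "."
        + base64.urlsafe_b64encode(signature).decode()
    )


def verify_token(token: str, now: Optional[float] = None) -> bool:
    """Accept the token only if the signature matches and it has not expired."""
    now = time.time() if now is None else now
    try:
        payload_b64, signature_b64 = token.split(".")
        payload = base64.urlsafe_b64decode(payload_b64)
        signature = base64.urlsafe_b64decode(signature_b64)
    except ValueError:  # wrong number of parts or invalid base64
        return False
    expected = hmac.new(SERVER_SECRET, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(signature, expected):
        return False
    return json.loads(payload)["exp"] > now


token = issue_token("browser-session-1", ttl_seconds=60)
print(verify_token(token))
```

A leaked token of this kind is only useful until it expires, which is the property that makes handing it to browser code tolerable.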

Introduction of the unified Interactions API

The new Interactions API standardizes how developers interact with both models and agents. A single API call structure supports tasks ranging from simple queries to complex agent workflows. It includes server-side state management, eliminating the need to manually maintain conversation history.
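
Server-side state management can be pictured with a toy in-memory server. Everything below is a simulation written for this summary: `ToyInteractionServer`, `toy_model`, and the `previous_interaction_id` parameter name are assumptions, not the real client library. The example mirrors the talk's "my name is Will" demo, where the client sends only the new input plus the ID of the previous interaction and the server rebuilds the context.

```python
import uuid
from dataclasses import dataclass
from typing import Optional


@dataclass
class Interaction:
    id: str
    input: str
    output: str
    previous_id: Optional[str] = None


def toy_model(context: list[str], prompt: str) -> str:
    """Stand-in for a model: it can only recall a name from earlier turns."""
    marker = "my name is "
    if "what is my name" in prompt.lower():
        for turn in reversed(context):
            position = turn.lower().find(marker)
            if position != -1:
                name = turn[position + len(marker):].strip(" .!?")
                return f"Your name is {name}."
        return "I don't know your name yet."
    return "Noted."


class ToyInteractionServer:
    """Keeps every interaction so clients never resend the history."""

    def __init__(self) -> None:
        self._store: dict[str, Interaction] = {}

    def _context(self, interaction_id: Optional[str]) -> list[str]:
        # Walk the chain of previous IDs, then return inputs oldest first.
        turns: list[str] = []
        current = interaction_id
        while current is not None:
            interaction = self._store[current]
            turns.append(interaction.input)
            current = interaction.previous_id
        turns.reverse()
        return turns

    def create(self, input: str, previous_interaction_id: Optional[str] = None) -> Interaction:
        context = self._context(previous_interaction_id)
        interaction = Interaction(
            id=f"int_{uuid.uuid4().hex[:8]}",
            input=input,
            output=toy_model(context, input),
            previous_id=previous_interaction_id,
        )
        self._store[interaction.id] = interaction
        return interaction


server = ToyInteractionServer()
first = server.create("Hey, my name is Will.")
second = server.create("Hey, what is my name?", previous_interaction_id=first.id)
print(second.output)
```

The client holds a single ID instead of a growing array of turns, which is the simplification the presenters highlight.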

Step-based data model replaces chat turns

Traditional chat-based formats are replaced with a step-based interaction model, where each action—user input, reasoning, tool call, or result—is tracked independently. This structure better reflects how agents operate and simplifies orchestration of multi-step processes.
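
One way to picture a step-based record is a flat list of typed steps instead of alternating user and model turns. The step type names and field names below are assumptions made for illustration; the talk does not spell out the exact schema. The helper shows one practical benefit: finding function calls that still lack a result is a simple filter.

```python
from dataclasses import dataclass
from typing import Any, Literal

# Hypothetical step kinds; the real API's names may differ.
StepType = Literal["user_input", "thought", "function_call", "function_result", "model_output"]


@dataclass
class Step:
    type: StepType
    content: dict[str, Any]


interaction_steps = [
    Step("user_input", {"text": "What's the weather in London?"}),
    Step("thought", {"summary": "Need live data; call the weather tool."}),
    Step("function_call", {"id": "call_1", "name": "get_weather", "arguments": {"city": "London"}}),
    Step("function_result", {"call_id": "call_1", "result": {"forecast": "sunny"}}),
    Step("model_output", {"text": "It is sunny in London."}),
]


def pending_calls(steps: list[Step]) -> list[Step]:
    """Return function_call steps that have no matching function_result yet."""
    answered = {s.content["call_id"] for s in steps if s.type == "function_result"}
    return [s for s in steps if s.type == "function_call" and s.content["id"] not in answered]


# After only the first three steps, one call is still waiting for its result.
print([s.content["name"] for s in pending_calls(interaction_steps[:3])])
```

Because a function result is its own step, there is no need to disguise it as a user turn.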

Managed agents with remote execution environments

Alongside the existing Deep Research agent, Google introduced Antigravity (in preview) as its second managed agent. Managed agents run in isolated cloud environments where they can execute code, access files, and perform tasks autonomously. Each agent operates within its own sandbox, which the presenters describe as giving every agent its own computer for each task or user.

Persistent environments and shared context

These environments are persistent and reusable, allowing agents to maintain state across sessions. Multiple agents can also share the same environment, enabling workflows where one agent conducts research and another builds applications using the generated data.
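
The sharing pattern can be sketched with an in-memory stand-in for the sandbox file system. `ToyEnvironmentStore`, both agent functions, and the file paths are hypothetical; real environments are remote sandboxes addressed by an environment ID. The sketch shows two things from the talk: a second agent reading files the first one wrote, and forking an environment so a new run starts from a prepared copy.

```python
import uuid


class ToyEnvironmentStore:
    """In-memory stand-in for persistent sandboxes: env ID -> {path: text}."""

    def __init__(self) -> None:
        self._files: dict[str, dict[str, str]] = {}

    def create(self) -> str:
        env_id = f"env_{uuid.uuid4().hex[:8]}"
        self._files[env_id] = {}
        return env_id

    def write(self, env_id: str, path: str, text: str) -> None:
        self._files[env_id][path] = text

    def read(self, env_id: str, path: str) -> str:
        return self._files[env_id][path]

    def fork(self, env_id: str) -> str:
        """Copy an environment so later changes do not touch the original."""
        new_id = self.create()
        self._files[new_id] = dict(self._files[env_id])
        return new_id


def research_agent(store: ToyEnvironmentStore, env_id: str) -> None:
    # Placeholder notes; a real agent would produce these from its research.
    store.write(env_id, "notes/research.md", "placeholder research notes")


def app_builder_agent(store: ToyEnvironmentStore, env_id: str) -> str:
    # Reads the file left behind by the research agent in the same environment.
    notes = store.read(env_id, "notes/research.md")
    html = f"<h1>Report</h1><p>{notes}</p>"
    store.write(env_id, "app/index.html", html)
    return html


store = ToyEnvironmentStore()
shared_env = store.create()
research_agent(store, shared_env)
print(app_builder_agent(store, shared_env))
```

Context passed through files never enters the model input, so large artifacts do not consume the context window.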

Secure credential handling via proxy system

A proxy mechanism ensures that sensitive credentials such as API keys are never exposed to the agent, even when making authenticated requests. The system intercepts calls and injects credentials securely, addressing a major concern in autonomous agent deployment.
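
The credential swap can be sketched as a function that sits between the sandbox and the network. This is a simplified illustration under stated assumptions: the host `api.weather.example`, the `DUMMY_KEY` placeholder, the `Authorization: Bearer` header format, and `proxy_request` itself are all made up for the example. The agent only ever sees the placeholder; the proxy checks the destination against an allowlist and substitutes the real key on the way out.

```python
from urllib.parse import urlparse

# The sandbox is given this placeholder instead of a real key (hypothetical value).
PLACEHOLDER = "DUMMY_KEY"

# Host -> real credential. Held by the proxy only, outside the sandbox.
SECRETS = {"api.weather.example": "real-secret-123"}


def proxy_request(url: str, headers: dict[str, str]) -> dict[str, str]:
    """Return the headers that would actually leave the proxy for this request."""
    host = urlparse(url).hostname
    if host not in SECRETS:
        # Anything not explicitly allowed is blocked, so keys cannot be sent elsewhere.
        raise PermissionError(f"host not on allowlist: {host}")
    outgoing = dict(headers)  # never mutate what the sandbox handed over
    if outgoing.get("Authorization") == f"Bearer {PLACEHOLDER}":
        outgoing["Authorization"] = f"Bearer {SECRETS[host]}"
    return outgoing


agent_headers = {"Authorization": f"Bearer {PLACEHOLDER}", "Accept": "application/json"}
sent = proxy_request("https://api.weather.example/v1/forecast?city=London", agent_headers)
print(sent["Authorization"] != agent_headers["Authorization"])
```

Because the substitution happens outside the sandbox, nothing the agent can read or log ever contains the real key.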

Automated application generation in a single call

Demonstrations showed agents generating complete outputs—such as a weather dashboard or a browser-based game—from a single request. The agent handled planning, coding, testing, and file generation autonomously, returning usable artifacts within minutes.

Developer tooling and agent creation

New tools, including a CLI for the Gemini API and prebuilt “skills” for coding agents, simplify agent creation. Developers can define agents with configuration files (an agent.yaml plus an instructions file in the demo), attach tools, and scaffold environments with dependencies like Python libraries. Agents can also set up their own runtime environments, which are then forked for each new request.
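
A configuration-driven agent definition can be checked before it is deployed. The demo shows an agent file containing an ID, a base agent, a description, tools, and an environment; the exact key names, the value formats, and the `validate_agent_config` helper below are assumptions made for this sketch, not the CLI's real schema.

```python
from typing import Any

# Keys the demo's agent file was described as containing; exact names are assumed.
REQUIRED_KEYS = {"id", "base_agent", "description", "tools", "environment"}


def validate_agent_config(config: dict[str, Any]) -> list[str]:
    """Return a list of human-readable problems; an empty list means the shape looks fine."""
    problems = [f"missing key: {key}" for key in sorted(REQUIRED_KEYS - config.keys())]
    if "tools" in config and not isinstance(config["tools"], list):
        problems.append("tools must be a list")
    if "environment" in config and not isinstance(config["environment"], dict):
        problems.append("environment must be a mapping")
    return problems


# Placeholder values throughout; none of these strings are confirmed identifiers.
config = {
    "id": "svg-emoji-generator",
    "base_agent": "antigravity-preview",
    "description": "Creates emoji as SVG files.",
    "tools": ["code_execution"],
    "environment": {"sources": []},
}

print(validate_agent_config(config))
```

Checking the shape locally gives a coding agent fast feedback before it spends a remote call on a malformed definition.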

CONCLUSION

Google’s latest Gemini updates signal a shift toward fully autonomous, multimodal agents that can reason, act, and build within secure cloud environments, significantly lowering the barrier to creating complex AI-driven applications.

Full transcript

[music] >> Hello everyone. How are we doing? Everyone good? Everyone staying cool and hydrated? If you haven't, now take a sip of water. Uh we'll have some really cool demos for you today. We're building agents with the Gemini API. Now, this is a dynamic session. So, if you want to uh you're very welcome to follow along. Um in Google AI Studio, that is your easiest way to get an API key for the Gemini API. So, if you go to ai.studio, in the bottom left corner, you will find the get API key function. And you can use that to follow along uh with the code examples that we will show you today. Google AI Studio is really the best place to get started with all of the Google DeepMind models. Um it's our playground for you to test out all the different models and capabilities. So, I invite you to follow along. There's no setup required. There is a generous free tier. You can start experimenting. Uh and you can also set up billing and manage your spend uh like spend caps. That's all built into AI Studio. Now, the power of Gemini really lies in its multimodality. So, that's both on the input side. Gemini is really good at understanding system instructions, text, code as well, but also images and audio and video. So, Gemini is really, really good at understanding all these different modalities and reasoning over them, which is really cool. And on the output side, we can also generate a lot of different modalities. You probably are familiar with Nano Banana. Everyone loves, you know, generating images, editing images, but also with Gemini TTS, you can generate speech, or with Veo, you can generate video. There will be a great session after this one as well, which will look more at the Gen media models and that side of things. But now, let's talk about Live. Gemini 3.1 Flash Live is our latest multimodal real-time conversational model. Many of you have probably already used it. 
If you've talked to Gemini in the Gemini app, that is the Gemini 3.1 Flash Live model powering your conversations there. Also, if you've gone to the backside of this stage and talked in the phone booth to the 1-800 F idea demo, that is also powered by Gemini 3.1 Flash Live. Now, let's have a little look at a demo. This demo was vibe coded in Google AI Studio. So, you're very welcome to scan the QR code as well and you can remix it also in AI Studio. Now, remember those times where you would call up the radio DJ and you would ask for a song, you would request a song? So, that's really the same idea here. But, our radio DJ is Gemini Live and Gemini Live has a tool and can make a function call to the Gemini API to use Lyria 3 to generate a 30-second clip on demand. So, let's have a look at that. Oi oi in context. Welcome to the live jukebox. You're live on air, ready to craft an absolute banger. So, what's the master plan for today's tune, mate? Need some synth-wave vibes or maybe something more jazzy? What sort of story are we spinning in the lyrics? >> Yeah, actually can we get like a really high energy reggaeton song about all the awesome developers here today at Google I/O? Boom! High energy reggaeton for the Google I/O devs. I love it. That's going to be massive, mate. So, we're talking tech, code, innovation, maybe a shout-out to all the brilliant minds hanging out at I/O. Tell me more. Any specific tech trends you want thrown in the mix? Come on, big man. Let's make it legendary. >> Yeah, let's maybe do some vibe coding with Google AI Studio and also some agentic coding with Antigravity. Right, hold tight. That's going straight to the top of the pops. Expect an absolute banger with all that vibe coding and agentic coding goodness. We're talking high energy reggaeton meets top-tier tech. It's landing soon, mate. >> [music] [music] [music] >> All right, that that was very cool and very glad as well. Big round of applause. 
I don't speak any Spanish, but our interpreter here does. That's amazing. Thanks, DJ as well. Cheers. Wicked! Glad you enjoyed it. Okay. As you can see, live demos always a challenge, but very glad this one worked out so well. My Spanish is a bit rusty, but but it sounded pretty good to me. Okay, we can go back to the slides. So, this demo is powered by the Gemini Live API. And the Live API enables low-latency, real-time voice and vision interactions with Gemini. So, you can see here we have a stateful WebSocket connection, and over this connection we can send text, audio, as well as video frames to the model, and the model will understand, you know, the the images, but also the physics behind the video and understand uh kind of what is happening in the video frames, and then it can reason over that, it can execute function calls, and it can reply back in real time using audio as well as providing the text transcript for both the input and the output text. Now, this works seamlessly across 90 different languages, uh currently in preview. So, this is really cool because it's a native audio model, so we're going speech-to-speech. So, the model can very seamlessly switch between all these 90 different languages and provide a very seamless experience. We can also interrupt the model, you saw this earlier, the DJ was very talkative, so I can barge in. Uh we have the tool use, so we can uh automatically use Google Search grounding as well, where we can pull in real-time information from Google Search, or we can give it our custom tools like in this example, where we gave it access to the Gemini API to generate our custom music track with Lyria. And then we can get the audio transcription. So, again, we're not going through text here, the model is going sound-to-sound. So, we're actually running the transcription on both the input and the output audio, and then providing the text for that. We also have a really, really great partner ecosystem for the Live API. 
So, you can use it in Firebase, uh of course, but also really great third-party um partners like LiveKit, Daily, and Pipecat, which is a great open-source framework as well. Software mentioned Vision Agents, Voximplant, and of course the ADK, you can also use to build these real-time conversational experiences and bring them to production quickly. Now, a lot of talk. We also had a demo already, but for you, the easiest way to get started with the Gemini Live API is in Google AI Studio. So, let's go over there now. So, ai.studio. Uh we paid a lot of money for that domain, so I appreciate if you uh use that one. And then, we can see there's a ton of new tools here now. Uh but, we can go to the Live API. So, in the playground, um we can find all the Live models, and here's Gemini 3.1 Flash Live Preview. So, we can choose from a bunch of different base voices like Got a project in mind? Puck, here for example. Uh I like Puck. He's great. So, we'll we'll use him, and then we can simply start our real-time conversational session here. Hey, Puck. How are you today? I'm doing pretty well. Thanks for asking. Enjoying the flow of conversation. How about you? Anything interesting going on? Yeah, actually, what do you think of my outfit? The olive green shirt looks nice, and the blue tee underneath is a good contrast. I also see the hat you're wearing. Looks like a casual, comfortable look. Yeah, what does it say on the cap, actually? Ah, now I can see it. It says ai.studio written across the front. That's pretty cool. So, as you can see here, we can ingest the video frames of our camera and the model can see our outfits. Luckily, Gemini likes what I'm wearing today. Gemini can also obviously understand the text that's written in the images and extract that and reason over that. And we can seamlessly switch between the languages and we get our speech transcription, so that's our input transcript as well as our output agent transcription here. Which is great. 
Now, we also have some really cool Live API examples in GitHub. So, if you want to get started and see all the different capabilities that the Live API has, you can get started here in the Google Gemini GitHub org with the Gemini Live API examples. And so, I prepared that already here for us. So, I pulled down this example and there's different examples. We can actually connect directly to the Live API from the client side using ephemeral tokens. So, ephemeral tokens are short-lived tokens that we can issue on the server side and then provide to our client to directly connect to the Gemini Live API. So, let's do this here. Now, I'm not the greatest designer, but this is just for you to see all the different functionality. So, you can see here we can connect to the Gemini 3.1 Flash Live model. We can give it some custom tools here. Uh we can enable the transcription, and then let's just connect. Okay, the system is ready. Uh hey, can you hear me? Yes, I can hear you loud and clear. How can I help you today? Yeah, I I have some like code example here, uh but I don't really know what it does. Like, what does this do? That code sets up a tool called show alert. It looks designed to display a message in a pop-up window to the user with a title and whatever message you specify. Are you curious about how to use it or something else? Yeah, maybe let's try that. Let's show an alert, "Hello Google I/O." Did that alert show up? Yep, it worked. The alert box saying "Hello Google I/O" should be on your screen now. Did you see it? Yeah, that's great. Uh actually, can you change the background color to green, please? Sure, the background of the page should be green now. How does that look? Yeah, that's great. Actually, is that green? Can you see that? I can see the change in your screen share, and yes, the background is now green. Is that the shade you were expecting? Yeah, that's a that's a great shade, but actually, let's switch it back. No problem. 
I've changed the background back to white. Anything else I can do for you? No, that's it. Thanks so much, Gemini. You're very welcome. Glad I could help. Feel free to reach out if you need anything else. There you go. So, that is a really snappy experience when we're connecting directly from the client side to the Gemini Live API. And now, finally, we also have Gemini API skills. So, these are coding agent skills that help you develop with the Gemini API. So, we specifically have a Gemini Live API skill that you can install in Antigravity or your coding agent of choice, and it'll help you work with real-time voice and vision applications and make that a breeze for you. Now, the Live API lets you build real-time conversational experiences centered around voice and vision. But, of course, there are many different ways that we interact with models and agents, and that's why we've built the Interactions API. It's our new unified API to interact with models and agents alike, and I'm excited for my colleague Philip to show you what you can build with it. Thank you. >> [applause] >> Everyone, my name is Philip, and I'm part of the Interactions API team. So, if you have feedback afterwards, if you have questions, please let me know, send me a message, wherever. And as Thor already said, the Interactions API is our like new unified interface to interact with models and agents. And you can see here like the code snippet, both of them look very similar, right? We have a client interactions.create call on the one on the top of one, we have a model in this case, Gemini 3 Pro preview. On the the the lower one, we have an agent. So, basically, you decide, okay, do I want to interact with a model or do I want to interact with an agent? Then I have my input, um who won the last Euro, for example. That's a very like common like Google search model type of question, so I probably don't need like a super big agent for it. I define my tools. 
And then on the the lower one, I have like a very more like complicated research topic, so like research the history of the Google TPUs. What's new for agent is this environment equals remote feature. That's something we launched yesterday during the developer keynote. That's basically gives the agent and Gemini a fully remote sandbox where it can run code. And we are going to play around with it later a bit, build our own agent. But what makes the Interactions API additionally unique is that it has server-side state management. So, if we are going to look at some code examples on how you would implement it before the Interactions API with generate content or with like other APIs, you always need to maintain this like client-side array of like history with the user turns, the model turns, which can become very big. You don't know if you maybe cut out a different turn. And with the Interactions API, it's very simple. You make your first call. In this case, "Hey, my name is Will." And in the outputs, you get back an ID. And the ID the ID can be used for the next turn. So, in the follow-up turns, when I ask the model, "Hey, what is my name?" It has all of the context from like the previous turn. So, you can think about it that we on the server side stack all of the context together. And that's all it takes to basically build like a multi-turn application. You just pass the ID and then you continue the text. And also, the Interactions API supports the audio generation feature. So, if you liked what Thor showed about like generating text uh sorry, generating speech, you can use the Interactions API with our Gemini 3.1 Flash TTS preview model. It is the same API interface. We have our model, we have our input, we have a response modality. So, we want to generate audio in this case. And then we have a generation config because I like Thor more than Puck as a voice. So, I configured my speech config. I can define the name. 
And then all it takes to make is like the same API request. And that's for all modalities or all feature we support in the Gemini API. So, for image generation using Nano Banana Pro, I define my model, I define my input. I want to generate an image of a futuristic city. I have a generation config here. I want a an image in the aspect ratio of 9:6 and the image size should be 2K. And I make the call. So, every every one of the snippet looks very um similar. That's helpful for us humans to read it, but it's also helpful for the agent to build it, right? If you have common pattern, we and agents it's much easier for us to like complete those. And let's look at the another tool example. So, that's tool use. So, like when you rebuild an agent, right? What are the key components? We have a model, which is our brain. We need tools or hands for the model to interact with the environment. And then we need like an environment or on the loop. And to build like a very client-side simple agent uh with the interactions API, we need to define our function or a function schema. In this case, we have like a file incident of type function. We create a JSON schema. So, which parameters or properties do we expect the agent uh to create? Um and then in our interactions API call, we have a tools field. And in the tools field, we do not only or cannot only provide our custom functions. We can combine the server-side tools from the Gemini API. So, the Gemini API comes with server-side tools for Google Search, Google Maps, URL contexts where you don't need to do anything on the client-side. It's all happens on the server, and you can now combine them. So, in this case, when you make a request, the model can decide. So, hey, wait a minute. I first need to use Google Search and then need to make the the custom function call. And the Google Search happens automatically on the server-side. 
If it decides to generate a function call, it returns to you as a user, and then you can handle this on your client-side. And a very big change for the interactions API to become agent native is like how we approach the data model, right? All of the LLMs were in the beginning were mostly created or used for chat interaction. So, we had a user turn, and then we had a model turn, then we had a user turn, and a model turn. And with agents, those didn't feel right anymore, right? We have function calls, we have the environment responding, we have reasoning, we have client-side tools, we have server-side tools, we have compactions. So, all of those different kinds of steps were not just user and model turns. So, with the interactions API, we moved to a steps data model, meaning that each individual action inside of an interaction is its own step, and we have like a user input step, but then we have like a function call or function result other individual steps. So, you no longer need to define role user and then provide the function result, which felt a bit weird to me at least. You have like a dedicated action for like the function result, and we can see in the code snippet, basically that's a a response from a model, so it has the the dedicated thought block with a summary, a signature, and then the model has its output um there. And what's very important is like we separated a bit like what is content representation and what is a an action or a step. So, our thought has a summary of type text, but it can also be uh of type image. Same, a model can output a text or can output an image. Okay. So, let's talk about the anti-gravity as a remote agent. So, together with the managed um agents in Gemini API, we launched a new hosted agent. When we launched the interactions API in December, we host uh launched our first managed agent, which was Deep Research. And yesterday, we launched our second managed agent with anti-gravity preview. 
That's available for all of you, so if you are curious after the session to try it out, you can go to AI Studio. Uh you can try it directly in the UI. We are going to do this and or via the API. And the Antigravity agent is powered by the same harness as the Antigravity IDE. So, we are basically unifying the same agent harness across all of our offerings. It has a rich set of tools so you can it can do code execution, file system operation, web search with Google search, URL context, it can use agent skills. And I think for me at least the most exciting feature is like this environment. It gets its own computer basically. So, until recently or if you use a coding agent, it always acts on your client and your machine. So, you might not be able to work at the same time but or what if you want to scale your agent to your customers? How can you provide one computer per customer? It's like very challenging because you now start to need to manage infrastructure. And with the new managed agent feature, every agent becomes its own small little box where it can does do its work and it can come back to you. You can like interact with the environment, download the files, access the files. So, that's very very exciting for me. And um the environments are persistent. So, you can like create an interaction and you get back similar to the interaction ID and environment ID. And in the follow-up turn, you can provide the environment ID to the call again and then the agent continues to work inside the same environment. But what if you want to use the environment with a different agent? So, maybe you have a research agent who's researching the the history of Google TPUs and then you have like a app builder agent. You want to Of of course provide a research to the app builder and you can share this with the environment. 
So, the research agent can run on the same environment and then you can provide the environment ID to the app builder which can then like read the same files the research agent created and build the app around it. So, we no longer have only context shared inside of the history. We can also now share context on the environment via persistent files. And then in addition to this, you can like not only um um download files during the execution, you can also scaffold your environment, so you can provide um files from a GCS bucket, from a GitHub repository. So, if you have your skills in a GitHub repository, you can make them directly accessible or via inline. And those files are not part of the model input, they are part of the environment. So, it's not that we run into like this context limitation. If you upload like a heavy video or like a big image, the model can of course read them from the file system, but they're not directly part of the file system. So, what it takes to create your own agent, you can use the antigravity agent, or you can create your own small little agents, which you can then invoke via the idea. And all it takes is an agent.create call. You can give it a name. So, in this case, it's a data analyst. You define your base agent, the antigravity preview in this case. You can provide system instruction, then you can scaffold the environment with like, "Hey, I want to provide my skills from my uh GCS bucket." And after you create it, you can directly use that data analyst ID to interact with it. But, what if you need to install certain CLIs, certain binaries? You can do this by using the agent to scaffold its own environment, and then use it um for creation. So, you can like, on the right side, yes, you can like say, "Hey, please install pandas, create my Python environment." 
And then you have the environment ID, and you can provide this environment ID when creating an agent, and every time you make a new request to this new agent, the environment is forked, and it has all of the pre-installed libraries available for you as like a clean, fresh environment. Okay. In addition to this, we also launched a small CLI for the Gemini API, which makes it super easy for you to give your coding ex- agent basically access to the Gemini API. Um Yeah, you can see it on the GitHub repository, and now let's look at some demos. Go. Um Okay. Yes. It works. So, I'm in AI studio and with the Anti-Gravity agent, we launched a new um app or experience. So, in the playground, you now can switch between models and agents, or you can go to the model picker where we also have the agents available where you can interact with the deep research agent or with the anti gravity, sorry, the anti gravity agent preview. And in the middle, we have six different uh cards. One of them you might have already seen during the developer keynote. That was the AI talk radio where we generated this awesome AI talk radio show from the Hacker News. But uh we have a data analyst, we have a customer support, and we pick the anti gravity preview, and then we have some prompt examples. So, in this case, I want to create a weather dashboard with the prompt like, "Hey, fetch the current weather um and the 3-day forecast for London and Ankara, and then create an HTML dashboard." And then I click run. And now, what happens on the server side is it spins up its um sandbox its coding environment, and then it starts working on it. It takes a little bit of time on the first request. While that's working, I know it's already working. So, we see the thoughts. Um Okay, we need to use the weather in and then it runs the command. Okay, curls our API for London. It continues the thinking. It makes more tool calls. It writes a Python script. Another Python script. 
And all of that is was initialized with a single API call. So, we are not handling any tool you a tool use loop on the client side. It's all running on the server, one single API call, and then like waiting until it's done. And you can uh scaffold those environments on AI studio as well. So, if you want to, for example, add a source like upload a skill or something additional, you can do this here, but much more important at least for me is like okay, how can we provide credential in a secure way, right? If we give our agent a sandbox where it can execute and run code, that feels very dangerous. What if I provide my API key and the agent decides to use it to I don't know, run 100,000 calls to the model or what if I provide some other credentials to like more personal information? And to solve this we created a proxy around the network around the sandbox. So the agent will not have access to the API key, but it can make API calls. So if we can limit for example, we said hey, the weather.in URL is available and if it would require an API key, we can define it here. And then what happens when it makes that call, we catch that call and replace the dummy header from what with the what the Gemini or what the sandbox has with the real API key. So the model thinks it has a valid key, but the key is never available for it, which is a very good solution in this case. And you can go back to our model. Okay, it's still Okay, it's currently creating our dashboard. Uh it has already written another file. Since we only have a few more minutes left, we can look into a pre-run example. So at the end it generates a response like hey, I have fetched the real-time data at the real-time weather and the 3-day forecast for London and Ankara and I have created some files that used the Python file, but also a dashboard. And in the AI studio we can like directly click on this file, which basically downloads the file from our sandbox, makes it accessible for us. 
We can see the raw code, but I mean, my HTML is not as good to fully validate if that's a correct dashboard or not, but we can view it in preview and we can see here it's like the dashboard with the model created. We can switch between Celsius and Fahrenheit. We can switch between Ankara and London. And all of that was done via a single API call. Of course, it's not the most beautiful dashboard, but one API call, 3 minutes of waiting, and we get our result. You can also quickly go back to Okay, it's still running. And there's another very cool example. So, let me open a new AI studio. In addition to ai.studio, we also have ai.dev. So, if you don't like the domain ai.studio, you can use ai.dev. There's another prompt um which asked it to create a very basic anti-gravity game with three.js where I can like um fly through an anti-gravity space. And it already worked on it earlier. So, very similar, the model thinks about it, list some files, writes the the the game basically, makes sure that the game is available and runnable for me. So, it verifies its own work, returns it to the user, and I can download it again and very similar to the dashboard, I can render my HTML, and I can click play, and now I'm like running my small little game which Antigravity has created for itself directly in AI studio, which is very fun for me at least to to play around and to experiment. Okay. Um now, let's go into creating our own agent, right? Most of the developers don't write too much code anymore. We use coding agents for it. And to make it very easy for coding agents to use the Gemini managed agents, we created a skill, and we created a CLI for the Gemini API because at the end, it's all just an API call. So, what I have done before the session, I installed the Gemini API CLI skill and the interactions API skill into Antigravity, but we can test it with like what skills can you use? And of course we used the new Gemini 3.5 Flash model. 
So we have an Android CLI skill available which someone else might have installed. We have the Gemini API CLI and the interactions API skill. And what we can now do is like we can ask it like hey create a new Gemini agent that is very good at uh emoji creation in SVG. Okay. Now it reads our CLI skill. The CLI skill has a specific section on how it can interact with this new managed API. We want to Yes. Allow the commands. It lists our agents. We say yes. You already can see I've tested this. So let's see if it creates a new agent. Okay, working working. Working. Okay. We got a plan. Uh we have a plan. This plan's outlined how to create the new agent. And if we scroll down we see we have an agents.md file which defines a system instruction, a helper method to uh Python script to call when a newly agent is deployed. Okay, sounds good. We see the CLI being used. So in the Gemini API CLI you have an init command which scaffolds your agent. You can create it, you can list it, but you can also test it. That looks good to us. Now let's create it. Okay. We allow the command. And now when we look into our files folder we have this SVG emoji generator. It has a skills folder, it has a workspace folder, and it has our agent.yaml. The agent YAML basically is the same shape. Oh, let me move that a little bit, so it's easier. So, it's the same shape. We have our ID, we have the base agent, we have a description, we have the tools it can use, and then our environment. If you want to like specify source it, it's also has like a example on how to do it, and then we have our agents in the and the agents MD has an instruction, you are world-class SVG designer specialized. Okay. Of course, we can now edit it, adjust it, correct it, either manually or using our coding agent. For now, we allow it. Okay, looks good. We created it. Thinking, testing. Okay. Now, we do not want to use Python. We want to stop it. Stop here, and we tell it test it. 
"Give us a test command using the..." So, that's the one we take. We go into our terminal. Very similar: we can either tell the agent to run our CLI, or we can run it ourselves. We run it ourselves. And now we are doing the same as we did in AI Studio. We start our request as a single API call. We have some thinking. We should also soon see... yes, a file being generated. In this case, it creates a rocket. We do more thinking. Waiting.

Okay. While this is running, we can go back to the other run we started in the beginning, with the weather dashboard. Nice, that finished. So, that's the request we started at the beginning of the session, and it's very similar here. Now we have a different dashboard with some nice icons. We have the weather forecast. I would say this version looks better to me than the previous one. But of course, you can now say, "Hey, make it light mode." What's very nice is that we used the same environment, so it has the context that we created this dashboard. We can see that as a follow-up turn. It read our weather dashboard file, and it's now thinking. Let's see what it does. "Okay, I am actually modifying the template, specifically the styling." Okay. "Investigating the theme application." And now it's changing the file. So it not only has the context of the interactions we are running, it also has access to the files, which makes it very unique to use.

Let's look back at our model. Okay, we are now streaming our SVG. It should be done. Okay, so that also completed. And if we look into... oh. We can say, "No, I ran that in my terminal. Download the file." Now we give it the environment ID: "Here is the env ID." And hopefully it still knows that... yes, we can now use Gemini API files download. We say okay. It downloads our file. We have our rocket. We copy the rocket.
Yes, and here we see the rocket the agent generated, which looks like a very decent emoji to me, as an SVG. I know that I would not be able to do that myself, okay? And if we quickly go back to our other agent: "Okay, I've redesigned the dashboard." Let's take a look. And nice, it's in light mode.

So, as you can see, it's very flexible in terms of what you can do. You now have two states you can use to manage context with the model. You have your environment, where you can persist files. That's very interesting when you start working with bigger files, videos, or audio, or if you want to have agents working in the same environment, or even if you are doing some coding work. Being able to run things asynchronously and remotely allows you to set up triggers. You can think about, "Okay, what if I get a new GitHub issue? I always want to start a managed agent that looks at the issue. Is it able to reproduce the issue? Or are we missing some kind of information?" If we are missing information, maybe the agent can go back, add a comment to the issue, and ask for more information. All hands off, all without needing to create any infrastructure or manage any sandboxes yourself.

Awesome. We have some time for questions, so if you have any, you can head outside. We are very happy to answer questions about the Live API, about the audio models, or especially about managed agents or the Interactions API. Do you want to say thank you? Thank you. Yeah, thank you.

>> Thank you.

[laughter] [applause] [music]
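The GitHub issue trigger described in the talk can be sketched as a small handler: a new issue starts an agent run, and if the run reports missing information, a comment is posted asking for it. This is a minimal sketch under stated assumptions. The `Issue` and `TriageResult` types, the `handle_new_issue` function, and both callables are hypothetical names introduced here, and the agent and the comment API are replaced by local stand-ins rather than real Gemini or GitHub calls.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Issue:
    number: int
    title: str
    body: str


@dataclass
class TriageResult:
    """What the (hypothetical) agent run reports back."""

    reproduced: bool
    missing_info: list[str] = field(default_factory=list)


def handle_new_issue(
    issue: Issue,
    run_agent: Callable[[str], TriageResult],
    post_comment: Callable[[int, str], None],
) -> str:
    """Start an agent run for a new issue and follow up if details are missing."""
    prompt = (
        f"Look at issue #{issue.number}: {issue.title}\n\n{issue.body}\n\n"
        "Try to reproduce it and list any information that is missing."
    )
    result = run_agent(prompt)
    if result.missing_info:
        wanted = ", ".join(result.missing_info)
        post_comment(issue.number, f"Could you share more details? Missing: {wanted}")
        return "needs-info"
    return "reproduced" if result.reproduced else "not-reproduced"


# Stand-ins so the sketch runs without any network access.
comments: list[tuple[int, str]] = []


def fake_agent(prompt: str) -> TriageResult:
    return TriageResult(reproduced=False, missing_info=["browser version"])


def fake_post_comment(number: int, text: str) -> None:
    comments.append((number, text))


status = handle_new_issue(
    Issue(number=7, title="Dashboard stays dark", body="Light mode toggle does nothing."),
    fake_agent,
    fake_post_comment,
)
print(status, comments)
```

Passing the agent and the comment function in as callables keeps the trigger logic testable on its own; a real setup would swap the stand-ins for an actual managed agent call and an issue tracker client.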
