ENFR

Tech • IA • Crypto

Aujourd'hui Ma veille Vidéos Top articles 24h Archives Favoris Mes topics

Quoi de neuf dans la famille de modèles ouverts Gemma

GoogleGoogle for Developers22 mai 2026 à 17:5947:46

Lecteur audio

0:00 / 0:00

INTRO

Les modèles d’IA à poids ouverts Gemma 4 de Google étendent une IA locale, efficace et personnalisable, des appareils en périphérie (edge) jusqu’aux systèmes cloud, avec des gains majeurs en performance, multimodalité et flexibilité de déploiement.

POINTS CLÉS

Gamme de modèles élargie et accent sur l’efficacité

Gemma 4 introduit quatre tailles de modèles de 2B à 31B paramètres, conçus pour fonctionner sur des appareils allant du matériel IoT aux GPU haut de gamme. La gamme inclut un modèle 26B en mixture-of-experts optimisé pour l’efficacité à l’exécution et un modèle 31B pour une qualité maximale et le fine-tuning. L’objectif est d’offrir une forte « intelligence par watt », égalant ou dépassant des modèles plus grands malgré des tailles réduites.

Bond majeur de performance par rapport à la génération précédente

Un jalon clé est que le modèle 2B rivalise désormais ou dépasse le modèle 27B de l’an dernier dans les benchmarks, signe de gains rapides d’efficacité. Sur des tâches comme le raisonnement, le code et l’écriture créative, des améliorations sont observées sur toute la gamme, avec des résultats compétitifs face à des systèmes bien plus volumineux.

Contexte plus long et capacités prêtes pour agents

Les fenêtres de contexte passent de 32 000 à jusqu’à 256 000 tokens, permettant des tâches plus complexes et des interactions plus longues. Tous les modèles incluent désormais le raisonnement (« thinking »), l’appel de fonctions, la planification multi-étapes et l’usage d’outils, les positionnant pour des workflows d’agents autonomes et l’automatisation en entreprise.

Passage à la licence Apache 2.0

L’adoption d’une licence Apache 2.0 remplace les conditions personnalisées précédentes, autorisant un usage commercial plus large et une intégration facilitée en production. Ce changement est largement salué car il simplifie l’adoption en entreprise et la croissance de l’écosystème.

Solides capacités multimodales

Gemma 4 renforce le traitement de la vision, de l’audio et du multilingue. Les modèles prennent en charge divers formats d’images, la détection d’objets avec boîtes englobantes, la compréhension de documents et de graphiques, ainsi que la traduction multimodale intégrée. Les capacités audio incluent reconnaissance vocale, transcription et traduction, même sur les plus petits modèles.

Performances linguistiques globales

Les benchmarks montrent de solides résultats multilingues, avec le modèle 31B parmi les meilleurs pour les langues européennes et des performances compétitives en japonais, coréen et langues d’Asie du Sud-Est. Les modèles se rapprochent des systèmes propriétaires de pointe sur plusieurs évaluations linguistiques.

Excellents résultats en raisonnement et usage d’outils

Dans des tests complexes comme BFCL, le modèle 31B rivalise avec des modèles dépassant mille milliards de paramètres, démontrant de fortes capacités de planification, d’exécution, de correction d’erreurs et de résolution itérative de problèmes.

Déploiement local prioritaire et IA en périphérie

Gemma 4 est conçu pour fonctionner localement sur des appareils incluant smartphones, navigateurs et matériel IoT. Des démonstrations montrent des assistants en temps réel, du traitement multimodal hors ligne et des applications robotiques sur des dispositifs comme Raspberry Pi et Jetson Nano.

Déploiement cloud hybride et évolutif

Sur Google Cloud, les options vont des API serverless à des endpoints entièrement gérés et une infrastructure basée sur Kubernetes. Les développeurs peuvent choisir entre une facturation par token ou des endpoints dédiés, avec support du fine-tuning et de l’apprentissage par renforcement.

Amélioration de la vitesse avec le décodage spéculatif

L’introduction de MTP (décodage spéculatif) permet jusqu’à 3× plus de vitesse d’inférence, améliorant la réactivité pour les applications temps réel et les systèmes multi-agents.

Écosystème et adoption en croissance

L’écosystème Gemma dépasse 500 millions de téléchargements et comprend plus de 100 000 variantes fine-tunées. Elles vont de modèles spécialisés comme MedGemma pour la santé à des versions linguistiques améliorant l’accessibilité dans des régions comme l’Afrique et l’Europe de l’Est.

Applications concrètes et accessibilité

Des démonstrations illustrent des usages pratiques comme des agents IA optimisant les systèmes de transport, des assistants mobiles hors ligne et des outils pour personnes malvoyantes offrant une navigation en temps réel. Ces exemples soulignent le rôle de l’IA embarquée pour la confidentialité, la fiabilité et l’accessibilité.

CONCLUSION

Gemma 4 positionne l’IA à poids ouverts comme une alternative évolutive et efficace aux grands systèmes propriétaires, permettant des applications multimodales avancées et orientées agents, du local jusqu’au cloud.

Transcription complète

[music] Hey everyone, welcome to what's new in the Gemma open model family. My name is Olivia. I'm the product lead for Gemma and today I will be joined by Gus, a product manager on our team and Yan Valentine, our developer relation engineer. So what is JMA? Jimma is a open weight large language model at Google deep mind. We started GMA in 2024 with a simple premise. How do we create a model that you can run everywhere and fully customize to your need that created two requirements for us? Number one, make sure that the model is reasonably sized. Number two, make an open weight model that you can fine-tune to your wheel. With Gemma 3, we did exactly that. We created a multimodel family of model running from IoT devices at 1 billion parameter to larger consumer GPU at 27 billion parameter. And we made sure to apply one thing which is part of our philosophy aka give you the highest efficiency possible running on your local machine. And a couple of weeks ago we've announced Gemma 4, our most capable open model to date. [applause] Thank you. We apply the same philosophy with JMA 4, giving you four model size ranging from 2 billion parameter to 31 billion parameter that you can run everywhere. You want to run at the edge on IoT devices, use a tubian parameter model. You want to run on a higherend phone or lower-end laptop, use a phobian parameter model. you have a more powerful machine and you want the highest efficiency and latency use a mixture of expert at 26 billion parameter. Finally, you're looking to fine-tune and have the highest quality possible. We have a 31B ready to go and we maintain our philosophy around efficiency. Our GMA model here evaluated on LMA are scoring as well as model 20 size. We want to make sure that you have the best bang for the buck, the highest amount of intelligence per watt. Our model are highly efficient by training. As you can see on the right hand side, we've improved capabilities across the whole spectrum from text use case, creative writing, code or reasoning with science, legal and so many other areas. The reception so far has been fantastic and we cannot thank you more. We have more than 100 million downloads. We had more than 6 million downloads of the AI edge gallery which you can demo outside and the reception has been again incredible. We are very very happy to see that people are using and enjoying the GMA for models. In terms of capabilities, we've changed a couple of things compared to JMA 3. Now our models have a larger context window. We moved from 32,000 tokens to 128,000 for smaller size up to 256,000 for 31B and 26B. Now all of our models come with both reasoning aka syncing and function calling making them fully ready for the agentic era. Finally, we change our license. We moved away from a custom GMA license to a popular Apache 2.0 or license giving you full control when you deploy in production. [applause] We know that the Apache 2.0 license was very wellreceived and we're very happy. We want to see JMA being used more in your pipeline. In terms of capabilities, I want to highlight a couple of things on the vision side because we know our models are being used at the edge. We made sure to give you the highest amount [snorts] of efficiency. We have now variable aspect ratio images and vision understanding. We can process charts, document and screenshot better than ever. And we have multimodel translation being built from the ground up. On the agentic side, our model are fully ready with multi-step planning, tool use, and autonomous task completion. You can now use them in whatever agentic pipeline you have ready to go. Finally, audio understanding on the smallest size has been completely redone from the ground up. We are more than ever multilingual in essence with speech understanding and have good transcription and translation capabilities. Of course, text understanding, we have that since JMA 1. We continue to be state-of-the-art in English. When it comes to our main JMA family and we've extended that across like internationalization, which is something we'll cover after. From a vision standpoint, a couple of things to highlight. We have object detection. We can do bonding boxes which is quite important if you want to use Gemma in IoT use cases or robotics use cases and again as I mentioned we have multimodel translation from image input and we'll show you more demos like in the rest of the presentation. JMA comes with multiple architecture ready to go for where you want to deploy them. If you want to push GMA at the edge we have per layer embedding which is a new technique we've developed for JMA 4 to optimize efficiency. If you want to deploy them on your laptop, we have two type of architecture. A 31B Dent which is going to make fine-tuning easier and help us like get the highest amount of intelligence per tokens. The other one parameter sorry and the other one it is a 26b giving you the highest efficiency in terms of runtime. From a benchmark standpoint, I don't want to spend too much time there. Please go and try the JMA model and see if they fit your use case. What I want to point your attention to is the capability of a JMA 4 E2B model compared to the previous generation at 27B. Our 2B model this cycle is matching or even better than our 27B last year. And that makes me very excited about the future. I'd love to see if next year we're able to give you the 31B capabilities in your pocket running on your phone fully locally. I think that makes up for a very exciting future. Now I've talked about multilinguality and I wanted to show you that it means a lot to us. On the left hand side you can see Eurovval which is an independent benchmark leaderboard where the model is being rated across like European languages. And what you can see there is that all of our GMA models are scoring pretty well. A 31B specifically ranked from between number one to number five on almost all European languages and that is across open sour open source and closed source model. On the right hand side you have another example where we're comparing Gemma to other very successful models on Japanese languages and our Gemma model is getting very close to GPT 5.4 we have very strong compatibility with Southeast Asian language and Korean is another language where we excel at. Now another benchmark which was quite amusing foot truck bench. So I didn't know this one before releasing GMA 4. It's actually like measuring the reasoning capabilities of the model as well as the function calling. And we found out that the 31B is to quote the line a beast at that. Uh the model is doing extremely well in an environment where you need to have deep reasoning and I would say reasonable function coding. As you can see here, JMA 31B is competing with DeepSig V4 Pro, which is a more than one trillion parameter model and lots of topline uh close source model. In terms of efficiency, as I mentioned to you at the beginning of our presentation, we want to make sure we give you like the best bang for the bucks and LMA has done a couple of uh analysis on that front. We've also released MTP Drafter enabling speculative decoding on our model lineup giving you up to 3x speed up when it comes to your decode speed. So if you haven't used it yet, I would massively recommend you to move towards MTP and gain the highest efficiency performance on your Gemma implementation. Now again JMA being fully open and compatible with the open-source ecosystem, we have support across the board. We are supported from a mobile and web app standpoint. You also have all your favorite softwares ready to go with JMA4 on your local machine and we have deep implementation on the cloud server. And this year specifically, we also made sure that we had a day zero implementation with GMA 4 in the Android ecosystem. So you can either try Osmos Gemma running directly on your phone available through the API that Android is exposing to developers or to code your Android apps locally with our Gemma 26B model when you're on the plane or you don't have access to the Gemini API. And now I will pass over to Gus who will cover our implementation in the cloud. [applause] Hi everyone, good morning. I'm Gus, one of the product managers for Gemma. Very happy to be here. Nice to see some familiar faces. Uh, as Olivia was saying, Gemma is pretty cool. Lots of great features. And when you decide to build something more than just running for you on your machine, if you want to go and build an application or use on your company or your institution, you need to serve this model somewhere, right? So, they have to be served from any provider. you can use on any cloud provider, any uh inference providers that we have. But of course, we did everything we could to make sure that on Google Cloud, we optimized as much as possible. And this is what I'm going to tell you a little bit about. First, if you uh J is available in all the services inside Google Cloud that you can use a model, right? So, for example, in Gemini Enterprise agent platform, we have the model guarding where you see multiple models that you can use from there. In this screen on the right, you can with one click deploy the model for you. You create an endpoint right there. You can choose the uh uh the GPU here. Uh you can choose uh which variant of the model. You can choose how much memory. You can choose a lot of things. It's one click deployment. Right? You click show this screen here. And then you have an endpoint for you. This is the easy one. You can use a uh H100 RTX 6000. We have multiple GPUs that you can play with. Uh this is one option. The other option still on the uh Gemini enterprise agent platform you will have a model as a service that we call instead of you one click deploying the model we have the 26B already ready for you to use you just call the API right and then you pay per token like other inference providers right uh that's an important difference here when you do this serverless uh GMA 26B you pay per token when you deploy using model garden you pay per the end point the end point's yours you are paying for it to be online so it's a different way of uh paying for resources of course when you do that you have access to a uh API SDK to access these models this is uh on uh Gemini enterprise agent platform and we also have option to fine-tune the model right there so for example if you want to do some post training some reinforcement learning you have access to also do this on the same part of the uh Google cloud platform you can do uh use the same tools that you're already used to and your do a further fine-tuning and customizing the model. I'm going to talk a little bit more about fine-tuning later but this is also one option that we have when we talking uh deploying using cloud. There are multiple levels of difficulty or or challenges that you'll face. Uh and that depends on how much you understand and how much you want to use time to manage your servers, your endpoints, right? Uh Gemini Enterprise agent platform I would say is in the middle. You have some control uh but you also there's a lot of stuff that helps you do it easier. When we talk about uh Google Cloud Kubernetes Engine, I would say this is more advanced, but you also have control of everything. You have access to every knob. You can change any configuration that you want. We you can uh we also have of course Gemma there. We have recipes for you to easily follow and deploy. But there you have access to the VM and you can change anything you want, right? Of course, we have recipes with everything preconfigured for you with the best parameters. But if you want to change something to optimize for your infrastructure for your use case, you can go and do all of that. You have total control and flexibility on these endpoints. Uh, of course, everything is always optimized for performance. We work with them to also have the MTP modules already available for you whenever you want to use. And this enables you to create uh all your agent works already with the servers there. So you have all of that. So it's in when you talk about GKE you have access to create this provision you have access to all these GPUs you can use TPUs also you can use any any kind of hardware you want you choose how you create your VM and you can deploy that and start serving it's pretty quick but it needs you more expertise I would say right GKE have give you all the power but you also have to pay attention to all the details but if you want something that's the easiest solution we have cloud run so you can also also very easily two lines of code deploy Gemma using cloud run. What's the difference from the other two? Cloud run gives you an endpoint. You just deploy and it gives you an endpoint. This endpoint you can access do queries and use Gemma uh the same way you would use local or other providers. But if you don't use the the server will spin down and and scale to zero or if you your app becomes very popular, you get many requests, it can also scale up to maybe 100 GPUs and it can do all of that. Uh and when you start from zero, for example, let's suppose your server goes down because nobody was using for whatever reason. When you go back up, it takes you a couple of seconds of warm-up and the end point is already there. So it's super optimized uh for you to if you want to do something quicker and this is the easiest solution I would say. So we have cloud run as the easiest most straightforward solution we have uh Gemini enterprise which was called vertex Gemini enterprise that's the middle term you have some more options uh and something are help you and then you have GKE which you have access and control of everything so you choose wherever you want to to play the game there. Uh, that's great. Now you understand what Gemma is. We showed you all the capabilities, what it can do, where, and how you can deploy. That's great. But let's go a little bit beyond Gus, Olivia, Ian just saying how amazing it is. Let's try to show some cool demos and show it working real real doing doing some fun stuff. We have a bunch of demos and we are going to go from uh let's start from this side. We're going to go from enterprise or the 31B model which is the the stronger one. show you some more uh advanced use cases. Then we're going to show something that could run on your local uh machines or laptop right now. Then we'll show something that runs on your phone that could be already running on your phone. We have demos outside also. And then we'll show some IoT and robotics. Let's go over all of this. I hope you get excited with this. The first one, I like this very much. Let me see if I can do play. Okay, so what's happening here? Uh I want you to pay attention to the query that's happened there. Uh the idea is do you know those bikes that you pay to use per time like the city city bike management system? What happened here is we asked Gemma hey help me optimize uh the revenue and here's the database. So the setup now that let now I can explain everything else. The setup is we get the ADK framework, the agent development kit uh running using Gemma 31B that's on cloud run. We configure it to have a big query MCP uh server connected. So the model can do queries to this database that it doesn't know. It's just a a database. This database is of course the seat uh seat bikes uh database. It has all the points where the bikes are demand. There's time when bikes were used all of that right all the analytical database you would expect with that we ask the model hey help me optimize for revenue and all that you are seeing here this is not accelerated this is real uh real speed right the model creates a plan to first of all understand what's available in this database like you would do if I ask you this thing you first thing you would do is like let me take a look on the database and what's available That's exactly what the model do. Oh, let me see what it's available in this database. From there, it devises a plan to find where it can optimize the the the service to get more bikes to add other points with bikes. It tries to do a bunch of experiments until we get to the end and show these are the places where you have to increase the number of bicycles. And this and during doing that you can see there for example uh there's a moment it makes a mistake when it tries to join two tables and there's a type that's wrong it can fix this oh there's an error let me fix this join all of this by itself right there's no other user turn there was one query at the beginning and this runs for 3 minutes the model trying stuff it does it and then it makes a mistake it regenerates it fix something oh I found something new and it keeps going keeps going trying to optimize the service and this is this is a pretty good example. I strongly recommend you try stuff similar. Uh the model it shows that Gema 31B specifically is very strong on planning and executing that plan even if has to execute code somewhere run a query somewhere or get more information from other sources think more join this information and keep on this planning. This is uh the power of Gemma 31B. Uh I guess this is the so it found the best locations and it knows that's the answer right. Uh so this it's showing what are the 10 best locations to improve the service right there based on all this back and forth. So now I know thank you [applause] J is pretty cool. Nice. This is one right running on server. You can do that connect to MCP. Let's go into another one. This is using this this is using Firebase AI logic and the challenge here is a little bit different. Imagine you are running the model on your phone. Uh one if you try to run uh models on your phone there's there's some moments where the model might not be not be available for you on your phone because the phone is maybe old, doesn't have enough memory or something like that. Right? So if you're app developer and you build a feature that requires the model to be there when the model is not there you face a challenge automatically there what do you do one option is to have a hybrid hybrid solution where you have the model running on the phone when possible and when it's not possible you call the model on the cloud provider. So you have a hybrid solution that use both local Oh, it's stopped. Oh, you saw the video. It runs local or on the server depending on what you want to do. Firebase logic helps you do that. It configures you configure for it to uh prioritize local and if it's not available local, it will just route the same query to your server that you configured there. So that gives you a easier way to do a hybrid solution. Why this matters? because it simplifies your life as a developer because if you have to do that by hand, it can be quite quite uh annoying and hard. So, Firebase Logic helps you do all of that. It can use your uh AI core implementation locally. If it's available on a pixel, on a Samsung device, if it's not available, you can call the server and make this from the user perspective completely transparent. You don't know that you're missing anything and you implement the same logic. This is also already available. For the next demo, I'd like to show uh call Ian. He will show a bunch of more demos and I'll come back later. Thank you. [applause] >> Yes. Thank you very much, Gus. Uh so my name is Ian. I'm the developer experience lead for Gemma. Uh and the next thing I want to talk to you about is Gemma's coding capabilities. So all of our models can code, even down to the E2B model. And what we're going to show here is a demonstration of a uh uh a game called AI Venture where you can vibe code along using the model. And we'll play the video so you can have a look. Want to learn how to build aic workflows and integrate vibe coding into your apps? Well, we built AI Venture, an open source retro dungeon crawler that doubles as a Genai masterclass for developers. Built on an Angular and PhaserJS stack, AI Venture is powered by Google's Gemma 4 openweights model. First, we'll explore [music] VI coding. You prompt a chicken NPC to build a web app. Behind the scenes, Gemma 4 writes HTML, CSS, and JavaScript on the fly and renders it locally in an iframe to validate the constraints. Next, you'll build aic behaviors. When encountering a puzzle you can't physically cross, you must prompt a robot NPC. This triggers an autonomous thinking loop. Gemma 4 evaluates the game state, executes specific tool calls to locate a hidden switch, and interacts with the PhaserJS game engine to solve the puzzle. The biggest takeaway, you have complete flexibility in how the model is served. You can run Gemma 4 entirely locally with zero ongoing server costs using Transformers.js in the browser or by plugging into Olama or LM Studio via the OpenAI compatible interfaces. Need cloudscale? A simple config change roots your request to the right Gemini API or Google Cloud. Ready to look under the hood? Scan the QR code and explore the GitHub repo and read the full developer solution. Take on the quest to expand your AI tooling knowledge with AI venture. Thank you. [applause] So there you go. As Pastian Ian mentioned, uh you can run this model in a number of different scenarios and we have an example you can try outside using LM Studio and we also have a version that runs with light RT and web GPU directly in the browser. So you don't even need a connection to do it. Uh next up I want to show you a demo about the speed and performance you can get about running Gemma in parallel. And to do that, we're going to move over to this laptop right here. So, what I'm going to showcase in this particular example is I'm going to show running multiple uh multiple agents running on a single server, in this case, this laptop, which are going to do a task for me. So, we're going to use it to build some SVGs. So, how about we do let's go for a cute animal mascot for Gemma. There we go. See if I can spell it right. There we go. There we go. Uh, and we're going to execute that with 15 different agents. So, let's run that query. First of all, the orchestrator, which is in the top, is going to delegate that task amongst the different agents that we've got running. And it's going to do things like little animals and stuff like that. And what you can see in the left hand side is the total throughput of all of these agents running simultaneously. And this is the parallel capabilities of running a system like uh Llama CPP directly on the device. And uh you can think of it that each one of these parallel agents is kind of like a unique user or a unique session. What you'll notice actually is that the speed decreases and that's because when you're running less stuff in parallel, uh it it is limited by how fast it can output tokens. Again, showing the concurrency. Oh, there we go. Got some crazy little leg agents here. I like this guy, this little sunshine guy. So, that was obviously a little bit of fun to just create some SVGs. I tried to show this to demonstrate the throughput, but if you wanted, for instance, to do a more complex task, you can enable thinking and take a little bit of extra time and have each of these sub aents reason about what they're actually building. I'm going to show you another example right here uh related to code. So, uh we're going to use this example to show how GMA can code uh different uh programming languages and do this in real time. So I'm going to do an example an example. There we go. Example of calling. So we're going to call the Gemma model Google. So it's Gemma for in this case we're going to call the 26B and it's 4 billion activated parameters and we're going to pick the IT versions. The IT version is the instruction tuned version of the model. So that's the one that has the chat template, understands function calling, etc. And we're going to call it with uh the open AI compatible uh endpoint. Uh and we're going to run it on local host. So if you if you're not familiar with this, most of the services like LM Studio, Lama CPP, OAMA, etc. uh have a open compatible endpoint where you can direct it to your local machine rather than to a cloud service. And this is one of the easiest ways to get started using Gemma to take that existing interface and kind of swap it over to that. So we're going to run it on local host on port in this case at80. And let's make that we're going to make how about eight agents. How about that? Let's do that. So let's run that one. So again what it will do is it will pass that instruction onto the next model and then they will subdivide that as a different task. So you can imagine in this scenario that this could be a workflow that you want to run or maybe it's a your favorite agent harness and each of these is going to run in parallel on this local machine. So it will work offline on whatever task you're currently running. So we'll wait for that to complete and then we'll have a look at some of the output. Uh so one thing that's super interesting to point out here is that this is uh the model's base knowledge about how to do programming in each of these languages. If you want to give it additional reference documentation or you want to create a skill to be able to understand your codebase and how that operates, you can provide that as a tool to Gemma and Gemma will execute that before doing it. So yeah, we can see that they Thank you. You can see here that uh [applause] uh we've got different versions in different languages. I've got Rust, JavaScript, Python. Uh some of them they have recognized that there is a library that they can actually call to do this. others that don't have the library will basically use um like a HTTP request endpoint and will work directly with a JSON service. So yeah, again as I mentioned you can tailor this to whatever task you want but imagine having your own little army of agents running just on your laptop and you know wherever you are whether you're on a plane whether you're in low connectivity whether you're out on a hike you can make this work. Thank you. So we'll move back. So next up I want to talk about go uh mobile. So we also mentioned that we have an E2B and an E4B model that are specifically tailored and optimized for mobile applications and they run with accelerators on CPUs, GPUs and TPUs thanks to light RT and AI core. So let's have a look at the video. So most people have heard that Gemma models can run on a range of different devices, laptops, desktops, etc. But not many people know that they also run on mobile phones. So I've got a Pixel 10 Pro here and I'm running the Google AI Edge Gallery app. And what I'm going to show you is how Gemma can be used to take photographs, understand the world, record audio, and to [music] use agent and agent skills, all running locally on this device. So let's see a couple of examples about what we can do. So first I'm going to show you agent skills. Agent skills is where you can ask the model on the phone to be able to interact with different apps and applications and the model will decide which of these is most suited for the purpose. In this case, I'm just going to ask uh the model a prompt and then it's going to pick which application to use. So, I will record a voice note, add to my mood tracker, I'm feeling [music] about 8 out of 10 today. I'm feeling calm and excited to talk about Gemma. So I'll send that off and then the model will then listen to that track because it understands how to [music] do uh audio input and then it will pick which application to use. So in this case it's decided to load the mood tracker and it should record that entry into my mood tracking journal. There you go. So another thing Gemma can do is it can read and understand images. So, I'm going to take a photograph of these books right here, and I'm going to ask it to output the books as a JSON schema. Take these books and output as a JSON schema with a title for each entry. [music] And it will think about that prompt, decide what output it needs to generate, and it will invent a schema that it thinks will fit. Or maybe I just want to have a new item on my desk for an arrangement. So, I can take a photograph of these plants right here and I can say something like, "Suggest three things for me to add to my arrangement. Give a sentence for each." [music] Another thing that Gemma can do on device is it's multimodal and understands audio input. So, you can record snippets of voice or audio and have Gemma process it naturally, doing things like transcription and translation. So, in this case, I'm going to record a clip, make a note of things that I need to do today. I need to pick my kids up from school. [music] I need to go to the grocery shop and get some milk. And I need to buy some flowers for my wife. [music] And what's also important to remember is that this is running entirely on the device. So, it'll work offline or in areas of low connectivity. So, let's give you an example. Imagine I was offline and I wanted to know what this was. So, let's take a quick photo. What is this thing? And Gemma will process the image, try and understand what's in it, and give an answer directly or [music] without connectivity. Okay, so it thinks it's a small toy, which is correct. So, I've shown you a couple of examples about what you can do with Gemma on a mobile device. So, go to Google AI gallery app on Android and iOS and tell us what you think. Yeah, thank you. [applause] So, uh the Edge Gallery app you obviously can download on Android and iOS as mentioned and if you want to play around with it, you can try those different modalities out and the different skills that it's able to understand uh by checking out our demo booth. So the next natural step from moving on from mobile is to move into streaming and streaming inputs. So in this particular example, I'm going to showcase a different example about how you would have a conversation with a model, right? Uh in order to do this, we have need to have incredibly low latency models on both the processing but then the whole stack. So it needs to be able to work in under a second from basically the end of you finishing your sentence to the start of the model uh responding. So let's see what that looks like. Hey Gemma, tell me a little bit about yourself. I >> am Gemma for large language model developed by Google DeepMind. I process text and images to provide information and complete tasks. >> Cool. Can you tell me a little bit about Shakespeare? >> William Shakespeare was an English playwright and poet widely regarded as the greatest writer in the English language. >> Actually, hold up. Do you know what I really want to know about the solar system? Can you tell me about that? >> Solar system consists of the sun and everything gravitationally bound to it, including eight planets, dwarf planets, moons, asteroids, and comets. >> That's cool. Can you explain how a black hole works? >> Black hole forms when a massive star collapses under its own gravity, creating a region of spaceime with gravity so strong that nothing, not even light, can escape. >> That's so cool. Thank you. >> You are welcome. you have any other questions I can answer for you. >> There you go. So, you can see how you can make assistance uh with this model with the incredibly low latency. Um, and we were using voice activity detection there. So the interesting part to me is that as part of this process you are streaming and chunking the audio as it comes in and then the model is uh processing the first part of the output and it's able to generate the TTS uh uh which is a separate model um while this is happening and that is what leads that kind of low latency and the ability to do things like interruption. So if you want to try this experience we again we have it at our demo booth outside and we'd love to hear what you think about it. Um and then obviously the next natural process for doing this is to move into the world of different modalities. So including things like vision and we have a fantastic demo here from our friends at hugging face and pollen robotics. Um and I want to introduce to you Richi Mini. Uh and what Reachi Mini is going to show you here is how you can interact with a model that can see and can hear and then can uh respond with emotions. So let's have a look. >> Hey there, I'm Richi Medi. What's on your mind? Hey Richi, can you tell me what you see in front of you right now? >> It looks like a chess board set up on a table. What are we playing? >> Uh, I'm just learning about chess. Can you teach me how the knight moves? >> That sounds like fun. I can certainly try to explain the knight's move. It jumps in an L shape to any of the eight surrounding squares. >> Well, in that case, I'm going to move here. You're in checkmate. I win. >> Oh dear. It looks like you just won. That was a quick gain. Poor Richi. Yes. I I think a bit unfair because obviously Richi doesn't have hands to move the pieces himself. But uh yeah, as you can see, you know, a local assistant that you could have a communication with. They can see and understand the world around you. And again, powered by Jim 4. So, thanks again to Joshua from the TransformersJS team for this incredible demo. [applause] So, next up, we're going to talk about personal assistance. And here to explain a little bit about more about how that might work, we're gonna hand over to Olivier. >> Thanks so much, Yan. Very exciting. So a couple of years ago, for the first time ever, we loaded a local model on the phone and I was in the product team that took care of it. This model could barely reply back to you. Very simple query, still struggling a little bit. Two years ago, we also showed you what a multimodel assistant look like with Astra running fully server side. And since that date, we're quite excited to see like what it would look like if we were to do it entirely on device without internet on a phone. And this is what it looks like. >> Hey Gemma, do we have an internet now? >> So you have a full tool call at the system level. >> No, we don't have an internet connection right now. >> What is this warning about? That sign is a warning about the puma habitat. It advises that pumas are part of the local environment and that you should be cautious if you encounter one. [clears throat] >> What is the derivative of this function? >> A little bit of math and camping, you know. >> + 7 x + 10 is 2 x + 7. >> Oh, hey. Hey Cindy, >> we are now streaming from the XA glasses on the phone. [clears throat] >> When is our group hike tonight? >> Your group hike is scheduled for 6:00 p.m. today. >> Is this a Puma? >> No, that is a dog. >> Good boy. >> I'm quite excited by the progress we've been able to make on device in just a couple of years. [applause] And the good thing is that if you're curious, we actually have the phone that is being run on and we can do the demo outside after the talk. Now, let's actually look at what it could do to help people in their everyday life. So, we have an example with someone who is visually impaired and you can see the value of AI running on device to help people in their everyday life. It's 28° and clear. Perfect running weather. [music] >> [music] >> Let's do a slow run and I'll guide your pace with audio cues [music] whenever you're ready. [music] Great rhythm. Keep that stride. [music] There's some gym gear on your left. Stay on your current track. >> [music] >> Left turn coming up. Ease into it. You've got this. And now take it slow. You got room. By using Gemma 4 to understand the runner's surroundings. The Gemma running agent acts [music] as a guide, giving blind and low vision runners the confidence to navigate safely and avoid obstacles in real time. [music] We're also exploring Gemma running agent with the new intelligent eyewear. [music] It's not just a new way of running. It's a massive leap toward true [music] wearable autonomy. >> [music] >> Stay in that lane. Perfect. Great run. [applause] So, this is currently a prototype, but we will keep on working on it. And for me, this combination of local AI, fully offline, helping people with accessibility is what I think AI for good should be. and we are very excited to continue into that direction. On that note, I will end back to Yan for the rest of the demos. >> Thank you, Olivia. >> I am a small >> I'm going to show you a quick video about today. I'm very happy to bring you those two tiny ducks to showcase what Gemma 4 can do on device. They are open mini ducks version two by Antoine Pyon on GitHub. And the idea was to bring the power of Gemma 4 to do ondevice inference and to make those ducks smarter. In terms of the technical stack, this duck runs Gemma 4 E2B on light RT on a Raspberry Pi 5 and this duck runs Gemma 4 E2B on a Jetson or Nano. They both have a microphone, cameras, and speakers. And through multimodel inputs of Gemma 4, they're able to process and understand their environment like never before. Let's see what Doug Gemma with the Raspberry Pi 5 can do. [music] So, while we wait for the system to boot, just going to describe some of the features that [music] the this project online built. They built antennas for expressiveness. There are some LEDs just to show some expressions and a speaker next to it. And then it enters attention mode which is [music] this loop that just shows some expressiveness and the fact that Gemma is not listening. And we're going [music] to try that on the other duck and it will send you some love also on the booting phase if everything goes right. Hey Gemma, explain me what a large language model is. We're using Parakit to do speech to text. Gemma 4 will do the inference and we'll use Cocooro to do text to speech. A large language model is a complex AI designed to understand and generate humanlike text. I am here to assist you with tasks and interactions. [music] >> Introduce yourself. Tell me more about you. >> I am Autumn, a small duck robot's brain. >> Very snappy. Uh Autumn is ODM, so open duck mini. So that's why he call itself Autumn, which I think is pretty cool. So this is a great way to bring [music] intelligence onto those ducks. The next step is for them to start walking to see each other, talk to each other, and maybe [music] go autonomously explore the world on their own. So, thank you. [applause] So, Zavier, the creator of our ducks, is outside uh if you want to have chat with him and you can have a chat to the ducks and see what they can do. And this really shows when you shrink gemma models down to things that run on IoT devices like Raspberry Pies that you can run physical local agents that can see, act and react in the world. I have one more slide to show you and it's on this new topic here. So we mentioned Gemma is designed to run on the edge and this is really the smallest we've been able to get. So this is a coral board uh which is a dedicated MPU that can do one tops. It's an open architecture built by the Coral team. And the demo on the top is showing the Gemma 3270M model doing translation. And the demo on the bottom is showing a function Gemma which can take voice input or text input and turn it into actions. And the incredible part of this is we're able to shrink these models down to things that are potentially suitable for things like wearables, really lowowered and lowcost devices. So Gemma really kind of stretches that whole range. Now, these are obviously just all the things that we've shown you. We have a massive ecosystem of uh developers and builders and companies that are working with Gemma models. And to give you a bit of a rundown on that, we'll pass to Gus. Thank you, Gus. >> Of course. Thank you. [applause] Nice. Pretty cool, right? Lots of cool demos, lots of inspiration for you to help us build the demos for the next year. So, that's pretty cool. And that's a pretty good uh point to get here. What is the Gemmaverse? We launched these JMA models for the community, for the enterprise, for developers to build cool things and share with the world of what they're building. And more than just building an application, when we share a model, you can improve the model and change how the model behaves. And this is what we call the gem versse is this collection of models that people keep creating based on Gemma. As of today, I I'm pretty sure that number is wrong. We have more than 500 million downloads and I think it's wrong because I think it's way more than that, right? This is update a couple of weeks ago. Thank you. And we have more than 100,000 variants of Gemma models. A variant is someone like you from the community that goes and improve the model in a specific task, language or something. I'm going to show some examples in a little bit, but we have a lot of people creating these varants. Uh, and how do you do that? You usually you fine-tune a model. So fine-tune a model is kind of if you have a a a kid, you teach something new to it, right? So the model is the same idea. You can teach a new capability or you can improve something that already knows or you can change how it behaves. This sorry and this is where you fine-tune. So if you want your the model to speak in a way that's more tailored to your enterprise, that's why you would fine-tune. Yesterday in the developer keynote, Kevin used antigraph to find tungema to be more concise in its answers, right? It can give you an answer that's maybe too fluffy as he said. I don't agree, but that's okay. But he wanted something more straightforward. That's something you can fine-tune the model with a maybe a 100 or a thousand examples and the model will be more straightforward if that's what you need. Or you can do it for teaching a new language which is way more involved and also super cool. So there's all these options and people have been doing this from Google. We've done this for example for Medgema. Medma is Gemma 3 then our model from last year fine-tuned for the healthcare area. So you can ask questions more related to to health to X-rays to all this kind of information that's way more uh focused on a vertical. So Med is available to you. You can play there's Magma 1.5 already that released this year. You can play with it. have fun. Uh it's one of the variants we have available. We have many. One other variant that was released last year is the cell to sentence. So the cell to sentence is a variant of Gemma 3 uh 27D that you can help it's helping find new treatments for cancer. This is pretty cool because oh it can cure cancer. No, no, no, no. This is helping fight treatments. These treatments are being tested already. It's not something like oh yeah, it will get there. No, no, no, no. It's been tested already and we have already some good results. So this is way more advanced than just a play a toy, right? Uh more than that, we did some variants, but the community did a lot of varants. We have many many colleagues that did multiple variants. We have some examples here. For example, the crane AI labs. They got the Gemma 3 model and fine-tuned to teach better swuahili and which impacts a lot of people. Gemma can do a little bit of Swahili but they could do even better and that enables way more people to have access to LLMs. We have Gaia Ga Seiya over there which are people that we work together to make a version that was even better in Portuguese. Portuguese is pretty important to me. Uh so it was pretty fun to work with them and we have multipoders. We have e- permit which is our colleagues in Ukraine using Gemma models on their government already to help uh make process faster. So we have many many uh variants that are already helping with the gem versse. And that being said, I would love to know what you are building next. There's so many possibilities and Gemma is here to help you build even more cool stuff. Uh we will not have time to answer questions right now, but all of us will be here in front on the Q&A session. So please come ask your questions. We're more than glad to chat. Thank you very much. [applause and music]

Sur le même sujet : Google