
Gemini co-leads on project origins and what's next

Google · Google for Developers · May 29, 2026 at 04:00 PM · 41:55


TL;DR

Google DeepMind unified fragmented AI efforts into the Gemini model family, emphasizing scale, multimodality, and real-world usage to accelerate progress toward more general and capable AI systems.

KEY POINTS

Origins of Gemini

The Gemini project emerged from a strategic decision to consolidate previously separate AI initiatives, including Pathways, PaLM, and PaLM 2, into a single, more powerful system. Leaders argued that splitting compute resources and research teams across multiple models limited progress. The unified approach aimed to concentrate talent and infrastructure into one “general-purpose” model, reflected in the name Gemini, symbolizing the merging of parallel efforts.

Shift Toward a Single Model Strategy

The transition marked a broader evolution in AI development, moving from academic-style exploration toward highly coordinated, large-scale engineering. Earlier research culture prioritized experimentation across many directions, but increasing model complexity and compute demands made consolidation more effective. The result was a centralized effort capable of building systems at significantly greater scale and capability.

Gemini 3.5 and Capability Growth

The release of Gemini 3.5, particularly its Flash variant, highlights advances in coding performance and agentic behavior while maintaining strong multimodal abilities. The model builds on earlier iterations launched since 2023, with incremental updates steadily improving reasoning, tool use, and responsiveness. Coding and autonomous task execution are increasingly seen as defining benchmarks for modern AI systems.

Importance of Real-World Usage

Widespread deployment has become central to improving model quality. Large-scale user interaction provides feedback on strengths and weaknesses, similar to how Google Search evolved through usage data. This approach helps avoid overfitting to benchmarks and ensures models remain useful in practical applications rather than optimized for narrow test metrics.

Multimodality and the “Omni” Direction

Gemini’s architecture emphasizes multimodal intelligence, processing text, images, audio, video, and more specialized data types such as scientific or robotic inputs. Newer developments aim to integrate these into a unified “world model” capable of understanding physical dynamics and generating consistent simulations, including video and 3D environments. This represents a shift from task-specific outputs to broader environmental reasoning.

Distillation and Efficiency Gains

Techniques like model distillation have enabled smaller models to inherit capabilities from larger ones with remarkable efficiency. Engineers note that newer generations can compress the performance of more powerful predecessors into lighter systems, improving accessibility and speed without proportional increases in compute.
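
As a rough illustration of the idea, the sketch below computes a soft-label distillation loss: teacher and student logits are softened with a temperature in the softmax, and the student is penalized by the KL divergence from the teacher's distribution. This is a generic textbook-style formulation, not Gemini's actual training recipe; the function names, the temperature value, and the temperature-squared scaling are illustrative assumptions.

```python
import math


def softmax(logits, temperature=1.0):
    """Convert logits to probabilities; a higher temperature flattens the distribution."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """Soft-label loss: KL between temperature-softened teacher and student outputs.

    The temperature**2 factor is a common convention (an assumption here) that keeps
    gradient magnitudes comparable across temperatures.
    """
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    return temperature ** 2 * kl_divergence(p_teacher, p_student)


if __name__ == "__main__":
    teacher = [4.0, 1.0, 0.5]
    student = [2.0, 1.5, 0.5]
    print(distillation_loss(teacher, student))
```

The loss is zero when the student reproduces the teacher's softened distribution exactly and grows as the two diverge, which is what lets a small model learn from the teacher's full output distribution rather than from hard labels alone.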

Challenges in Scaling Intelligence

Despite rapid progress, significant challenges remain. Combining multiple capabilities into a single model often introduces trade-offs, requiring careful balancing during training. Evaluation is also difficult, as traditional benchmarks fail to capture real-world performance and may be compromised by data leakage.
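
One simple way to picture the leakage problem is an n-gram overlap check between a benchmark item and a training document. The sketch below is a hypothetical illustration only: the function names, the whitespace tokenization, and the default n-gram length are assumptions, and it does not describe how any particular lab screens its data.

```python
def ngrams(text, n):
    """Return the set of word-level n-grams in a lowercased, whitespace-split text."""
    tokens = text.lower().split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def overlap_fraction(benchmark_item, training_doc, n=8):
    """Fraction of the benchmark item's n-grams that also appear in the training document."""
    item_grams = ngrams(benchmark_item, n)
    if not item_grams:
        return 0.0  # item shorter than n tokens: nothing to compare
    shared = item_grams & ngrams(training_doc, n)
    return len(shared) / len(item_grams)


if __name__ == "__main__":
    question = "the quick brown fox jumps"
    document = "he said the quick brown fox ran"
    print(overlap_fraction(question, document, n=3))
```

A high fraction suggests the item may have been seen during training, so a strong score on it says little about generalization.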

Limits of Current AI Systems

Researchers acknowledge gaps in areas such as continual learning, data efficiency, and scientific discovery. Current models require vastly more data than humans to achieve comparable abilities, and breakthroughs like fully autonomous disease discovery remain out of reach. More flexible and adaptive architectures are seen as a potential future direction.

Emerging Focus on Self-Improving Systems

One anticipated development is self-learning AI, where models assist in improving their own architecture and training processes. Early signs suggest systems could soon handle parts of research and experimentation autonomously, marking a shift toward recursive improvement under human supervision.

Future of Products and Interfaces

There is growing belief that a single powerful model could underpin a wide array of products, potentially blurring the line between applications. While some envision a unified interface, others expect multiple specialized products built on shared intelligence. Advances in interfaces, including voice, visual systems, and wearable devices, are expected to shape how users interact with AI.

CONCLUSION

The Gemini initiative reflects a decisive shift toward unified, large-scale AI systems, with future progress likely driven by multimodality, real-world feedback, and increasing autonomy in model development.

Full transcript

Even before we started the Gemini effort, there were a lot of people thinking about, you know, building incredibly general purpose models that could do things. Oriol was leading some efforts in DeepMind and I was sort of helping steer some efforts around the Pathways project and things like PaLM and PaLM 2 and so on. I actually said this is silly. We are fragmenting our efforts and fragmenting our compute and if we're going to build an incredibly powerful model, we need to all come together and work on building a single model. That's actually where the name Gemini comes from. The twins. >> We mapped and then we reduced. >> Yeah. Yeah. Yeah. Exactly. Yeah. Something like that. >> I thought it was because I had twins. >> That too. That too. >> Hey everyone, how's it going? My name is Logan Kilpatrick. I'm on the Google DeepMind team. Today we're talking with Jeff, Koray, Noam, and Oriol about all things Gemini, the origin of the Gemini project, and so much more. So, thank you all for sitting down and chatting. We're sitting here in Gradient Canopy. We just launched the Gemini 3.5 era of models starting with Flash. Contextualize the moment. I think this is like the third and a half generation of Gemini models. We've shipped a lot of stuff, a lot of models in between. Maybe Oriol, you want to sort of fill us in on the moment with Gemini 3.5? >> Yeah, we could do one each maybe. But contextualizing, I guess we started in 2023, I want to say. We've had a few releases, some half models or even point one. >> Yeah. >> And been building on some foundation of multimodal, you know, tool use, agentic from the get-go and just kept building the capabilities up. So today it's exciting to be releasing the Flash version of 3.5. It's a very powerful series and the focus of this one probably is on coding and then of course preserving and enhancing the rest of capabilities. 
I think everyone feels like this is really the time where coding capabilities and agentic experiences are defining what it means to experience AI, and 3.5 is a huge step there. >> Yeah. >> Right. And I think everyone is actually experiencing that and it is being recognized as a very strong model. These big release moments have in one way gotten less exciting because the thing that's most at the top of everybody's mind is not even the big release to the public. It's like what am I going to be using tomorrow to do my engineering and my research and what are my friends around the office going to be using for their engineering and research, and will they be complaining at me? Will they be happy? It's always fun and exciting on a day-to-day basis. >> Reflecting back on the initial moment of coming together and forming the Gemini project and shipping those initial models: was it obvious to you all that the product story of how we bring Gemini models to the world was going to be so important? Obviously at Google we have lots of products and we bring AI to our customers through the products, but actually to improve the model, was that like a "we hope that this happens and it's something that we're going to intentionally work towards", or has it just become obvious over time that we need to do that because the use cases are much more complicated than the initial version of Gemini? >> I'm actually curious what you all think. >> Yeah. I mean I >> for me that's my job. >> I mean I can weigh in. I think, you know, that was actually pretty obvious that if you have a model that is used by a lot of people that you're going to get a lot of lessons and experience of what's working well, what's not working well. 
And you know we've seen this in Search for many many years: the use of Search by our users really informs, you know, what are the things it's not doing well, what are the things we should be doing better. You know, aggregating lots of interesting usage statistics to understand that in a deeper way and then working on improving those things is important, and AI models should be no different. >> Right, so it seemed like that was pretty obvious from the beginning, but we had to have a >> a thing out there that people were using. >> Yeah, that's the true test: people using it, and is it useful to people, because if you go in the box and you try to hill climb benchmarks then you end up hill climbing benchmarks, then you know maybe you leak your benchmark, and it doesn't end up well. >> You don't want to build intelligence in a black box. You want it to be useful. You want people to be using it, therefore understanding what is required. Scratching the frontier is both scratching the frontier of the research in terms of technical capability but also scratching the surface of what is the next thing that you can enable for users. You can't do it if you don't actually do it with the products, and those two hand in hand define what the frontier means. 
So at the time Gemini starts there's already a lot of machine learning models making it into products, and I think what seemed obvious is that if we create a single model more powerful than the average of the other models powering everything, that had to be a leap forward. Whether there was a single product that could be created around a single, you know, model, that maybe wasn't as clear at the time, but I think it was very clear that putting all the compute and intelligence into a single powerful model was going to leapfrog many things Google was already using machine learning for, and that was very exciting, to be given that amount of compute and responsibility initially. But I think it's proven to be indeed kind of the core engine of Google intelligence. >> Even before we started the Gemini effort there were a lot of people thinking about, you know, building incredibly general purpose models that could do things. Oriol was leading some efforts in DeepMind and I was sort of helping steer some efforts around the Pathways project and things like PaLM and PaLM 2 and so on. I actually said this is silly. We are fragmenting our efforts and fragmenting our compute and if we're going to build an incredibly powerful model, we need to all come together and work on building a single model. That's actually where the name Gemini comes from. The twins. >> We mapped and then we reduced. >> Yeah. Yeah. Yeah. Exactly. Yeah. Something like that. >> I thought it was because I had twins. >> That too. That too. >> Jeff, that's a great segue to going back to the formation of the Gemini project. I'm curious how controversial was that? Obviously as you say it now, we've done three and a half iterations. All the organizational complexity of bringing teams together is now sort of behind us. 
Was it blatantly obvious at the time that we won't win and actually deliver on building the right product and models for our customers if we don't do this, or did it originate as more of a pie in the sky idea? I'm just curious, what was your level of confidence? >> I was sure the right thing was to come together. I actually articulated in a half-page memo, like, this is silly to >> fragment. We shared this half page. We should put it somewhere, release it somewhere. But it felt like fragmenting our best ideas across different research teams that weren't really working together and also fragmenting our compute, both of those issues seemed like things we should fix. And it was a little bit organizationally complicated, and time zone wise there were lots of people in London, lots of people here. 8 hours apart is never a recipe for easy collaboration, but I think we've done a really good job of navigating that and bringing people together and now we have a really amazing team all over the world and we're >> cranking out good models. >> There were a bunch of teams building LLMs at the time that you just needed to >> mash together basically. At some point research in AI was a lot more academic, right? You go back 10 years, a lot more academic research, and at that point how you organize it is not really the key element. It is more about the exploration, and the speed of exploration is important. But as things get more and more focused, what you want is really, to Jeff's point, this focused operation, rather than us trying to build things in parallel, because these things require a lot more focus and effort, and each one of them is a major operation in terms of many researchers coming together solving many problems. At that point I think it was really a good idea to say, okay, this is the moment that we need to change. 
I think both organizations acted with great urgency on that and enabled it. I think that was an experience. Of course it's never easy bringing two organizations together, but I think everyone realized that this is the right moment and there's a huge value to be gained from this, and I think all of us can see the whole organization is very proud of what we have built together, right? Gemini is really the fruit of that. >> It's the scale. It's the fact that when you build one big beautiful giant LLM, it can do so many things, and so, you know, you really do need to put together that many people, that much compute and >> infrastructure teams, data teams and so on. >> Yeah. Better to have one of those teams than five little ones. >> Yeah. Yeah. >> Yeah. I mean to have one model than five. One thing I would say is from the start we wanted Gemini to be, I mean even pre-Gemini, one of the origins of the Pathways project was to explore, you know, a single model that could do many things, a multimodal model that could deal with all different modalities, a very large model that was sparse, so you would activate different pieces of it for different kinds of things, and all three of those things are sort of represented in the Gemini models that we have today. And I think now with Omni we've gotten the multimodal kind of aspects; now we can even generate video. You know, we used to be able to just generate images and audio. Pretty awesome, because you have the full capability of this amazing reasoning model that can deal with lots of input modalities and can edit the video it's just produced. >> I think Omni is a whole new capability, right? I mean we had Veo and Nano Banana; of course you can do text to video, text to images, but what you want is really a model that understands all the modalities of the physical world so that it can understand the physics and everything, but together with text 
because there's a lot of information about the world in text as well at a very high level. >> Koray, really quick, I have a question about this. In the I/O keynote, we were sort of framing Omni in the world model section and I'm curious, does it actually have a bunch of the Genie world model stuff, or is this sort of just positioning for the next stage where it takes in anything and puts out anything and that's sort of our representation of world model? That wasn't abundantly clear to me. I hadn't thought about it that way. >> I'll give my opinion; Oriol has worked on these things a lot. World modeling means that you really understand the dynamics, the physics, the visuals, and then you have to be able to simulate that as well, because that simulation aspect is critical both for us understanding if the model has it right, and also when you want to rely on the model, you want a model that is going to be able to roll forward that simulation, and the decisions that are coming out of the model are based on those future simulations. That's why I think Gemini Omni is a different category that is really transforming what we had, with Gemini that is mostly understanding and text output and Veo that is text input and doing the video modeling, into a really true world model by training jointly. 
There's a hope that of course everything will transfer, and making, you know, a better text understanding model will help the world modeling aspect. I think we're seeing this every time we try. It's not easy, but as we get the recipes right we see that. You know, back in the day, rolling out a complex video scene forward, consistency, all these things, you kind of had to manually think about them and almost prespecify how to get the visuals right over time, and when you turned, the object disappeared. And just by training at scale and mixing all the data more and more, we're seeing these capabilities emerge, and that's what's exciting and sort of the main premise I guess we were putting forward. And now finally we're outputting also, I mean, amazing consistent 3D worlds, sounds, all the things. I mean it feels almost impossible; if you asked me a few years ago whether this approach would work, otherwise probably we would have done it like 10 years ago, but you know, it did happen. Yes, yeah, probably more data. When you say multimodal, you know, you instinctively are drawn to human modalities like text and images and audio and video. But I think really you want the model to understand a much richer set of modalities, like understand interesting scientific data that comes in genomic sequences or in, you know, chemical structure or robotic grasping data or LiDAR data. Exposing the model to a little bit of this kind of data makes it much better at understanding it when it does encounter more of it. >> I feel like part of the story of Google DeepMind being able to pull off this model, and again the formation story, is actually people, and the fact that you all actually know each other. We were talking off camera before this about when did you all meet each other and start working together and hear of each other, and I'm curious for all your versions of that story. 
>> Maybe I can go first since I think I know people the longest. So >> one way to put it. Um, for many years I did a lot of engineering hiring and recruiting in the very early days of Google. So I screened all the engineering resumes that came into Google for three years or something. >> It was amazing. They would just bring Jeff a giant stack of resumes. He'd be like, "No, yes, yes, no, no, no, yes." Like, extremely fast. >> So I didn't actually interview Noam, I don't think, but he had interviewed and gotten an offer. And I think you were debating should you take the offer. So, I called you up on the phone in 2000 and I said, "Hey, you know, let me just chat. I want to introduce myself and, you know, I really like the kinds of things you're excited about and working on. I think you'd really enjoy it here." And I finished the phone call. >> Honest question. Were you just selling at that point, or was there something, like, he had an offer? >> So, I'm selling. I want him to accept the offer. >> Yeah. >> Yes. >> I love it. >> So, then he did. >> He did. He did. And then Noam became my office mate for like three and a half years or something. Oh yeah, I remember joining and everyone got a mentor to ask questions as a new hire because there are like a million things you don't know, and I would ask my mentor and every time my mentor would know the answer and, you know, it was like, wow, everyone knows everything here. And it turns out that Jeff was my mentor and it was just that Jeff knows everything and had written like half the codebase. >> Yeah. So then fast forward I guess maybe to 2012 I think. Yeah. >> So Oriol had interviewed with us and I don't think I interviewed you but I think you had an offer and I was trying to convince you; you were considering this and another company. So I called you up and I said, hey, you should really come here. 
We're doing really interesting deep learning models and >> we're having an awesome time. We were all in the Google Brain team wedged into like probably a 30 person office just outside on the main patio of the main four buildings of the Googleplex. Somehow I managed to convince you to come, which was awesome. >> Yeah, I mean I remember lots of back and forth. I had like the last maybe one year of my PhD. So I was just writing the thesis, no LLMs at the time, so you actually had to, you know, write every single word. Lots of pondering, but I joined and, maybe not exactly a mentor like Noam mentioned, but we started two projects, one of which was distillation. And I remember, I mean, the codebase was complex, like C++, and you're in academia so you don't know exactly how to implement things, but the idea was clear, and literally I remember sitting by Jeff's desk and he was just coding the classes, like, okay, distillation and KL divergence and so on and so forth. So we didn't have coding agents at the time. But, you know, I can say maybe for a little bit Jeff was kind of acting as the coding agent for the project and >> I mean, a hard benchmark to beat still today. >> That project was good because Geoff Hinton did some of the very early exploration on MNIST >> which is like a tiny tiny data set that he could run on his laptop, and he had some good ideas about how to get a bigger model to transfer into a smaller model. I'm like, we've got to show this thing at scale. So we trained a 50 model ensemble on 300 million images, which was a lot at that time, and 50 distinct ones. So we grouped the categories, so this one was going to be good at cars and this one was going to be good at wild animals. Then we transferred the knowledge with distillation into a single model and it was much more accurate than the single model you could have trained on the raw data. 
And by the way, at the time I remember compute was already constraining, but all you needed to do is ask Jeff, hey, we ran out of CPU. And he would just go to some website, change a number, and we doubled it. And we did that a few times. >> Yeah. I had a super user power thing. >> That was not, I missed that. >> Yeah. Sadly, exponential growth sometimes. >> Yeah. >> Stops happening. I remember the first time we really sat together and talked was actually during the acquisition >> discussions of DeepMind, when you flew to New York, to London, sorry. There was this moment that, like, there were all sorts of discussions and such, there's a bunch of people in the room >> but then Jeff comes to me and says let's look at the code. I'm like, okay. >> So I sat down at the keyboard. I'm like, okay, you know, don't show me anything too sensitive but I want to see that directory. Yes. >> So pokes at that directory and we go inside. I'm like, okay, let's see this file. >> He's like, okay. >> And then I go and explain: okay, here's what we are doing here. Here's what we are doing there. This is this idea. This is that idea. I mean at the time for me it was a big deal, right? I'm sitting together with Jeff. I'm explaining to him, okay, these are the ideas and this is the code, and we are walking through it. >> Our first code review together. Looks good to me. >> Was it actually, Jeff, you were just pointing at random directories and then Koray just happened to know what was happening? >> We'd seen 15 talks, which was great. >> At the time, I remember, I would review pretty much all the code at DeepMind. So I would know pretty much everything going >> Yeah. And I think the company was like 55 or 60 people or something. So we all flew over to London and then we had not slept super well that night and then we go in and we see 13 consecutive 30-minute talks. >> Yes. 
Geoff Hinton had a bad back, so he's laying on the floor in the back of the conference room. We just flattened him and then >> I heard that story. >> Yeah. Towards the end of the day, I'm like, "Okay, this seems pretty promising. Let me just >> But let's see the code because we'd seen some nice slide decks." But >> that's crazy. We need a movie about this. I feel like this would be a good movie. Oh, actually another thread of this, reflecting back to three and a half years, maybe even longer than that: something now, as we sit here, that is both positive surprising and also negative surprising, like something maybe you wish we had made more progress on and it's surprising we haven't, and also something maybe we've made way more progress on than you expected. And obviously so much of the stuff is so hard to have imagined 5 years ago, but anything that sticks out for all of you? >> Maybe I start with positive, and very timely for today. I really didn't expect we could keep doing what we've been doing generation after generation, which is to pack the intelligence of Pro back into Flash. So that happened in 1.0 and you could say, well, you know, it was the first run and it was fairly suboptimal in some way, so that makes sense, we improved the recipe. But in a way that seems to be sometimes even accelerating, depending on which version we look at, that Flash next gen outperforms Pro previous gen. I mean, just even understanding distillation, how it works, I'm still mesmerized: how can we pack so much intelligence per byte or per parameter? >> Has distillation fundamentally changed? I know of the concept, I don't know the details. Are there architectural improvements to the way we do distillation, which is part of how we can keep packing more in, or is the technique relatively the same as what you all came up with originally? 
>> Yeah, I would say it's even simpler. I mean, we had some, you know, trick with temperatures in the softmax and we had to, you know, take an ensemble of models. This >> Don't tell. Um, no, no, I won't tell. >> I'm just making sure. >> I'm going to spill the recipe. >> You have a really, really good teacher and then you have a student. So you didn't need an ensemble of 50 teachers. You just have one really good teacher and then one student. And you pretty much use the recipe described in the original paper with some modest tweaks, but the basic spirit of the idea is pretty much the same. >> Wow. Let me give you the most technical explanation. It's like squeezing the lemon. You squeeze the lemon, the juice comes out, it's the good bits, you put it in a glass, which is your small model. >> I like that. Let's go. >> You should read the intro of the paper. It has some poetic intro as well about larva and insects. dark color >> the original paper that was just like soft labels and >> yeah pretty much yeah >> anything that you're surprised we haven't been able to pull off given how much progress across the board Gemini has made over the last three and a half generations? >> I mean the good side, thinking back to >> good, it's also about the beginning of Google, right? We have this, what was it, this one box philosophy, right? Jeff, you must remember, the one box for everything, like the search box, you could use it for >> type in something, it would show you sports scores; type in something else, it would show you stock quotes >> right, and on the back end, you know, these were all very separate backends and custom-built whatever; some of them were AI-ish and some of them weren't >> spelling, "did you mean", is largely Noam's >> starter project I think >> oh yeah, the user would assume, oh, there must be some brilliant general purpose AI behind the whole 
thing and it knows it can do all these all these different things. Um and and now we actually built it like we built the general purpose AI that >> it's one box. >> It is one box. >> It is one box and it's like one back end like we finally we finally have the back end for the front end and we have the right interface cuz we built the one box. >> He wants a negative thing. No, no, not negative, not negative. But it's obviously like we people want more, you know. Is there there's something that you wish? >> But I think you should say it's hard for us, right? Like because we've been in it and for us especially for like um researchers like like you don't operate with negativity that much. Like if something doesn't work, it's a learning and like like you you you put on top of it. From your point of view, what would you have expected to see and you are not seeing? What's your disappointment? >> That's a good I wouldn't frame it as disappointment. Um but he hasn't >> but he has clearly >> I have one part engineer part researcher so engineers can be more negative. >> All right. >> Okay. >> Um I mean I felt like we would make more progress on sort of continual learning and more kind of not so structured model architectures like right now we have which are like lots of experts. They're you know all very similar structure. I felt like a much more organic style thing would be something we >> Yeah. always imagine that kind of like bigger architecture. >> I still think that could be interesting, but but we are not doing that yet, but you know what we're doing seems to work. So, >> I'm I'm a little disappointed. Okay, we we haven't um like cured every disease yet. You can't just like type in like invent me a cure for cancer or something and you know it'll it'll just do it. But, you know, we're moving along. >> Yeah. I think that and I'm curious to get y'all's reaction to this. I think it's not a negative thing, but it's like surprising to me. 
It seems like, how much energy and effort it takes to merge the capabilities into a single model, and that it's a really difficult juggling act. You merge in a new capability; it's not just like it works out of the box or something like that. You trade off against something and you have to make some change to try to make up those gaps. And I think it's not super intuitive to me as far as >> from my point of view that is one thing that I'm amazed with the models, that there's an insane amount of capacity in the model and we keep packing stuff in. Imagine that the current models are not that much bigger than what was happening 3, 4 years ago >> right, but we keep packing more and more capability and information. So the fact that we can do that, there is so much room in the model. Maybe this is the negative part, but to me, we keep doing that and there is still room, and there's so much more room in these models, and that's why it makes me actually excited, because in terms of algorithmic AI development there's a lot of room. I really believe that the models have much more capacity than what we are getting out of them right now. There's going to be big innovations that are going to enable us to do a lot more with the models. >> Yeah. And part of it I would say is we really need to come up with algorithmic things that just get much more Yeah. >> out of every piece of data or example that the model sees, or every token, because I think, you know, if you look at the efficiency of say human learning, it's a thousand times better than what our sort of LLM learning can do. The LLM gets to see a thousand times as much data as a, you know, really capable human and then gets to roughly similar capability, maybe slightly better in some things and not quite as good in others, but it needed a thousand times as much data. 
So if we could make it so that you could get a thousand times as much information out of every example, that would be amazing. A human has heard, what, about a billion words in a lifetime, and a model has been trained on >> Trillions. >> trillions, and can remember them. Yeah. >> Do you disagree a bit, though? We're pre-trained. I mean, it's not like you're the first human. So anyway, there are some arguments about that too. >> But the source code is so small. >> We've got, like, gigabytes of source code. >> This is one of my questions. >> This is why you don't want this conversation happening. By the way, I have a hard one in terms of what's been difficult. I think evaluation is very difficult. It's been somewhat underappreciated in the community, even from the academic era that Koray was mentioning: evaluating capabilities in isolation, or figuring out what the next big things will be, and how to evaluate in a way that has not leaked into the datasets and that yields a number users will agree with. There's been a lot of work and progress, but I feel like that's been surprisingly hard, perhaps because we came from a table of numbers in papers and now we have users and feedback. That's been surprising, and exciting, because every time you find something difficult you get motivated to fix it. But evaluation is one that >> That's a great... you know, needs to keep getting better. >> I mean, the dream of every AI researcher ever has been: how do we build systems that can generalize to things they've never been confronted with? Even when you're training specific models on particular tasks, you want to generalize to new examples of that task. But I think what we're trying to do now is generalize to anything anyone might ask. And that is kind of a hard problem.
But by having a lot of users, you get a lot of feedback: okay, we're generalizing pretty well on these kinds of problems, but on these kinds of problems we're falling short. One of the controversial questions that I had for you all: you have obviously worked together for a long time in different capacities. What are some research things that you still don't all agree on? And I want to preface this by saying I think this is maybe a positive thing. The beauty of having people with different perspectives is that there's disagreement and we try different things. I'm curious if there's anything that comes to mind. >> I'm trying, I'm trying >> Or you all agree. >> I don't think we would all agree, but I don't think there would be big major disagreements, because in the grand scheme of the design of Gemini, this group has experimented with all sorts of things. I think we built a lot of ideas through experimentation. I know that Jeff always had this idea of building something a little more flexible, with more plasticity, more fluid. We didn't get there, but it's not like we disagreed on that. It's just that the current systems have empirically shown us the way, that this is the model we are building. But otherwise, I don't know if we had big disagreements. >> At any given time, each of us is spending more of our effort on one particular thing or a few things, and the others are not necessarily spending as much time on that thing. So, you know, I'm spending a lot of time on what future inference hardware should look like, because I think that's a super important capability for us to have. >> Um, and you know, you're not spending as much time, but I describe it to you in the kitchen. You're like, "Oh, yeah, that sounds good. When can we have it?"
>> Reality is a good way of getting people to agree. You see experimental results and see what works and what doesn't work. So, >> I mean, in general, Gemini is quite data-driven, I would say. Lots of people run experiments at small scale and then say, "Here are the results." "Oh, that looks promising. Have you tried combining it with this thing?" And you need to use your pool of research compute in the most effective way possible, and being data-driven is >> I think if you think about Gemini, or AI in general, it pulls in so many things, from hardware to model design to product to everything. So I think having this group work together is actually one of the most important factors that make it work. Jeff, as he said, is focusing on hardware. Noam is focusing on models. Oriol has been focusing on models and is now going very deep on agents, doing really deep work there. And I try to focus on where we are going with Gemini: are we working well with the products, are we getting that experience, are we running well? So I think all of us work together in a way that takes care of different important areas, because it's a whole technology transformation that is happening. Having people who are thinking deeply about different aspects of this technology transformation is, I think, what makes it work. >> I love it. We should do predictions, just so that we have something to be wrong about a year from now when we reflect back on this conversation. Obviously a huge amount of progress, lots of exciting things from this year's I/O. If we were sitting here in 2027, which sounds like a made-up year, and it's going to be around the corner. 2027, we're going to be sitting here at >> The made-up year. >> I mean, 2027 just seems not real.
Like it's so far in the future, and it's 6 months away or whatever. >> I'm going to be 50. >> Wow. Well, yeah, you're 50. Happy early 50th birthday. We'll be celebrating. For I/O 2027, any predictions of things that you are hopeful will actually land by then, from a model capability perspective or anything like that? >> Let's try to predict I/O 2027 and what we are announcing. >> No, let's not do that. >> Let's do that. >> Anything, well, directionally. Just given where we are now: coding, obviously we made a huge amount of progress on coding. Will we be saturated? Will we still be spending as much time focused on it? Same thing with agents. Just given the exponential that it feels like we're on for a bunch of these different capabilities. >> Maybe I'll jump in. I think one thing that might be happening in a year's time is self-learning. >> Is self-learning the same as continual learning, or different? >> I think they're related. Maybe for some it is the same. But we are in an era where models are a lot more agentic, and they are very good at writing code. We use them in our research. I think slowly we're going to use them more and more in our research, and there will be a point where, at least at some experimentation level, we are going to rely on the models to improve different parts of Gemini. I think next year we will definitely be on that path, and probably talking about it. That would be my prediction. Let's see. >> We'll probably be able to point to some very significant thing in our models that was generated by the models and agents working. So I do self-improvement. Yeah. Under the guidance of Facebook. >> Hey. Right. And instead of suggesting to one of your team members, "Hey, why don't you experiment with this a bit and let me know how it's going next week?"
We'll be telling the model to do that. >> Hard to disagree with that one. But maybe to build on continual learning, more as a capability: I mean the ability of a model, through its experience and interactions, to improve without the need to update its weights, some sort of knowledge-base update that works really well, right? We have examples of this working, but I don't think the capability has seen a steep curve in terms of being so good that it would be an obvious thing everyone would be using and turning on in the model. So that's one that I'm hopeful we'll see. One year seems possible. Yeah, maybe. >> A lot of interesting, weird problems to solve on that. I see examples all the time where, in today's era of this, you ask a model a question and it pulls in some random personal context that is completely unrelated, about a friend's birthday party, as if that were somehow related to my question, which has nothing to do with it. So it does feel like it needs another >> A year. >> We're sort of in our own tech bubble, right? Because we are in the research of this. From your point of view, and you are much more plugged into the real world than us, I would say, what do you want to see? What do you expect? >> That's a good question. This is not an interview-Logan episode, but the >> Maybe we should have some of them. >> No, you don't want to hear what I have to say. The model is the product. That's all I have to say. I want the models to get better.
No, I think the long-running stuff will be really interesting to see, because I feel like that's a frontier that we can really easily track. Even if coding models get 20% better tomorrow and they're really good, I still think you'll run into limitations on how long you want the model to run autonomously. And it feels like, at I/O 2027, if we're able to say this model's been running for 30 days or something leading up to I/O, that would be really surprising to a lot of people. And maybe we won't say that, but maybe something to shoot for. >> But that quantity of work being done independently by the model would be the good. Yeah, >> Would be surprising. And I think it takes the full stack to pull that off. You're going to need memory systems, and you're going to need continual learning, and you're going to need better hardware, because it's going to cost a zillion tokens to let something run for 30 days. So, >> Well, and also you want your better hardware to have low latency, because if it finished in one day, you'd be way happier than with 30 days. 30 days is a good marketing line, but I'd actually be happy. >> Oh, another prediction. Well, not a prediction for announcements, but I think one thing these agents are going to stress is that all of our tools are too slow. >> Yeah. Yeah. >> Right. For a lot of the tools that these agents rely on, even if you make the model infinitely fast, you are limited in how much you can actually speed up real work, because it often involves interactions with tools that are designed for human latency >> Or the frequency of working, right? Exactly. >> 29 and a half days of those 30 days are spent waiting. >> list everything.
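The tool-latency point above is, in effect, Amdahl's law: only the share of a run that is spent inside the model gets faster when the model does. A rough sketch of that arithmetic follows, assuming the "29 and a half of 30 days" figure from the conversation is taken literally (it was said half in jest). The function names are hypothetical and chosen here only for illustration.

```python
def overall_speedup(total_days: float, waiting_days: float, model_speedup: float) -> float:
    """Amdahl's-law style estimate of end-to-end speedup.

    Only the time not spent waiting on tools is divided by model_speedup;
    the tool-waiting time is assumed (for illustration) to stay fixed.
    """
    if model_speedup <= 0:
        raise ValueError("model_speedup must be positive")
    model_days = total_days - waiting_days
    new_total = waiting_days + model_days / model_speedup
    return total_days / new_total


def speedup_limit(total_days: float, waiting_days: float) -> float:
    """Upper bound on end-to-end speedup as the model becomes infinitely fast."""
    if waiting_days == 0:
        return float("inf")  # nothing but model time, so no ceiling
    return total_days / waiting_days


if __name__ == "__main__":
    total, waiting = 30.0, 29.5  # illustrative numbers from the conversation
    for s in (2, 10, 1000):
        print(f"model {s}x faster -> run {overall_speedup(total, waiting, s):.4f}x faster")
    print(f"ceiling with an infinitely fast model: {speedup_limit(total, waiting):.4f}x")
```

With these illustrative numbers the ceiling is 30 / 29.5, roughly 1.017, so even an infinitely fast model would shorten the run by under 2% unless the tools themselves get faster.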
Another, actually maybe another sort of meta controversial question that I'm curious about, and Koray, I like the research take on this, which is why I'm interested. I asked Josh this the other day: five years from now, Google either has three products or we have 10,000 products. What do you think? What seems more plausible? >> We have one. >> Well, only one product. >> Yeah, it's the model. >> Okay, I like it. Sure. I'll take that answer. What do the rest of y'all think? >> I mean, I do think if you have an incredibly capable model, it can do many, many things. And I think you saw in the search demos today at I/O that it can even create little apps inside search that are customized for you, and the visualizations, and write code. So I don't know if that counts as one product or 10,000 products or 10 million products >> if there's a bunch of users. But on a serious note, I feel like people want to consume information in different ways, and I think something like search is fundamental. Five years from now we will definitely have search, maybe with a much more magical box, but the idea that people want to reach information and consume that information for themselves, that learning activity, I think is still fundamental. So I really think it's going to be there, and probably we'll have many, many more, because it will be easy to build products, since they're all powered by the intelligence more and more. >> Yeah. I mean, I think there are many product outlets and a smaller number of things that make those products amazing. >> So if you think about the glasses that were demonstrated at I/O, that's a product >> but it's going to be made better because the models are better, and they understand audio better, and they can speak to you better. But that's a distinct product from search. >> Exactly. >> Right.
>> I think it's clear to us there's definitely one model powering whatever it is. I'm not an expert, but as a user, sometimes I feel like I make an active choice of what I want to do with a digital device, right? I want to check my calendar, email, buy something. And that division might be more of a human factor than the technology being incapable of presenting all of these in a single product. Whether that choice of what I want to do, that focus, goes away or we just evolve out of it, I'm not sure, but I find myself liking the separation of concerns sometimes. So betting on one product, I would not do it at this time, at least for myself. >> I guess we've been talking about the informational products, products that deliver information, and there you can just talk about how humans want to consume that information. Is it visual? Is it text? Is it glasses? Is it some kind of brain-computer interface where you get the model's internal embeddings straight into your neurons, or something weird like that? But uh >> Vector processing. >> but powered by, you know, things like Omni, maybe we will get into physical products in the future and start >> start moving atoms and not just bits. But that is a prediction for the far future. >> I love it. Moving atoms and not bits is the future. One more fast round: things that you all are building.
I'm curious, obviously there's all the AI coding stuff, but maybe personal, less Gemini-specific: anything interesting that y'all are doing? It doesn't have to be with code either. Anything, you know, physical atoms in the real world: painting, carpentry, woodworking, whatever it is. Jeff, you go first. >> I mean, I think I'm just enjoying some of the consumer-facing products that we're putting out that are now much more capable. I made a cute little Mother's Day card for my daughter, who had her first baby, so that was kind of fun. >> I love building Mother's Day hearts. >> As you all know, we made the decision, we moved here, so that means a new house, and a new house comes with all sorts of things that you need to fix, learn, adapt. So nowadays it's house DIY, and it goes from home automation to fixing stuff with a nail and hammer, that whole spectrum. And I enjoy that. I like being able to do hands-on things. >> I love it. >> But I'm just trying to make the model smarter, >> build some new model architectures. >> Yeah, I've been trying to build a knowledge base of lots of research that I couldn't possibly process because we were too busy building, and then create a brainstorming partner to just figure out what the next big things might be. >> I love it. That's awesome. Well, thank you to all four of you for taking the time to sit down. Lots of controversial answers, but it was wonderful. It's fun. I made this comment last year at I/O in a conversation. I think I made it to you, Koray. But I feel like with I/O, bringing people together and launching this stuff, you feel the warmth of humanity as we build this technology together, and this conversation made me feel that. So I appreciate it. It was wonderful to sit down and talk. And thank you all for listening and for watching this episode of Release Notes. We'll see you in the next one.
