Gemma 40:00
We got so much. Gemma 4, Gemma 3 1, Gemma Scope, Met Gemma.
Mm-hmm.
Give us the TLDR.
Yeah, so yeah, Gemma 4 is just out. It's the, uh, most, uh, capable open model we've released so far, where we try to compact as much intelligence per parameter as we could, bring all of these multi-model capabilities. So yeah, uh, that's Gemma 4.
So one interesting thing, you have this thing with effective parameters-
Yeah
... not active parameters. Uh, can you explain what it is?
Yeah. So pretty much in the traditional transformer architecture, you have, like, this big embedding layer, right? Uh, and this new architecture is, uh, is more of a small change in the transformer architecture in the transformer block. Pretty much we add a per-layer embedding, so at every layer we add an embedding table. What is exciting is that you don't need to do, like, the full matrix multiplication.
This is pretty much a lookup table. So the Gemma 4 model is, uh, E to B. That means that it effectively has two billion parameters loaded into the GPU. Uh, it actually has almost five billion parameters, but those three billion parameters can be in the CPU, they can be in the disk, which means that you can do inference extremely quickly.
This is just a lookup table.
And what's the con? Why don't we al- why don't we always do this? Can it scale? Is it open research? Like, you know, it seems very, "Okay, if I can just offload half the parameters to save yours."
Yeah. Yeah, so pretty much, uh, here we did lots of quality experimentation, and this is really optimized and designed for, like, on-device... Uh, and when I say on-device, I mean like running in a phone, Android, a Raspberry Pi, and so on, right? Uh, when you go larger, you usually want to compact more, uh... You want to have more, like, dense architectures or MoEs.
Uh, so this, this research, these research decisions were very helpful for these small, uh, small use cases.
Yeah, something I learned from the run that you organized this morning-
Yeah.
Uh, for, for listeners, um, I think this is the first ever, like, official run club-
Yeah, yeah
... at AIE. Uh, 6:30 AM. Rough. Very rough, but, uh, at least I woke up for it. Uh, I met Cormac.
Yeah.
And he was telling me that, uh, apparently in China, the super apps are shipping models in the app bundle for inference and just, like, use among all their super app constituents.
Yeah.
And I don't know. Is, is, is that, like, a target use case for you guys?
Yeah. So actually, if you install... Like, like if you buy a Pixel phone or a high-end Samsung, they come fr- with a Gemini Nano, and Gemini Nano is baked into the operating system. And Gemini Nano is really built on top of Gemma. So last year we released Gemma 3N, which was this architecture really designed for phone use cases.
And they use a Gemma 3N with some additional training, some additional adaptations, to make the model good for, like, traditional, uh, on-device use cases, right? Uh, so pretty much when you buy, like, these high-end phones, you can already use a Gemini, uh, out of the box.
Yeah, we actually covered the 3N paper in our paper club, and this, like, idea of, like, sort of parameter offloading-
Yeah
... or, like, download on demand is, like, very cool. Is it exactly the same in the Gemma 4 stuff?
Yeah.
Okay.
For the smaller models.
Yeah.
Yeah.
Yeah.
And does it, does it scale? Is there potential G- So for reference, Gemma 4 is a 29B and a 31B, one's MoE, one's dense. But-
Yeah
... have you scaled it? Have you pushed it up? Is it...
We are doing lots of experiments.
Experiments? Okay.
Yeah, yeah. Stay tuned. Yeah.
Launch3:14
What goes into shipping a, a mainline model like this? Like-
Yeah
... what, what's the behind-the-scenes?
It's complex. The Gemma team is actually relatively small. We have, like, uh, two or three PMs, we have one marketing person, and then the rest are, like, engineers and researchers working on shipping this. Uh, of course, there's, like, the full training part. We... How do we do the post-training, distillation, post-training techniques, and so on. What is quite exciting is that once we have the model, then we collaborate with a bunch of open source partners, right?
So for example, we work with Llama CPP, Ollama, MLX, Hugging Face, vLLM, NVIDIA, AMD. So we have, uh, almost 50 external partners for every launch... Well, for the Gemma 4 launch, which has been the most complex launch. And also internally, we collaborate with a bunch of different teams. So think of Google Cloud, Vertex, Vertex Model, Models as a Service, ADK, uh, and then Android as well, right?
So we work, for example, with the Android team. And, uh, with the launch of Gemma 4, we released an integration with Android Studio. So in Android Studio there is this agent mode where you can have a, a model helping you write code and do things within Android Studio. And they ship this, uh, integration with offline models using Llama CPP or vLLM or any OpenAI-compatible endpoint.
So now you can use Gemma 4 to also write code, uh, Android applications in Android Studio.
Offline vs API4:29
Where's the difference? When would someone wanna do that versus just-
Gemini
... use Gemini?
Yeah, yeah. Of course.
Outside of the obvious you're offline, uh, or you want the privacy-
You, you fly planes a lot or something.
I did... Okay, I will say, on my long 10-hour flight to London, I did use Gemini as my-
Yeah, I, I was on Gemma 4, though.
Sorry, Gemma. Gemma.
Yeah.
Yeah, yeah. It's m- mostly offline use cases, right? Uh, or if you... Yeah. Offline or privacy, like if you want to have all of your development set up locally and you don't want to send any code to, to any API, you would use that.
Do you see a future where, you know, small models get good enough? Like, does it cannibalize? It's an interesting position. Like, you have big Gemini, you have Gemma. Both get exponentially better over time. Like, current Gemma is much better than what we had open source a few years ago.
Yeah. Yeah, for me it's quite exciting. I mean, if you look at Gemma, you compare to how we were one year ago, I would say Gemma, uh, 4 is matching state of the art from one, one and a half years ago for most things. With local models or models that you can run in your own hardware, you can get capabilities.
So you can get agentic ski- agentic capabilities, function calling, system instructions, like conversational and that kind of stuff. Knowledge is much trickier. So for knowledge you would need a larger model, right? That's why if you compare Gemini to Gemma, Gemini, uh, has much better knowledge understanding of the world, right? Like, uh, facts, information, and so on. So it really depends.
I do think we are heading towards a future in one, two years where, imagine, like, you can run a Gemini 3 Pro powerful model direct in your phone, right? And I think once we get there, things will be quite exciting, uh, from a product integration, from which experiences we can, uh, enable the users. Um- I wouldn't say it cannibalizes.
It's still, like, two very different things. Like, if you want, like, flagship capabilities, like these super complex, long-running tasks, you would use Gemini if you need factuality and so on. But I do think for many of these agentic things, we'll get to a point in which we can do very powerful things directly, uh, on device.
Multimodal6:26
Yeah. Can we talk about the multimodality-
Mm-hmm
... uh, sides? Any advances there that you wanna highlight or you've been getting good feedback on?
Yeah. So Gemma 4 was built on the same research as Gemini 3, uh, which pretty much means that we benefited from all of the improvements that happened with Gemini 3. Uh, multimodal-wise, uh, the smaller models can understand audio, images, and short videos, so 30 to 60 second videos and, and audios. Uh, for out-
Which is actually quite long.
And even the on device, like the 2B-
Yeah
... 4B2B can do-
Yeah, yeah
... very good multimodality.
Yeah. For audio we have a speech recognition. We have a speech to translate the text, and then a bit of a speech understanding. So you can do, like, a... Ask questions about an audio file and so on. So use cases that are very optimized for, like, on device phone use cases. Uh, and then on the vision side we also improve things quite a bit.
So we have object detection, pointing, captioning. Uh, we do not have image segmentation, which we know is, like, one thing that many people have been asking us. But otherwise, like, for many things, uh, we do support that. The other thing we do not support yet is video with audio. So we can u- understand, like, video input or audio input separately, but if you want to pass, like, in the same prompt both the visual part and the audio part, we still need to do some improvements around that.
And that's just a matter of, like, more data or-
Probably some additional fine-tuning could yield some very good baseline model for this.
Yeah. Yeah. What about audio out?
We are exploring some things here.
Yeah.
Uh, nothing I can share at the moment. Yeah.
I think e- everyone's excited about the... Like, when do you have native speech-to-speech, right?
Yeah.
But as far as I see, people always get excited, and then the pipelines always win.
Yeah. Yeah, yeah. Gemma is quite important for us, the multilingual aspect as well.
Multilingual8:08
Ah, yes, yes.
So Gemma supports 640 languages, uh-
You did a lot of work on the multilingual encoder, the tokenizer, right?
The tokenizer, right. Right. So-
For adding.
Yeah, exactly. So the tokenizer has been pretty much based on the Gemini tokenizer. It's extremely good. So independently of the Gemma capabilities, if you just pick base Gemma model and you fine-tune for an additional language, uh, it actually works extremely well. Uh-
What are some... Uh, sorry, I didn't read that part. What, what are some insights on the tokenization?
Uh, this comes from Gemma 3. Like, this has been done already for over a year, but the tokenizer is pretty much the same as, as Gemini, which means that the tokenizer, uh, lends itself to capture the right tokens for different languages. It's like a very good multilingual tokenizer. Which means that if you compare Gemma, uh, 3, so I'm going to the previous generation.
If you compare Gemma 3 to other models from back then, maybe the other models were better than Gemma 3, like, as general, uh, model. But if you train all of these models, uh, for, I don't know, uh, a specific Southeast Asian language, I don't know, Vietnamese, let's say, uh, Gemma would yield better results even if the o- other base models were potentially better.
Yeah. I mean, I, I, I think there is some limit at which you basically have platonic representation, right? Like, you understand the core concept and it translates to whatever language you want.
Yes.
I guess, you know, you are also... You, you have purview over all of, uh, sort of Google developer experience, and you brought the team here for the first time.
DevRel at AIE9:30
Yeah.
What was that like?
It's quite exciting, to be honest. Uh, we have already participated in previous AIE Europe conference. Like, Philip or, like, other team members have been in some of these in the past. This is, uh, London. This is DeepMind's home. Uh-
We have to.
Yeah, we have to. I mean, we brought- ... a, a bunch of researchers from the team to share about different things that we're working on. We've brought other teams, uh, from Google, not just from DeepMind, that are also, like, using AI in one way or another. So we brought people that, that are doing on-device machine learning, people that are doing lighter TS or optimizations to run models directly in phones or in the browser.
We brought people from the Android team. We brought people that are working all over Google, from robotics to research to Android. Uh, so yeah, it's quite exciting to come here and really show all of the things that the, the company's building. Uh, not just come and share, like, the things that our team is doing, but really all of the, uh, overarching AI, uh, story that we're-
Yeah. I think you are... I mean, it is the lab with the biggest scope.
Yeah.
Right? You do, do everything, including dolphins. Uh, and it's very impressive. Like, yeah, so, so you brought Sander.
Yeah.
Uh, we- would you talk, talk a little bit about the researchers that you brought.
Text Diffusion10:42
We brought researchers in a couple of different topics.
Yeah.
So we, we brought one of the researchers that worked, uh, in the Gemma development, in the development of Gemma 4. We brought a researcher that works in diffusion models as well, uh, for, uh, diffusion transformer models. So, uh, diffusion as-
Text generation
... text generation.
Yes.
Not, not image, uh, generation.
Which was announced but not released.
Exactly. We did the Gemini diffusion last year-
Yeah
... at IO. Uh, which is very cool because, uh, you can, uh, generate code extremely quickly, right? Like, it's, uh, yeah, it's stupidly-
Yeah. So the main pitch is speed.
Yeah.
But other than speed, is there, like, a secondary, you know, what can we do with a diffusion model that we cannot do with autoregressive, you know?
It's mostly speed.
Okay.
Yeah.
I, I feel like in terms of code structure, there may be some things where you're like, "Okay, I want the brackets here."
Yeah.
And then you fill in the blanks, right?
Yeah.
So fill in the middle-
Yeah
... is, like, a common code problem, but this is extended fill in the middle or, like, extended, like, "Oh, help me upscale or put a LoRA."
Yeah.
I don't know. You know, translate the image analogy-
Yeah
... to text.
Yeah, I think in the past fill in the middle was, like, this task that many companies were trying to tackle as an additional generation task, and now people are just assuming that the model can do fill in the middle with a general-
It's more autoregressive.
Yeah. Exactly.
Yeah, yeah. No, no tricks about special tokenization-
Yeah
... or anything like that.
Exactly.
It used to be a, you know, mass language modeling. You, you're trained to predict fill in the middle.
Yeah.
You had to rearrange your data set as well in order to, to do FIM.
Yeah, it was a bit tricky. People were always getting, like, the, the prompting, the, the, the tokens wrong, and yeah. If you deviated in any way from the training format, it didn't yield-
Yeah
... good results. Now we have, like, very good out-of-the-box capabilities for that.
What's the in- what is the idea about investing in text diffusion? Is there a world in which this overtakes autoregressive?
Uh, yeah, that's a good question. I think at the moment it's still very experimental. Uh-
Yeah
... I think we'll be releasing and sharing a bit more research of the things that we have been doing around diffusion, uh, generate- uh, text generation models on that space. I would say it's still very early stage.
Yeah.
I think, uh, especially, like, the model quality is still a bit worse from what you would get from the, a normal autoregressive model.
Yeah. A lot of what you were mentioning earlier about it for, you know, okay, fill in this code, lock this stuff, it seems different to how we're building agents these days of, you know, sequential tool calling, this, that.
Yeah.
Uh, I guess it's... If it's just speed, it's speed.
Yeah.
If it's an RNC, but it's just-
I could see, I could see a world where there's, like, system one, system two. System one is the diffusion one. System two is autoregressive. System one is the, the planner.
Yeah.
System two is the executor. I don't know.
Yeah, could be.
Maybe, uh, it's, it's, it's too hypothetical at this point, I think.
Yeah.
You know? But I will say-
The diffusion-
Yeah
... diffusion transformer models are difficult to fine-tune as well. Uh, so-
Yeah
... so there's also, like, a point in which, uh, uh, how much flexib- like, yeah. I, I could see a world in which, yeah, you have, like, a very strong, uh, agent manager kind of a setup, and then you have, like, executors, like diffusion-based executors that, uh, do, like, a specific coding. Are people fine-tuning outside of... You know, we see a few big companies do...
Fine-Tuning13:37
Okay, like, Cursor has a-
Yeah
... really good consistent model. There's a few that have done fine-tuning, but it seems like it's not picking up as, you know.
Yeah, so there was this period, 2024, I think, which there was, like, this... Maybe 2023. Like, there were all of these fine-tuning communities, and I think it's been changing quite a bit over the last two years because models are getting very good out of the box. So as I was saying, like, for Gemma 4 we had 50 to 60 partners.
Uh, and some of them were like, "Oh yeah, we're going to try and fine-tune, uh, the 27B model for this vision task." And they, and they were like, "Oh, actually, uh, the model works too well out of the box. We don't need to fine-tune it."
Yeah.
Yeah, we saw lot, lots of those things. So I'm seeing this excitement around fine-tuning nowadays as general conversational models.
Yeah.
There is still quite a bit of excitement around fine-tuning for specific domains like finance, uh, healthcare, specific types of data that the model didn't see. But as general conversational, like, just changing how the model behaves, you can do most, most of that via prompting nowadays, and in terms of capabilities, the models are very good out of the box.
So it's been changing quite a bit. There is still, like, the onslaught people. I don't know if you know, uh, Daniel Khan and his brother and Michael.
Every year I give them a three-hour workshop to just talk about-
Yeah, yeah. They- ... they, they are the GOATs. Uh, they still do, like, amazing tools for the community to fine-tune, and the community use those tools. But I'm seeing, like, some changes in the trends. I think, uh, people are not fine-tuning that much anymore.
And you guys put out a version of your own. Med-Gemma is a fine-tuning of Gemma 4.
Yeah, yeah. So Med- Med-Gemma, uh, the last Med-Gemma, which we released three months ago, Med-Gemma 1.5, it's, uh, based on Gemma 3.
Gemma 3.
Yeah, Gemma 3. So it's pretty much Gemma 3 and then additional training with some of our medical data sets.
Yeah. How do you see, uh... If I'm not mistaken, Apple foundation models on device were a bunch of LoRAs for different tasks.
Yeah.
And when you're constrained or running on device small efficient models-
Yeah
... uh, you guys did a offload, so you're, like, caring about efficiency.
Yeah.
Um, but, you know, do you see a world of multi LoRAs for tasks? Should people be fine-tuning the small one?
I think this is a big challenge in general in the whole developer ecosystem because let's say that you want to have 20 apps in your phone, right? And let's say that each of those apps comes with its own LoRA, right? What happens when you update the model, the base model? You also need to update all of these LoRAs.
So from a developer point of view, I think it will be very tricky because, one, you don't want to have 20 different base models in the phone of the users. The battery will just die. Uh, you also don't want to have to update 20 LoRAs every time you update the base model, right? So the release cycles in the Android world are, uh, and in the iOS world are very different.
So yeah, I think it's more of a general industry challenge that, uh, we need to f- figure out, uh, how we think that people should build ML, like, on device, uh, phone, uh, power, like, AI experiences.
Yeah.
It's, it's more of a product and developer experience kind of challenge.
Yeah. I have a question about the bigger Gemma models.
Sparse vs Dense16:29
Yep.
So you have two models that are-
Yep
... pretty similar size. One is dense.
Yeah.
One is MoE.
Yep.
Uh, can you talk a bit about, okay, say you have a 27B you're putting out.
Yeah.
Uh, how do you think about should I build an MoE? Outside of inference and using it-
Yeah
... how do you think about when to do MoE versus dense?
Yeah.
What are the trade-offs?
Of course it's inference. Uh, what, what else is there?
Yeah, but then there's two at the same size. Pretty much.
No, yeah, but I mean, one is 31B, which is dense, right? And that's like the most raw intelligence, and then you have the 27B with, uh, 4 billion activated parameters.
Right. But, like, you know, why not a 31B, 5B active, for example?
Yeah, I mean, we-
You can just fit more in dense?
Yeah, I mean, we did quite a bit of experimentation and, like, research on, like, which would be the best sizes that would be friendly to developers, and we chose... We, we made decisions around that, right? Uh, uh, the 31B is really, like, the largest model size that a quantized would fit in a consumer GPU. The 27B is more, like, an extremely fast inference, uh, within those constraints.
MoEs are challenging to fine-tune. Uh, I, I don't know if, uh, we've talked about that in the past, but MoEs in general are, like, extremely good. They architecture, they work great for inference. But when people fine-tune them, they struggle a bit. Like, they are not as easy to fine-tune for instruction following. The standard recipes and hyperparameters that you have may not work out of the box for MoEs.
The intuition is the, the routing kills the backprop or what?
I, I, I think so. Uh, I, I, I don't have a very strong intuition on it either-
Yeah
... to be honest. Uh-
People always say this, but I'm-
Yeah
... I'm trying to say why, right? 'Cause if you can train it, you can fine-tune it. Like... Fine-tuning is just training.
Yeah, I th- I think it's a mix of the routing and, yeah, just having, like, different distributions, and the distribution may affect the routing in a different way than a, a dense model, which we just change the things. Uh, that's kind of my intuition, but also, like, I think there are many different variables here, like how many, uh, experts do you trigger or, uh- Yeah.
They, they are like a bunch of different parameters that you can move, like whether you freeze or you don't freeze the-
Yeah
... the router, like a bunch of things that you need to, to think about.
Yeah. To me, the most important, uh, asymptotes that I'm looking for are what is the minimum sparsity level that we-
Yeah
... can reach, and then what is the most, let's call it Elo per byte.
Yeah. No, yeah, that- that's the thing that we discussed quite a bit, like what's, uh, the intelligence per parameter, right? Like, how do we maximize this intelligence per parameter? Because-
There has to be a number, right? Then we can stop, right?
Yeah, and because if you compare like the tw- I mean, Gemma, we have done the same size, right? 27, like almost 30 billion, around 30 billion parameters for Gemma 2, 3, and 4.
Yeah.
And the intelligence is much higher, right? Like we have now increased the model size.
Yeah, it's like that, that, that.
Yeah.
Yeah.
It was an easier number when everything was dense. Then you have to add in sparsity. Now you have offloading.
Yeah.
So.
Yeah, you cannot compare like a MoE to dense models. There's, there is no... There are some like, uh, napkin calculations you can do to compare, but it's not apples to apples. But that's a good question. Like, I don't know like where we'll be in three years from now. I would assume like a 30 BP, uh, model par- uh, parameter model could be extremely powerful.
I still think there are limitations in terms of knowledge, so maybe the model will be able to do like-
Yeah, it's just-
... super wild agentic stuff, but it will not know like who was the president of X country 25... I mean, maybe, yes, but like very niche knowledge probably the model will not have.
Yeah. Um, it- there's just- this is just information theory, right?
Yeah.
Like you're using the model as a database.
Yeah.
So of, of course there's gonna be limits. The other thing is also I always think about, uh, when, when we talk about this topic, superposition, right? Anthropic has this whole concept of superposition where you can store information in the smaller weights as- because it compounds with the other weights as well.
Yeah.
And so, um, not that much research on it since then, but, uh, maybe this is my segue into McEnturf, uh, uh-
Gemma Scope?
Gemma Scope20:09
Yeah, so last year in December, we released Gemma Scope. So Gemma Scope pretty much allows you to, uh, analyze the, the activations across different layers based on the tokens, uh, input.
Yeah, it's fantastic.
And yeah, the team released, I don't know if it was couple of terabytes, maybe even up to like one petabyte of, uh, data that we had to store because we did that for every single layer across all of the Gemma 3 models, so it's a very complete-
And Llama as well?
We did it just for Gemma 3.
Oh, okay. I-
Yeah
... I think Neuronpedia had some others.
Could be. Could be.
There's a few other teams, uh-
I was like, wow
... Illya was doing.
Yeah, it was, it was very, very cool, the cross lab, uh-
Yeah
... partnership.
Yeah, yeah. There are a couple of open source tools there as well that you can just do- create your own, uh, yeah, your own activation, uh, uh, networks. Uh, yeah.
Yeah, yeah.
It's a niche field. I think it's a, it's a good opportunity. I think we were talking about this earlier, right? Like it's an area where you don't need lots of compute to get started. That allows you to understand like how the model works. You can experiment. You can get a bit of a sense of how, yeah, how transformer architectures work.
Yeah.
So it's a good area.
Okay. The context of this is really like why bring researchers to AI engineer, which is an engineering applied AI conference. Uh, one, to me, uh, it is actually very important that you bring the researchers because engineers want to learn about how the models w- that they use were trained, even if they never, ever trained it themselves.
Mm-hmm.
Right? Because I think they, they just feel more trusting of the model-
Yeah
... if they, if you peel back the curtains a little bit. And also, uh, I think there's some prestige, that people want to feel like they can go home and talk about it intelligently, even if they- ... they don't actually, you know, know how to train it. The other thing is like, I, I do think that research and engineering are closer than people think.
Yeah. Totally.
Uh, there's, I mean, there's research engineers.
Yeah.
And McEnturf is probably the easiest, single easiest way that engineers can get into research if they want to.
Yeah, I think in, I mean, in big part, like so many researchers are doing ablations, right? Like they are just-
Yeah
... moving the pieces around and seeing what works and what doesn't work. Uh, of course, there's like a branch within research that is more- much more like architectural design and like, um, much deeper, but there's lots of very like empirical experimentation and seeing what works, what doesn't work, uh, moving things around, uh, which for me is, is mos- more engineering, uh, rather than-
It is, yeah
... for like research unless we are like creating new activation functions maybe. But-
Yeah. I think this maybe is a change in your career as well. Like it used to be a, like a joke like, haha, our researchers are terrible at coding. And then they throw it across the wall to some engineer that will, that will clean up the code.
Yeah.
But now everyone has their own personal research engineer, right?
Yeah. And something that is cool that is happening is also how researchers begin to adopt some of the cool agentic tools now. So for example-
Yeah
... within the team we are building skills to do experiments and ablations and evaluations, and how the research team can use all of these agentic tools as part of their research process is also quite interesting.
Yeah, yeah. I had Yitae, uh, on my podcast who led the post-training for IMO, the IMO Gold's, uh, model. I think it was Deep Think. Um, and he was notably, he was an AI researcher that doesn't use AI-
Yeah
... until this year.
It's, it's gone even further. People making novel math research, like some of the Erdos problems-
Yeah, yeah, yeah
... they are engineers, not researchers, with no background in math just, you know, using coding agent-
Mong- mong the math, guys.
Yeah.
I mean-
Just not math, not research, but you know, solving some of the most, you know-
Unsolved problems
... unsolved problems, yeah.
Yeah. But e- even in the model architecture side of things, like two years ago when all of these people started to fine-tune models and to do experiments and do model merging, there was quite a bit of research that was happening in GitHub and in Reddit and in local Llama, and people were actually like inventing new things, and then there were papers-
Yeah, Franken merges, uh, yarn.
Yeah, like all of the Frank- Franken MoE, uh, stuff, like all of the Axolotl library, like all of these tools, and there were papers published by different companies and research labs one or two years later that were rediscovering what was already done by the Reddit or Discord people without, yeah, anyone noticing.
Auto-Research23:59
Yeah, yeah. Do you have a take on auto-research? Every AI wave has a auto ML wave.
Yeah.
And this is the auto ML wave of this wave.
Always been a bit skeptic. I mean, auto ML few years ago was mostly like just a-
Parameter search
... search. Yeah, yeah. Pretty much like research in, in this higher, yeah, parameter space. Uh, I don't know, like with Carpathia experiments it's been quite interesting to see- ... like, uh, how things are evolving. I w- I don't know what's your take on this.
Things are just cooler when he does it. Uh, I do think some, some part of this is you're just speed running experiments agentically, right? The agent-
Yeah
... the coding agent is more autonomous. You can actually go to sleep. And it will do the things that you would've done anyway, so you're just kind of automating things that you would've done.
I see it differently.
Yeah.
I think, uh, okay, it will be a very exciting time if we have a move 37 from an auto-research.
Yeah, yeah.
If you make an impactful discovery that someone wouldn't have thought of, right? So there's the side of, okay, I have these ideas, go run them in the background, that's fine.
Yeah.
But the, the interesting side is actually when you're shooting off not just paths that you wouldn't have thought about, but you know, trajectories that people wouldn't think about and they work and you make new discoveries.
Yeah.
That, that's the very exciting thing. I think when you have more approach to just token spend and send off, you know, hopefully that becomes possible.
Yeah, I do think the next generation of fine tuners will not be l- I mean, will be people that are not coding at all, right? Like, uh, one year ago we had to write, like, our own code, uh, with transformers or Unsloth or, or whichever library of your choice. I do think as we, like, keep evolving, like most people will be fine-tuning with a couple skills, right?
Like Hugging Face has the skills, like all of these libraries have skills. They will just, uh, prompt their agent to kick off like some experiments and see what works, what doesn't work. And we-
And honestly, it's a, it's a good middle ground right now. Like all the tools you've mentioned, they let you fine-tune in minutes.
Yeah.
You don't need to know what's happening under the hood at all.
Yeah. So I think that's where, like the direction will be, like people that just want to do fine-tunes to improve the capabilities for certain domain or like add some like new behavior, they will not be coding the fine-tuning code. But of course, if you want to do like deeper research in the architecture, my hunch is that most likely, uh, this will not be like a automatable, at least in the next one or two years.
Okay. We gotta wrap up soon.
Team Expansion26:06
Yep.
Uh, I just wanted to end a little bit on your, your, the growth in your team. Um, and you know, Paige is here, uh, Logan is, is o- over in SF, uh, and you've been hiring all my friends. Uh, Thor and Ivan-
Yeah
... and all these. Um, what does the team look like? Where are you looking to grow?
It's been quite exciting. We are hiring lots of h- very high agency people. Yeah, I think maybe three, four years ago we, we did a, like a nice interview about how, uh, I was growing like, uh, DevX, uh, at Hugging Face and-
Yes
... how we were thinking like DevRel should look like. DevRel, I think mainly is also interesting is redefining what DevRel should be in an AI, very AI-centric organization at the frontier. It's our research lab at the end of the day, and we are in this AI era. So it's also rethinking what DevRel should look like, uh, in 2026.
We are, yeah, we're hiring pretty much like high agency people, excited, uh, to, to build things, to engage with the community and so on. Right now we are growing in Singapore, so we are looking to hire someone in Singapore and also-
Coming to AI Singapore.
They're coming.
Yeah. And to hire someone in India. So those are like two locations which-
Why is Singapore so important?
So Singapore is interesting. Singapore has a relatively small but very high, like very dense, uh, high talent community. Now we have a proper DeepMind office as well. Like it's small, but, uh-
Right
... it's growing quite quickly as well. Uh-
Mainly because Yitae doesn't want to move.
Yeah, yeah. So-
But it's a huge win for Singapore. We don't have research in Singapore usually.
Yeah. So we're trying to grow the team in places co-located to like people doing like traditional DeepMind-y research-y activities. We don't want like, uh, to have like-
Sales office
... people that is in a single town that is not connected to anyone in person from, from DeepMind. So ideally if they go to the office, they can talk with researchers doing like their own... Even if it's a different project, they can be part of like the more DeepMind-y side of things. So, so yeah. So we have-
This is good
... like people in, in Paris, in London, in, I mean, Zurich, in SF, New York, so all of these, uh, DeepMind hubs, and Singapore now is becoming like a very small but very exciting hub as well.
Good.
This is all the DevRel, DevX team in DeepMind, right?
Yeah.
DeepMind has also expanded a lot here. Like-
Yeah
... a few weeks ago, Kaggle joined DeepMind.
Yeah.
How's, how's the org in general shape?
DeepMind in the past didn't do that much product and yeah, now we have like a AI Studio-
Even DevRel
... the Gemini API, now Kaggle. But Kaggle is, is part of the team. Actually Kaggle is also here.
Yes.
There are 50 members here talking about the-
Very excited.
Yeah. Talking about evaluations.
Yeah.
Uh, they, last week they released a, a new system for agent evaluation. It's like a very, like experimental initial benchmark, but pretty much allowing agents to take an exam and compete in a leaderboard, which is always fun. Yeah. When, with, with Kaggle joining us, I think there are a couple of exciting things. There's a whole Kaggle community hackathon things that enables the community to build hackathons, but there is also the Kaggle benchmarks, and I think Kaggle benchmarks can connect very well with how we think about Gemini and the capabilities.
And if you're in the eval space, like, you know, like many benchmarks can be benchmarked. Uh, many people are gaming the benchmarks, and we want to identify like which are these capabilities that maybe we are not aware that we have or that maybe we could improve and bring all of that feedback from the com- benchmarks created by the community in an organic way and bring the, all of that feedback back to, to the model itself.
Yeah.
I mean, the way we are doing Gemma, Gemini, and all of our tools is really like based on the feedback from the startups, the community, the developers. So that's why you see like Logan, Paige, everyone in the team talking with the community in social media, in forums, in events, and really understanding what people are building, uh, with our tools and bringing all of that feedback to, to the modeling teams, uh, which is very cool as being part of DeepMind.
Yeah. Yeah. Well, you guys are doing amazing work. Thank you so much for joining us here and, uh-
Yeah. Thank you
... can't wait to see what's next.
Yeah. Thank you for having us here.
Yeah.
