Intro0:00
But it's for the first time now, I think verified AI is to open up collaboration, either it's human-AI collaboration. Well, before Blueprint, like that's human-human collaboration, and Lean was the grounding, was the verification, formal language. And then human-AI collaboration like we're seeing now, future AI agent, agent-agent like collaboration. Like, I think verified AI is for openness. It's not for meeting the requirements of closed industries. And I think just like I think verification should not be about, "Oh, I remember," like, you know, there was this article like chatbots mixed up of as math solution hallucination. Uh, verification to me is not about lousiness. Verification to me is about scaling brilliance, compounding brilliance. It's like just kind of going back to the collaboration point. It's about Ramanujan being a much stronger mathematician. He was already a really strong one, but verification helps him extend the brilliance, like both kind of like scale up and scale out.
Welcome to the Latent Space AI for Science podcast. I'm Brandon Anderson. I build, uh, RNA therapeutics at Atomic AI, and I'm joined by RJ Haneke, uh, the CTO of Mira Omics, uh, working on spatial transcriptomics. It's a pleasure to have Carina Hong, fr- CEO and founder of Axiom Math. Um, Axiom has made a splash in several different areas. First, they were-- they got a perfect score on the Putnam, uh, last, uh, December, I think. They also had the claim of the first AI to prove research conjectures using formal verification. And, um, very exciting, they just yesterday announced a, a quite large, uh, Series A. Um, yeah, welcome to the show.
Series A0:52
Thank you for having me.
You just raised two hundred million dollars, which, as one of your colleagues said, this is like basically the entire, like, US math budget for math research each year.
Is that true, actually?
Uh, according to his LinkedIn post, yeah.
Okay, wow.
Uh, two hundred and fifty million is our apparently annual, uh, math budget.
I think we should spend more on math research.
Yes.
Yeah.
It's, it's kinda sad, like-
Yeah, I know.
But anyway, like, you know, as a, you know, as a nerd who loves math, that's like really cool. But I mean, I'm just like, that kind of blew my mind, like, what? When I read that, like, okay, so like, yeah, how is it two hundred, two hundred million at, I guess, one point six billion valuation? Yeah, I don't know.
Yeah, um, well, super, super excited to be here. Um, also, I think like, you know, uh, this is a Series A, so it's very, very interesting, timely, timely podcast. Uh, we're like a s-seven, eight-month-old company-
Mm-hmm
...so it definitely means a lot to us. It's a really cool milestone. Um, we're currently about like thirty people now, right?
Mm-hmm.
So kind of going into, I think this amount of funding will, like, give us the fuel that we-- it needs to, to, to accelerate-
Yeah. Mm-hmm
...um, the strong execution momentum that we have so far. Um, I think like people think of us, like there are many kind of ways to think about Axiom. People think of us, us as a math startup, so-
Mm-hmm
...math startup, lean startup. The other obviously things that we do, um, that are formal verification, we think verification is a really good best first market for math.
Mm-hmm.
And so I think this fundraise is gonna like let us explore some of the applied domains, uh, as s-my, my colleague, CTO Subho said in the, in the little launch video, um, with the Series A we had, is it h- it lets us broaden our dreams. So-
Mm-hmm
...yeah.
But still, like two hundred million dollars and I guess a one point six billion valuation, how is there a market for that? I mean-
Yeah
...I mean, like, obviously you're not doing this just for the fun of proving things-
Yeah
...although I'm sure there's a lot of that, but.
So let's, let's bring us back to twenty twenty-four. So when, you know, o1 reasoning models like just came out. What is-- what was Anthropic kind of like secretly working on back then? It was coding.
Mm-hmm.
And everyone knows they are working on coding.
Mm-hmm.
Like OpenAI, Meta, Axiom, everyone has full knowledge that Anthropic was working on coding. And they just like overlooked it. They thought, "Oh, they are a B2B play, so they just want one vertical." People think of coding as one vertical, and now look at where we are today. Coding, kind of like strong transfer learning from coding to reasoning to basically, you know, a monopoly in the, in the future of reasoning, and I think that's, that's really, really shocking. The people who were working on coding, I think back then believed in something that we believe, you know, similarly with math and Lean now, which is that if you have more structured and formal data, it's going to be a lot more horizontal than the specific vertical we are tackling. So you know, if today we are doing, you know, math informal way, like the standard chain of thought data, train a math model based on human preference, then I would say, well, perhaps we are just a math startup, right?
Mm-hmm.
But you know, while we are pursuing math, we are also doing things that do have transfer learning to other, to other domains. Um, so I think that's kind of like the broader, broader picture, is that while the DNA of the company remains math and all of us are math nerds, and this is a very strong cultural statement, everyone has a great mission of having AI be a superhuman mathematician like we are seeing on Putnam on a batch of research conjectures. In fact, we have another batch coming. Um, we are also thinking that this is going to be fundamental to verified reasoning, and we kind of talk a little bit about verified AI. I want to talk a little bit about verified AI next, 'cause I think you have another-
Verified AI4:52
Yeah. Yeah.
Yeah.
I have, um, several things I want, like, uh, so I wanna hear about the verified AI. I do wanna dig in a little bit. So do we know that, you know, Anthropic and OpenAI and everyone there not doing formal verification and using that for their rollouts and whatever?
I think I have a lot of like rumor mill that I probably shouldn't like put it on the record. Um, like- ...I think, you know, like researchers talk, they play card games.
Yes. Yes.
Yeah. But there, there are really interesting reasons if they are or are not doing it. I think that's like kind of the takeaway I have, which is that if you're like at a frontier lab-
Yeah
...and the direction actually does change a lot for a lot of the reasons beyond your control So I want to kind of like bring us back to the AlphaProof moment, right? Like AlphaProof was such an amazing... That really, the 2024, um, 28 out of 42 performance was the IMO moment for me. It was not Gode in 2025, because across 2024 and 2025, AI models could solve all the problems that are not combinatorics. The only difference is that, you know, if you get all the problems that are not combinatorics, you get 28 in 2024 and 35 in 2025, because there's only one combinatorics question in 2025. Um, after AlphaProof, kind of like we didn't see a lot of the formal math, uh, you know, results or kind of progress from Google DeepMind, and that's actually because of reasons that are not necessarily technical. But if you're at a startup and you have very singular focus that is formal math and verified AI, then, um, you know, you get to work on a really cool problem for a long time, and you have like a lot, a lot higher likelihood to get to where you want to be in terms of like progress and breakthrough unlock.
So yeah, just define that for us.
Yeah. Like a lot of people think about formal verification as an ancient, you know, subject. Um-
Yes.
... it, it, it existed like as long as, you know, way before like, like deep learning, and it existed in the time of rule-based computer science. Uh, there's this really strong push of like formal verification around like since, ever since 1980s. Uh, really interesting historic anecdotes such as, uh, I think the Paris Trade Union demanded that the automatic switching of the subway system needs to be formally verified for safety purpose.
Mm-hmm. Yeah.
So quite interesting trade union for, for technology.
Yeah.
Um, and like I think around the time of Challenger, both before and after, European Space Agency was using formal verification for the Ariane spacecraft. Uh, it's also interesting. Boeing, Airbus, formal verification. And then more recent years, right, I, I think like there's a lot of push about automated reasoning at AWS because they have a lot of enterprise customers that really require things to be, to be 100%, you know, verified and there's no edge cases mi- uh, like missed and like just general like testing doesn't satisfy the need. So a lot of people think about verification as something that's like annoying because it's like tax and compliance. Like it, it's-
Yeah.
... making sure that we are good to go, right? And like that's really not the... And, and so we, we talked about like verification. I think our competitor when they launched, they talked about, um, formal verification pre-reasoning. They talked about it, uh, in the time of hallucination. And, and maybe for them, like formal verification is about the lousiness, the, the hallucination. For us, no. Like for us, verified AI is about the brilliance. It's about scaling and compounding super intelligence. So this is quite a deep point, and sometimes it takes a little bit of explanation. So if you think about like, you know, the, the place of brilliance, for example, Ramanujan, like he's a brilliant mathematician. He was able to find a lot of like interesting formulas just by intuition before he know how to do proofs. So he went to Cambridge, um, you know, work with Hardy and Littlewood and, you know, in the famous movie, The Man Who Knew Infinity, there's this like storyline of how hard it was for Hardy to force him to no longer rely on intuitions and do proofs. After he learned proof writing, he came out as a much more powerful mathematician whose results, um, like intuitions, turn into theorems, and future generations of mathematicians build on that, those theorems. So it is a way to kind of scale and compound, uh, the intelligence that we already have. Another example, mathematicians kind of have been writing code in English
or their respective country's natural language for thousands of years. And why do I call it writing code? Um, because there's this sort of community standard of rigorous logical deduction. Everything has to be step-by-step correct, um, otherwise you will get outcasted by your math community. Uh, like look at some-
So it's the law.
Uh, well...
Or rules in, in the community.
So, so it's interesting, right? Because that is kind of human mathematician enforced.
Mm-hmm.
Right? And so, uh, it's a peer review process. Peer review ta- of a paper currently takes two years. Okay, so but proof assistant and, you know, formal proof trackers like Lean still found its place, right? And why? Like if you, if I'm a mathematician and, you know, any, like my work can be peer reviewed by other humans, like why do we even... why do mathematicians even play with Lean, right? And why do we even like talk about kind of like, you know, Lean-based, um, assisted like theorem proving? It's, it's because like it handles a low level. For example, we're not even talking about AI. We're talking about, for example, the grind tactic in Lean. It can currently handle a lot of math proofs, like at a very low level. And, and this is pretty shocking because I have seen, you know, actually another, uh, a company working in the same space, like, you know, some of their demo, and I look at the demo, like it can actually completely be handled by grind, uh, which is a tactic in Lean.
Can you explain what Lean is to non-experts?
Oh, okay.
Yeah.
Yeah, yeah. I think our order is like a little, little wrong. Yeah.
Yeah.
So Lean is a computer program-
Uh-huh
... uh, a bit like for math proofs.
Yeah.
It is a formal language-
Mm-hmm
... uh, just like its cousin Isabelle, Coq, uh, or Roc-
Mm-hmm
... um, and, um, uh, some other further co-cousins like Daphne, Agda, like these formal languages. Whole sector. Yeah.
And what, what does it do?
It, it basically, if you have a proof written in the program in Lean, and then, uh, assuming there's not, no, any like weird things happening, so like, you know, unintended, like, you know, use of sorry, which is a tactic that let you take things for granted, assuming everything is, is safe, um, hence pe- people have tools like comparators, safe verify, and Axiom recently rolled out verified proof that's like a hundred times faster than comparator. Um, then, you know, once you kind of execute that program, it com-- like once it compiles and it tells you that it's correct, then it is, the proof is actually correct.
So it's like a type checker?
Yeah. That is based on this result called, um, Curry Howard correspondence, which turns proofs into programs. So, so I want to talk about the magic of Lean, why I think it's a really good programming language. It's because on one hand, if you don't care about the formal part at all, if you don't care about the logic part, you just want to use Lean to write code, you can. Like, we have had candidates, actually currently, um, you know, the person is working at the Lean FRO, um, he wrote Autograd in Lean in our interview process.
So, so it's a, it's a Turing-complete language?
That's right. So, um, you can write-- you can do a lot of things with Lean. You can... It's a, it's a functional programming language.
Okay.
Right? And then you can, you can also use it to... So you use it to do coding, you can use it to do math, two-in-one.
Okay.
And kind of going back to what I was kind of getting at, if mathematicians are already enforcing that most proofs, you know, say, say maybe not all mathematicians, but, but the ivory tower people in academia, all proofs are correct, why do we even need Lean, the model checker? It's because Lean has tactics that help them handle the low-level calculation or proof or deduction, not calculation, um, then for them to be able to navigate in a high-level intuition space. So this is my point, that it is not about, like, formal verification or verified AI. To us, it's not just about handling or, like, kicking out the lousiness, the hallucinations, the mistakes. It's about scaling brilliance. It's about super intelligence.
I actually... Terence Tao has a great video also about using Lean to, as a way you can collaborate-
That's right
... because you can use the sorries.
Exactly. Exactly. That's another point I want to-
Yeah
... I want to talk about, right?
Yeah.
A lot of people think about, you know, what is our, what is our market? It has to be some, like, really niche industrial societies area that is mission-critical, safety-critical. No, that's not the TAM. The TAM is all code.
Mm.
The TAM is a f- is a right of first refusal on all AI-generated code. Like, for right of first refusal, meaning, you know, you get to choose whether you want to verify it. So this is the important part I want to kind of come across, which is that people talk about formal verification as almost, like, painful because it has all these, like, stringent requirements, uh, you-
Axiom System13:42
Up until now, it has been very formal, right ?
Well, yes. Yes. And, and to us, it's actually verify generation means performance gain. It means higher sample efficiency. It means a startup like us with like, you know, still we, we raise some money, but lesser compute budget, lesser data budget than Frontier Lab will be able to match and even exceed, you know, performance on superhuman tasks. In fact, for the Punam exam that we just competed December twenty twenty-five-
Mm-hmm
... which we did in real time, Math Arena, which is this organization that evaluates a lot of LLMs, found the best LLM, DeepSeek, got one hundred and three point out of a one hundred and twenty-point exam. The best human, obviously we now know, is a student from either MIT or Chicago, we don't know which one, um, because they don't announce the top five winners' score, got one- one ten, and we got one twenty. So it's the first time, actually, I remember when we were starting this, people were like, "Is it even possible that a formal math, you know, system with so much orders of magnitude less data can, can match or beat a, a for- an informal LLM?" And Punam is the first time it beat, right? And, and so we're not thinking about it just about the painfulness, the challenges it pose. We are thinking about the verified generation performance gain, the improvement, the, the fact that you can, you know, just like you would expect RL for Lean to, to have improvement because of seeing evidence of RL in coding. So this is the second point I want to make about, like, how to think about verification, verified AI.
So maybe we can talk a little bit about why. Can you describe what, what is different about what you do versus what the Frontier Labs, y- you know, at least when they're building-
Right
... their standard-
Right
... RL-
Right
... enhanced, uh, LLMs, what, what's different about what you do?
Yeah. So we heavily rely on a kind of data called Lean data, and we kind of talked about Lean as all the, all the data, um, that we have that Lean proves, you know, it, it's correct. So you, you know it's correct or not, and that's quite, quite important.
Mm.
So, you know, we have a system of models. These models are post-trained, and, um, using RL or FFT-
So LLMs found, like some sort of foundation model that you get off the shelf, and you post-train it, or, or con- continuous-
Yeah
... pre-training on Lean?
And there's obviously an inclination for open source, you know-
Yeah
... based models like-
So it does speak English-
Yeah
... probably knows how to code-
Yeah
... but it also, you fine-tune it or, or continue-
Yeah
... pre- pre-training.
Yeah. And the base model might be similar to what everyone else-
Yeah
... is using as well.
Right.
Right. Um, if they're not kind of pre-training their model.
Right.
Yeah.
Mm-hmm.
And then we basically do this, you know, RL for, for formal math kind of. There's, I think, a standard pipeline or, like, you know, tricks of the trade that-
Mm
... people use. We try to innovate really, uh, on top of it as much as we can. I think that we found, um, scaling inference to have almost no wall, um, recursively decomposing, uh, you know, a proof goal into many subgoals, and then learning to backtrack as well.
Is there a risk that, like, you start out with this, you know, what you know in a certain domain of data sets and so on, and then you start rolling out, you know, recursively in a space, but now all of your training data is localized in some domain that you... It still is only so, like, maybe logarithmically, um, in some large space growing from your initial training data. So you could get trapped essentially in that, you know, you could be really good at this, but you just created a big jagged frontier where some other domains are just far from them.
Distribution shift-
Yeah
... are we talking about?
Yeah.
So yeah, so, so, you know, it is an open question-
Mm-hmm
... whether a, uh, a system that can do really well in number theory can do well in, uh, give me, you know, another, another field-
Topology
... of math. Yeah, exactly. Well, actually, I think this, the way we think about it is it, it depends. It depends on whether topology has a lot of the, um, existing definitions as almost like, you know, the, the math infrastructure-
Mm-hmm
... um, existing. Because what people have found in the past is when people were building out math lib, like, you know, for the algebra, um, you know- Book work, um, like they, they can just-
So Mathlib being the Lean-like undergraduate library-
That's right. That's right
... kind of thing.
Yeah.
So it's like all the proofs that you learn-
Yeah
... in undergraduate math-
Yeah
... and they're all sort of in Lean.
Yeah, yeah. So for example, some of my friends, um, who currently are at Axiom, it's, you know, crazy, like full circle back moment. Um, Kenny, uh, we're like friends for like, you know, five, six years, and he was the first one to tell me about Lean. Uh, he was working with Kevin Buzzard to build out Mathlib. It's a lot easier to codify algebra in Mathlib than, than for, for analysis.
Mm.
Uh, so, so that's, that's interesting because for analysis, a lot of the definitions around convergence limits, et cetera, becomes tricky. And so I don't think there's a lot of like topology in Mathlib today, um, in terms of like differential topology, differential geometry kind of stuff. So, you know, our system likely will not do very well on those, on those domains because it doesn't even have definitions to build off on top of. For the places where the, um, definitions are, are in, we actually are doing quite okay in terms of distribution diver- diversity. We have good performance, you know, sol- having solved open research questions in, um, number theory, commutative algebra, algebraic geometry, some discrete math like combinatorics and probability.
So earlier you said that like with the Putnam exam, the, the 2024 version, when all of the questions were... that were not, that AlphaProof did not get right-
The IMO, the International Mathematical-
Sorry, IM- yes. For the IMO, all of the ones that got wrong were in combinatorics. Is there, is there like a weakness there in that specific domain?
I would say so. For, for Olympia in math-
Mm-hmm
... people are seeing combinatorics being a little bit more, um, tricky.
Mm-hmm.
Uh, seems like the steps are quite creative. So I, I, for a... I'm a human and, you know-
Mm
... when I have friends who are really good at combinatorics, which I never consider myself really the, the top of combinatorics. I'm-
Mm-hmm
... kind of better in number theory, but I know some people who are just, they're IMO gold, perfect score, Putnam Fellow-
Mm-hmm
... perfect score, and like all the way. And then when they do like tricks in combinatorics, I'm like-
Mm-hmm
... "I don't know how you thought of that," and but, you know, after you give me that construction, it actually-
Mm-hmm
... becomes a lot more trackable. I think a Lean-based system will struggle in those very creative, um, places, which is why we at Axiom actually also invest on something called mathematical discovery. It does not use Lean, and we have some major news in the coming weeks, um, basically open sourcing entire code bases of mathematical discovery coming up.
You wanna tell us a little bit?
Yeah, yeah, sure.
Yeah.
Um, so, uh, we are currently, uh, having two code bases, um, being open source. The, the goal is for if you're a mathematician or you're a theoretical physicist, and you have a problem that you would like to solve, for example, you want to, uh, find a construction that is a very complicated graph construction, then we would suggest you follow the very detailed manual supposed intended for mathematicians to run the code that we write. Uh, it, it's a, it's a tool for, for mathematicians to make mathematical discoveries. Mathematical discoveries is this idea that, you know, proof is not enough for math. Uh, in fact, before you kind of start proving something, you don't know where you want to start, so you will try to construct some interesting examples. This can be usually, say, sequences, right? If you want to understand the property of a sequence, you will write out a few of the first terms. This can also be graph. So if you want to, you know, figure out what the graph that you're looking for, um, should have, say, a certain property, uh, then you will start by doing some simpler version of the graph. Now, constructions cannot be done by Lean, so we believe in having AI for math discovery, and we have, you know, one of the OGs in that field, François Charton, um, member of technical staff at, at Axiom, and he previously have done pattern boost and end-to-end, you know, set
out disprove a 30-year-old conjecture by finding a counterexample, um, found the solution to a 130-year-old problem, the global Lebanon function, that is a kind of mathematical object showing up in the three-body problem. So we, we are, we are thinking that, you know, it, it's... mathematical discovery tools should be open to the math community, so we are open sourcing entire code bases for that.
Discovery22:12
So d- discovery meaning it gives... it makes new conjectures, or it-
That's a-
Or-
Yeah, it's a pre-conjecturing step, actually.
Okay. Oh, I see.
Yeah. So you, you start to form intuitions.
Oh. Mm-hmm.
Right? If you're a mathematician and your goal is to solve a really hard, hard conjecture, Axiom Prover can't just solve it for you. Um, you might want to try to formulate some sort of lemmas, conjectures that you want to, say, then give to Axiom Prover. Um, if you're a human mathematician, you will start by wanting to formulate that conjecture. You don't know where to go. You want to find constructions. Now, uh, the code bases that we're gonna open source gonna help you, uh, hopefully significantly.
So one thing that maybe there's a lot of computer scientists listening, and one of the things that will immediately kinda come up in w- especially when you're talking about formal verification and so forth, is Rice's theorem and decidability and incompleteness theorem and, and maybe, um, some arguments about computational complexity and LLMs. So I, I'm curious to hear, Rice's theorem says you cannot prove non-trivial things about programs for all programs, right? So how, how are you navigating this space? Obviously, formal verification, you know, does-- is able to do some things.
Yeah.
Yeah.
So yeah, I think like it's, it's very clear that you just, like there's theoretical result telling you you cannot formally verify all programs, right?
Yeah.
But you, you... I think it, it's good to f- formally verify a majority of the useful programs, right? So, you know, like I remember, uh, there is this MIT, uh, like little, like- Documentary, or not a documentary, like an advertisement for, you know, uh, people who are admitted students. And then there's this famous line by Tim, the, the beaver, the mascot of MIT, saying that, "What does theory give you?" Which is, which is kind of like, it doesn't stop us from-
Yeah
... trying to push it as much as possible.
Mm, yeah.
So the goal that we have for the future is suppose you are, you know, doing, doing the, the cr- coding. You want to vibe code a really complex task. So, you know, currently it's front-end websites, but in the future, we might want to vibe code much more complicated things, whole distributed systems even. Then we want to be able to, say, decompose it, and there's maybe a high level, kind of like sketch plan this we can make, other people can make. But say, you know, you have Claude give you like, you know, kind of break it down into 10 things, and at one point it will decide to call Axiom, and Axiom will give you a computer program that you know is formally verified, or it will say, "This is still too hard for us."
Code Limits25:12
So you, you write the program?
Yeah.
You give it to Axiom?
Yeah.
It makes changes to it maybe?
So, so we're talking about kind of two, um, sort of phases.
Mm-hmm.
Um, it is possible that we are the verification partner.
Yeah.
So you already have a computer program-
Okay, yeah
... and you want us to verify it.
Mm-hmm.
In fact, like, you know, GPT found a proof to an unsolved Erdős problem, and our competitor, Harmonic, you know, Aristotle, um, you know, verified it. But we, we can do, we want to do-
Both of these
... verify generation, right?
Yeah.
We might want to say, "Hey, you know, this little component, everything that we generate and provide for you, um, is, is formally verified."
I see. So, so the idea would be you, you generate, you co-gen- generate bo- both, and so that-- And I can imagine this fitting into, um, you know, the idea of a promise or a sorry, sorry, and then a sorry. Which is- ... lean.
A lean sorry.
A lean sorry, meaning it's a lemma that is unproven, but you're just taking it as given until you can take, have the time to prove it, right? Is that a good way to think about a sorry?
That is a good way to think about a sorry, but not necessarily in the coding context.
It's, so, i- I can imagine you're, you can say, "I- assuming that this module is verified, then-"
Okay.
"... this module is correct." And, and so that, that you can decompose a problem small enough that you can verify. Is this kind of the intuition here?
So, so, so let's say we want to, you know, like, uh, vibe code control flows.
Yeah.
Right?
Uh-huh.
That's quite hard. You will likely, you know, break that down into multiple steps.
Mm-hmm.
And then it will continue to break down these steps into more fine grain steps.
Yeah.
And at one point, you want something that is absolutely correct.
Yeah.
And then this is also something that is likely within reach. Then we want to generate, you know, both, uh, we want to generate a piece of computer program, and underlying is a guarantee that there is also the, uh, proof that has been generated, which tells you that the thing that you specified, this, you know, program can solve for you.
Yeah.
So, so, so the vision we have is anything that can be-- Which anything is, you know, um, it's a little bit marketing because, because as you said, this theoretical bound. But, but mostly, um, w- well, um, almost surely, hopefully, um, anything that can be defined can be executed. Anything that can be specified can be proven. So the way I think about it is if you have a, a program, um, times a, a, you know, a program times a, times a statement or problem, it maps to verifiability conditions times a proof. So-
Yeah
... while the programming veri- program verification community has given you, say, the verifiability conditions, and we're trying to kind of recruit a really strong team to help us do that, Axiom Prover is gonna give you the proof.
So just help me map from the program to the proof. Because like-
Yeah
... I could say, you know, this two-line lean program-
Yeah
... verifies, you know, sort of like whatever, whatever I claim it solves-
Yeah
... how do I know that it actually-
Yeah
... verifies the thing that I think it verifies?
Yeah. So, so for example, there's this, currently there's this benchmark called Code Marina. It's a, a code verification benchmark, um, that's supposed to be lean-friendly. And so, you know, every problem is a coding problem, and the goal is to generate, there's a code part and there's a proof part.
Yeah.
Two, two different computer programs.
Yeah.
And then the goal is to generate code with proof. So, you know, the code that supposedly solved this problem, and then the proof that this program indeed does solve the problem.
I see. Yeah.
W- Now, now, how do people do on this benchmark? Uh, I kind of want to, like, talk about this a little bit 'cause it's interesting. It was, um, rolled out, I think, by, um, Berkeley and Meta researchers in twenty twenty-five, and they found, I think, whatever version of GPT they evaluated does, like, pass one, like, three point six percent, iterative something like twenty-two percent. Now, you know, how does the formal math systems models do? Um, Copra, which is a, a system, because in a system you iterate and define, so pass one doesn't quite work, but still they evaluate it. Pass one of the system about, like, I think eleven, twelve percent. And then also DeepSeek Prover and Godot Prover, very strong prover model, eleven, twelve percent. And I think our competitor has released last year, uh, on the only proof part, ninety-six percent. Um, and we actually recently, with no modification to the put in system, we saw a ninety-nine percent out of the hundred and eighty-nine problems. We solved a hundred and eighty-seven. We missed only two, um, code with proof. So if you, if you want to train something to do code with proof, and you want to do reinforcement learning, it's actually quite annoying because, look, it's, it's mixed. If you want proof to be informal math, it's, it's very annoying because then that's, like, just mixed objective function. Um, your code is something like Python. Your proof is, say, natural language
math proof. Um, you will not have a strong RL kind of- Performance, right? But if you have proof as lean and you have, you know, code, you can choose Rust, which is a strongly-typed language. It's, it's more, it's more conversion, so you're gonna have much better performance.
I can't wrap my head around how do I tie... So I, like, I can say that this proof solves Fermat's Last Theorem, right?
Yeah, yeah, yeah.
But I don't know that, like-
Yeah, yeah
... uh, but it's two lines in lean.
Yeah.
Obviously, it doesn't. So how do I know that the program that I wrote matches the proof that I generated?
You will basically look at the coding problem, and you look at the, the program, and then you, um, like, try to see if it satisfies the verifiability conditions.
But, like, how do I know, right? Like, if I read it-
Right
... um, you know, like, I can, I can just, like, eyeball it, and I can say-
Yeah, yeah, yeah, yeah
... and then, like, g- traditionally, how mathematicians have done this is they, they, you know, they take the paper and they read it, and they say, "I agree that this proof solves the problem."
Yeah.
And then this other person says, "No, wait, it doesn't for, you know, like, look at this," and then people disagree, and eventually, there's consensus that, that, like, this proof solves this problem.
Yeah.
So, like, how do, how are you crossing-
But, but you check it step by step, right?
Yeah. Right. Right.
Yeah. Yeah. So you basically will look at the verifiability conditions and see if it does actually satisfy that.
The-
So, so suppose, suppose, like, we're looking at, like, you know, a piece of computer program.
Yeah.
Right? And then whether it does actually solve the coding problem, you will have a judgment-
Mm-hmm
... about that, right?
Yeah.
So you will not solely rely on testing, even though that is a way. That's why you-
Oh, so somebody looks at the proof-
Right
... and says, "Yeah, that actually solves the problem that we think it's supposed to solve."
Yeah. But then-
Okay
... but then now you're, you're basically producing a, you know, formal verification program-
Yeah
... that satisfies the verifiability conditions-
Yeah
... about this program and this statement. So again, the function is taking you from the program and the statement to verifiability conditions and proof.
Okay, so I can see how this works in a benchmark. Then w- if I have, let's say I have a, a flight control system, right?
Yeah
... that is, like, very complex-
Then the, the problem becomes very annoyingly, uh, you know, the, the, the, like, specification. I think the word is gonna, you know, even if we say successful, like, like, anything that, you know, that we will have a specification problem.
Yeah.
So, like, here comes a bank saying that, like, "Please, do I have a really safe financial audit?"
Yeah.
Sorry, like, "Prove the financial audit for me," right?
Yeah.
Like, what does that mean? Like, we, we can't specify. Hu- humans are bad at specifying everything that we want.
Right.
There's always, like, some sort of thing that we are not specified, and if it's not specified, it's not proven.
Okay, so what do you do about that?
Yeah, so we're not there yet.
Okay.
Currently, uh, you know, like, again, the, the, the vision as of currently is-
Yeah
... anything that can be specified can be proven.
Okay. Yeah.
Now, obviously, there are people who have been really good at... You know, that's where, maybe where that's is informal kind of reason they're coming.
Okay.
Right? The informal reasoner can... And, and this is, I want to kind of, you know, call the literature of testing. Like, testing are great because testing is like, "Hey, have you thought about that?" Right? Like, like, I want to highlight the work mutation-based, you know, LM unit test generation by, uh, ex-MCTO Shubho, and he was a, a director at Facebook AI Research. Like, the, the way you kind of think about it is, like, the AI will be like, "Hey, have you thought about, have you thought about this, this, this case?" Like, and so this is a little bit like conjecture.
Mm-hmm.
So the conjecture is going to help with the specification.
I see.
And then the prover does the proof.
And so this is an interactive process maybe that the-
Yeah
... person, so that when we're actually giving good specifications-
I think this is the future of coding. Yes
... yeah. Oh, interesting. Yeah.
Yes, I think this is the future of coding. And I think this is where, you know, this is where I think even if we are supposed, like, given the assumption that everything can be formally verified, you know, like, studying sort of like, you know, automatic task generation is still interesting because it, it is basically giving you the specification proposal.
Yeah.
Right? And then another thing is, let's talk about autoformalization, right? Which is the ability to, to define it. It is kind of conver- uh, converting something that is more, more informal, uh, into a, into something that is more formal, autoformalization. Um, so suppose I have a coding problem that is written for ICPC, and this problem is written in English, like Alice and Bob, blah, blah, blah. Okay, now I want to convert that into a formal statement, like a formal spec. How do I do the autoformalization step, right? Now, this is going to be h- because I have not solved the problem yet, so I don't have any signal, I don't have any grounding. The test cases, input/output pair, is gonna ground my-
Yeah
... formal spec.
So I know I have to know I'm going to give this input, I'm gonna give this output. It has to have these characteristics.
Yeah.
And so, and so I write test cases, and I write a... So is there a equivalent in lean of this, right? Where the specification, where you just know the sort of, like, outcomes that you are expecting, so that you, like, you, the statement of the, the result and then the, but the proof is completely unproven?
So lean is actually quite annoying because it's like, a lot of the times, it's proof.
Mm.
So you don't actually have the numerical answers to ground it.
Okay.
So autoformalization is quite, quite a hard thing to do.
Yeah.
Um, because, you know, what's generally happened is, um, you can't, you just, it's hard to ground the autoformalization of a statement. You can obviously ground the autoformalization of a proof, but because you can then just run it. But you need human to eyeball it.
How big is a lean proof of like, a formalized, you know, of a formalized program of significant size? Like, I mean-
So, so-
... do they grow with the size of the program, or do they grow super linearly?
Yeah. Currently, actually, you know, for each line of code written, there could be, like, 20 lines of proof
Okay
It's not looking that great.
But, but is that like a linear relationship, or is it as the complexity of the program gets greater than it, like it, you know, sort of also grows so that it's like-
I don't have a good answer-
... 40 pounds-
I don't have a good answer to the scaling law of that.
Oh, okay.
Yeah.
'Cause I know that that's a problem in formal verification, right?
That's right. That's right.
Right, where you have these huge pro- like, you have to have these very, very-
Yeah
... long proofs-
Yeah
... for even simple programs.
Yeah, yeah.
So then, then do you-- are you gonna run into sort of like limitations in, in the capabilities of LLMs when you start to get to large, larger, um-
What, what we believe fundamentally is we are building a reasoning engine.
Mm-hmm.
And we have seen Axiom Prover deal with really huge trees that are, like, you know, tr-tree of a proof.
Okay.
Um, we have seen it scale from forty nodes to four thousand nodes.
So wait, sorry. Axiom Prover is the, is the LLM?
Axiom Prover is a ensemble system-
Oh, ensemble system
... of multiple-
Okay, yeah
... models-
Yeah
... that we do post-training.
I see. Okay, yeah.
And also, it also includes, obviously, the tools that-
Yeah
... Axel that we have-
Yeah, okay
... um-
Right
... open released.
Sorry.
Yeah, no worries.
Clean it up.
Yeah. So, so we have seen it being able to deal with more and more complex task.
I see.
We don't think it's perfectly bound. You, you could ask, you know, is it bounded at one point on the pre-trained base model?
Yeah.
I think that's a good question. I think, you know, mid-training could be very interesting because it does actually, you know, a lot of the sort of capability gain does come from that part, right? If you could argue that even if you, uh, try to reinforcement learn some, uh, person who is not very talented, uh, that person might behave, you know, be, be, be-- perform a lot less well than an on, on, on post-trained Ramanujan. Uh, you can, you can, you can argue that very, very sad reality of things. But, um, so at one point, we might consider doing, doing that.
Context37:57
Then-
But we think there's so much to push on the-
So you, you just feel like there's so much overhead right now or, or so, so much, um-
Space, uh, to-
Space to grow-
Yeah
... that, that you're not running into theoretical-
Yeah
... constraints at this point.
Yeah.
I, I just wonder because, you know, there's been recent results in the computational complexity of the problems that LLMs can solve fundamentally.
Yeah.
And I don't think that they're really a concern for, y-you know, when I'm writing code with Claude Code.
Yeah.
But I can imagine problems becoming big enough in a system like this where you have a gazillion lines of lean.
Yeah.
You can't get 'em, get 'em into the context window, so you have to, like, be smart about that.
Yeah.
And then you have to summarize, and then you're summarizing and summarizing, and-
Yeah
... pretty soon you're, like, kinda losing track of what's going on. And I-- it just seems like with a lar-very large system like that, you might run into-
Yeah, I think this is, this is interesting. It's always a problem of abundance. So suppose you just, like, keep really the, the mathematical discovery. Renaissance has come. Axiom Prover does try to prove everything. You end up with, like, tens of thousand lines of lean proof. So first of all, is auto informalization is a lot easier than auto formalization minus a problem of no grounding, right? So, you know, every, every model has seen a lot of text and a lot of lean. So you can always, you know, convert that lean back into, back into informal, and then there's the problem of, well, how do you know if you're correct or not? You can rely on cyclic, like, consistency, so you then formalize it again and, like, prove, like, program equivalence, something like that. So that's-
Oh, so you, you like informalize and then formalize-
Yeah, you can use it to ground
... and you, like, make sure that you still look correct.
You can use it-- yeah, yeah. Like, and auto informalization-
Oh, interesting
... is, you know, obviously less harder problem.
Yes.
So you can always do that. So for a lot of the, you know, the, the lean code that we output, we can have an informal summarizer of, of like big chunks of lean. It's actually doing okay. So, you know, that's, that's a, that's a thing. And there's have another question of like, which I think is very interesting is I think there's a panel at, um, ICML Vancouver last year, um, at AI for Math workshop. There's like Leo De Moura and Java-Ja-Jer-Jeremy Aligod and, um, Shubo and CTO was there. And they were talking about like, will, will humans or mathematicians at some point stop trying to understand what's going on there, right? Because like-
Mm-hmm. Yeah
... suppose you're a really ambitious mathematician. You're like, "I want to prove my hypothesis." And bang, here's a lean proof. And like, it's actually correct. And it's just like, you know, problem, one million lines. Um-
But, yeah, isn't that like a big negative for the community? Because, I mean, usually when someone comes up with a big proof of something, um, oftentimes it will-
Yeah, I was about to get-
The process as it is massive.
Yeah, I was about to get there, right? It's like, well, will, will that negative outcome happen-
Yeah
... was the question the panel was discussing. It's a hy-completely hypothetical.
Mm-hmm.
No one's, no one's like, you know, model system can, can prove my hypothesis, right? So, so disclaimer, please, please don't cut that part . Just send that along.
Yeah.
Um, but like, you know, um, will, will people still trying to, try to understand what's going on? And I think the answer is usually, it's, it's always yes. I think curiosity and the, the desire to understand what is going on, you know, um, mathematically or in other domains as well, is a basic human need. And I think that is like, I think, a dose of optimism in an era of, I think, verified super intelligence, suppose we get there. Is that even, even if all the outputs are going to be produced and at a much, you know, faster pace and much more exponential volume compared to what humans could possibly consume, they're still going to try to consume it. And if they're still going to try to consume the ones that they deem important, so then basically attention is the bottleneck. And if attention is the bottleneck, then really intuition and taste, uh, you know, of which statement is probably worth con-worth the consumption of human and also maybe in a finite compute resource, worth, worth the consumption, uh, worth, worth the, the sort of spending of compute resources. That's where human mathematicians' taste will always guide us, and I think that's incredibly beautiful.
Is it worth, like, internally taking, like, results you can prove one way and then trying to send your system at many different routes to get, like, or like orthogonal, conceptually orthogonal proofs, and so you kind of get a diverse set of different ways of You reasoning about the same thing because, you know, I think it could be very valuable if you give it a problem to say, "Oh, well, like here's kind of the brute force natural way that like maybe some humans would do it." Um, and then the, uh, there's like a really much shorter elegant way of doing it. So have you essentially thought about training your models to be elegant in some way?
Yeah. At one point, we're gonna get to there because, you know, I think the conjecture, uh, will probably depend on what, what, you know, will probably de-de-depend on what we mean by taste, elegance. Feels like an alignment problem to me, you know? Like, you know, who, who gets to say what is elegant? Humans get to say what is elegant.
Yeah.
Right? That's-
That's, that's, I mean, fundamental, right?
Human preference.
And there's something about hard work, right?
Yeah.
That what-
Yeah
... what you work on hard is what you're gonna be good at.
Yeah, yeah.
Right?
And we're gonna have a problem about that, I think-
Yeah
... like pretty much in, in a lot of the domains as well-
Yeah
... right? Not just math. Like-
Yeah, absolutely
... how do you be that senior, um, programmer with, you know, really good high-level understanding, well, I guess full stack understanding, high level and low level-
Mm-hmm
... if you haven't spent the year of training?
I mean, I would argue that you don't... This is very phil-philosophical, but like y- you know, I, I don't need to be good at assembly language programming, right? Like no- not many people are good at that. A few people are because it's important for their job.
Business43:57
It's not experience, but curiosity.
Yeah. So but, but it feels to me a little different because not-
I see
... being good at like proving things, for example, right? That seems like a fundamental gap in like, uh, that maybe my mind doesn't develop in the same way if I am not doing that. Whereas if I'm just not good at assembly language program well, but I'm good at like higher level programming, so maybe that doesn't matter.
I think that's probably because how the, maybe how the education system, the pipeline works, which is that if you do not show early signs of brilliance, you don't sometimes go through the process of pre-training-
Mm
... in math.
Yeah, yeah.
Right? Like, so, so that maybe you can argue that you don't need to, say, you know, learn everything to develop a sense of taste, but there's like a threshold you kind of need to meet.
Yeah.
So for example, you probably need to be able to code even if you don't need to understand assembly-
Yes
... language, and that thing might transfer my intuition or, you know, my, my intuition might transfer from the Olympiad math problems into some other research areas, and I, I tried to pursue, and combinatorics transfer is more direct, um, s- very similar, and number theory could be further, but still okay. And then when it gets to like something that's a lot more different than Olympiad math, the transfer is less strong. But kind of like, you know, you need to diligent, as you said, right? Like you, you need to diligently go through some amount of training.
Yeah.
And if people over rely on strong AI, that doesn't happen.
I wanna s-switch gears.
Yeah.
You, you mentioned, uh, software verification. What are the domains? How are you gonna make, uh, enough money to v- justify the valuation? The like... And congratulations by the way.
Thank you. Yeah, yeah, yeah.
How... So what, what's the-- Give us the, the high level summary of like w-what is the, what is the vision that you show, you put in front of investors about why does this actually make a lot of money?
Yeah. So, um, first of all, this round is kind of preemptive, so it's, uh, I think a lot of the investors have pretty high interest about, about Axiom. Um, in terms of kind of what we believe in, we believe the future of coding is going to be somewhat constrained by verification capability.
Mm-hmm.
And we believe in solving formal math is a very natural starting point, and then by extension, you can increase the verification capability across hardware and software. And for hardware, for example, that's quite revolutionary. I mean, that is, there is no, as we know, there's no partial credit for a mostly verified GPU.
No.
Uh,
You can tell from that-
It's all or nothing. It is all or nothing. And you do, and you do need a perfect prover. Like, I want to stretch that, which is, which, stress this point, which is that suppose I am a, you know, I'm, uh, someone who loves solving math. I think there are a lot of Twitter users who enjoy Pokemon, like hunting, um, or those problems. And then I just try to, um, you know, use a, uh, a non-deterministic LLM, uh, like GPT, say, to try to get the full proof for that.
Yeah.
Now, I can do that many, many times, and I might succeed, and I might not, and I might not have a problem with whether I actually succeed or not. This absolutely does not work for hardware verification.
Mm.
So for those kind of domains, which I call like hardcore verification need, it, it is a pain point. It is a current pain point. There are, there are, there are hundreds of humans and thousands of licenses being dedicated to solve one local grid problem verification.
Just as an aside, the, my understanding is that the industry standard for design to verification-
Yeah
... in, uh, ASIC, ASIC project is like-
Yeah
... one to three, one to four.
One to three, one to four. Correct. Both in, say, team size and in duration.
Yeah.
Right. So if you multiply that-
Yeah.
Uh, yeah, square. And then I think-- So, so that's, that's, uh, I would say like, you know, it's a, it's a must cover. And now for software verification, it is, it is interesting, right? Because, you know, as probably we all realize, like my nephew web codes a lovable website. There is absolutely no need to formally verify that piece of code. Like, why would you? Now, I heard a, a story from Kmat, actually that New York Times reporter who, um, uh, told me the story, which is like-
Yeah
However, if you think about like, you know, in the time of agents, like my OpenClaw can probably do all sorts of things and probably can do some bad things. Uh, like my OpenClaw can decide to like text something bad to my professor, right? Like-
Yeah, yeah
And, and, and you can say that perhaps is that a problem of formal verification? Probably still not, right? You can change something about the action space and make it more limited, so you don't, you don't need to rely on formal verification. So you can have a lot of cases, but you can think about, you know, maybe an enterprise that is dealing with a lot of regulatory kind of stuff using agents. They might want to do something like it- it- it's their choice. But I will argue that the improvement of verification capability, both in latency, you know, and in accuracy, all this stuff, the performance holistically is going to determine whether people rely on formal verification or not.
Sure.
So in a way, we want to make it so good that basically we can make that a choice. And-
So, so why did the investors think that you could do this, right? 'Cause I mean, people have been working on verification for so long, and I think everyone agrees it's an important problem. And it-- and I think certainly if I can just have a verification proof for every program that I write, like, "Hey, Claude, like give me the proof also," and then it just produces it and, "Oh, yep, looks good to me," I would absolutely do that. But so why is it-- what was it that the investors saw, in your opinion, that persuaded them that, "Okay, this is the moment I'm gonna put in my two hundred million or whatever"?
I think, uh, when it comes to faith-
Yeah
... you either have it or you don't. So you either dream the dream with us or you don't.
Mm-hmm.
And that's okay, because when we realize the dream, the company's gonna be worth ten billion.
Yeah.
So I think that's kind of the, the feeling that I have, which is that we believe verification is the critical, critical part to superintelligence.
Mm-hmm.
Our version of superintelligence is absolutely verified. We don't think there's any other possible future. We do not believe that, I'm gonna say it on the record, we do not believe that an informal math system is going to be the math AGI solution.
Why not?
We just don't believe that.
I mean, w- the counterargument is, "Oh, you know, like we just do a lot of good RL," and, you know, we've seen, uh, GPT, you know, solving, you know, I think some murderous problem and like whatever. So why do you think that that runs out of gas?
Yeah. So you can say that if you are a frontier math, then you have like... So, so frontier lab, and you have like infinite resources, why does that-- there's this, it's by definition, no running out of gas, right? If you think like infinite means like there's no running out of gas. I don't think it's going to scale to superintelligence.
So you think that you run out, like you run out of money, basically, or run out of power from nuclear power plants or something?
So, so we as a startup, first of all, cannot do that. We first of all, as a startup, cannot do that.
Yeah.
But we generally think that formal math and by sort of converting math proofs to pr- programs, to code, give us much better performance.
So, so it's just your-- it's your sample efficiency argument and so forth that you just... and maybe you just can't, that, that you can't bend the curve enough if you don't use formal verification.
The thing is, the, the thing is the informal stuff is also available to us in a way. If you really-- like you can have a both informal and formal system, and that is going to be-
I see. I see
... very strong. I-- the thing that I kind of like, I think my, my suspicion about like, you know, whether we can scale to mass AGI just by the informal approach is you're going to keep having, you know, the LMS judges solution, or you have human experts who grade, and just human experts like doesn't scale that well. And then if you really argue infinite, infinity, then sure, then you also have infinite money and you can pay infinite... It-- There, there's so many... I- is there really infinite number of people who can understand and prove at, say like about like, you know, a result, a non-trivial result in Langlands program? I think, you know, good luck finding those p- people. And in fact, I think how frontier maths came to- came, came together is because they couldn't assemble a, a, a benchmark by... They, they are expert poor, so they have to, you know, collaborate with Epoch to do it, right? And, and I think that's kind of what, what I worry about, about kind of having the human part. So they have LMS judges, and then now stoch- stochastic judging. The problem is like whether something is impossible to achieve versus something is incredibly expensive and like really incredibly expensive and incredibly expensive to achieve get kind of like mixed in the end.
And then of course, investors always wanna know why you, right? So I've read a little bit about your background, and I think it-- we would do a disservice to the audience-
Of course
... if we didn't hear a little bit just about your personal story.
I see.
Do you wanna talk just a little bit-
Sure
... about like you've, you've done some really interesting stuff, so I'd, we'd-- I'd love to hear like you and then your team.
Yeah.
What, what, what makes Axiom special?
Yeah. I think Axiom is like very special because they are really expert mathematicians. Basically, they are users of the system we are developing, and that iteration loop is very fast. It is extremely, it is extremely fast. You have like some of the strongest, you know, mathematicians and both in research and Olympiad contest, and you also have people who are, um, you know, Mathlib contributors, maintainers, developers, um, lingurus, really, and, uh, combine them with people who come from like applied ML, um, you know, really strong organizations like Meta Fair, um, and Golden Age Fair, um, as well as people who have codegen expertise who work with like compilers like KernelGen, um, have kind of these backgrounds of people together. I think that sort of interdisciplinary way of thinking about things quite, quite helpful. We think AI for math has traditionally been quite interdisciplinary. People are borrowing techniques from even AI for science. People are tech-- um, bor- borrowing tech- techniques from, from the codegen literature, and people are borrowing techniques from obviously the broader like, you know, frontier like applied ML, um, to try to apply on the niche problem of AI for math. So we also think having this sort of very special, special team is a, is a differentiation. Uh, we also think that, you know, as you say, there's no, no permanent mode. Um, the proprietary data that we generate, um, and a little bit of a firewall we are seeing is a
time mode. Well, me personally, uh, I, I love math. I think, you know, I, I kind of have been doing math since I was very young and Like, maths sometimes gets really hard when, when the problem you are solving are, are just a little bit out of reach, and it gets a bit depressing. And times to times, I wonder if I can just have an AI help me. Uh, and, uh, and yeah, I think why-- I figure why not build such a thing?
Background55:27
Y- you did a master's at Oxford-
Yeah
... in neuroscience.
Yeah.
Has that informed your thinking here?
That's a great question. I think, like, my, my, my, you know, experience with neuroscience is y- you learn very well about what's hard-
And what's impossible. I mean, it's, it's very interesting. I think that year of neuroscience, like, gave me some feelings about what's hard and almost no feeling about what might work, so.
Okay.
But I think I was kind of under the pretense of neuroscience, like, hanging out at the UCL Gatsby Institute and was fortunate to do AI research with some really cool faculties. And so I think that was a very productive year of AI study-
So you were-
And non-neuro study.
You-- So you're-- It was mostly for you studying AI.
That's right. That's right. I think in the UK, if you, um, back, back in the, you know, um, 20th century, if you call something AI, you will not get the donation, but if you call something brai- brain science, you might have a chance.
I see.
So, so the UCL Gatsby, which is a premier AI hub where a lot of people actually go, um, you know, from there to DeepMind, including, uh, Demis himself, uh, it's a very wonderful research environment. Uh, I remember those kind of, like, tea time talks were very amazing and, and people were basically just doing AI. Uh, it's called the Gatsby Computational Neuroscience Institute. Yeah, I think how, how that kind of, you know, happened was because-- So I was, I was in the Master of Neuroscience program, and then, um, quickly realized that you need to, like, kill rats and, um, kind of don't want to do that, and computational neuroscience sounds more appealing and, and when you look at the project and you see, like, transformer, you're like, "Ah, you absolutely want to do that."
Yeah. Hmm. We're, we're all excited about that. So, so after the Gatsby, uh, you started a math PhD program at Stanford?
I started actually one year full, full-time at the law school-
Oh, okay
... because the JD PhD program's structured in a way where you have to spend full-- one full residency year. So that was also a very fun year, um, of learning things that, like, are just quite fascinating, like criminal law, looking at homicide cases. Exciting. No, um, and-
Do you ever feel like the legal system is under- or overspecified in some way that maybe you could, um, you could axiomatize-
That's a great question
... and improve?
That's a, that's a great question. I think for a lot of things-
Mm-hmm
... it's definitely underspecified. Um, for some other things, I was actually quite excited about sort of transfer learning from mathematical reasoning to those specific fields. I think appellate litigation, the legal gymnastics, you see some really good appellate scholars and lawyers that just come from maths training. Not many, but, like, Laurence Tribe for one, you know, Harvard law professor, um, one of the, you know, strongest, like, um, you know, s- appellate litigation and SCOTUS briefs, like, brains on, on the left, uh, Democratic Party. Um, and, uh, uh, and I think there's a lot of other domains such as antitrust that's incredibly flow charty. Contract law, sometimes also flow charty. Bankruptcy, tax, uh, more on the corporate side. I, I, I just love litigation stuff. Like, I, I mean, yeah.
So, so actually, I do just-
Yeah
... just because we're talking about litigation, it's not the same thing, but there was a, there was a Erdos problem that, that, uh, Axiom-- So I don't re- know if it was Axiom Prover or whatever. Is that right? There was a controversy about it because it had represented that it had solved the problem when in fact the proof had been-
Yeah
... it had discovered-
Yeah
... the proof and then just formalized it.
Yeah. So actually, what happened was-
Mm
... our competitor, Harmonic, decided to publicize that they have solved, uh, unsolved problems, um, Erdos number, uh, one two four and four eight one, and then we trusted their literature review, believing that these problems are really truly unsolved. And we were a really young company at the time. We wanted to test if our system can attempt to try the problems that our competitor can. We fully did not expect that it actually solved them, but, um, turns out that we're both wrong, that in fact, the problem has been solved before.
I see. So then-
It's not the only time that we relied on others' literature res- uh, res-re, uh, literature search, and, you know, we, we should own it. Um, the other time was this paper called "Dead Ends in Square-Free Walks." Um, you know, Professor, um, uh, uh, Professor Miller, um, have this problem that actually turns out to have been solved. But, um, we, we, I mean, we really should have done our part. That is, that is, you know.
The point I'm trying to maybe elicit is, um, not n- not like you guys did something wrong, but rather like how does-
You know, there's this, like, Japanese, like, um, advertisement of, like, a whole company, like hundreds and thousands of people, like, apologizing, uh- ... in the, in the advertisement, and it's like, you know, "Sorry we raised our price by, like, five cents." And that's the advertisement. And I was like thinking that maybe I should just do that. It's so-
Oh, no
... it's so embarrassing.
No, uh, no, but I think the, the question of provenance of information and sort of like how do you-- It goes back to the question I was asking before about, like, how do I-- how am I connecting the answer to the question?
Yeah. That's a great question. I think after the Erdos thing, we're, like, extremely careful, and so we kind of, like, you know, we didn't really look at the other Erdos problems. I believe that Harmonic still continue to claim they have solved Erdos problems that might, might, might not. I don't know. Uh, it's, you know, there's a, I think Terence Tao and a lot of other people have a database about all the Erdos problems and the status. I think, you know, like, it is really, by the way, like, it's a really easy mistake to make because there are so many Erdos problems that actually have been solved, right?
Erdos1:00:57
Yeah.
And I think that that's kind of... I- indeed, I think, like, you know, search and retrieval is a, is a, is a hard problem. Like, you don't know if that argument or an equivalent version of that. In fact, I think the most interesting part about that entire database is, um, there are a lot of problems that are not directly solve, solved, but can be just an very easy extension, almost a trivial extension of another result that has been solved or not, sometimes not even resolved. Sometimes I think in this, um, dead-end square-free walks case, which is nothing to do with Harmonic, completely Axiom's fault, um, that we actually didn't realize, and then Professor, um, Kannan Soundararajan actually pointed to us and to Professor Miller, is that it was actually from a Stack Mass Overflow or Stack Overflow- Post
Oh
Um, like a user pointed out that there is a nineteen thirty-six results. It's fascinating.
Yeah.
I think it's hard to, hard to find out. Now, why search is a hard problem?
I guess that means that you do-- Does, does the, the conjecture engine or whatever, does that, does that use search as part of its process, or is that something that you kinda, you, you, uh, the human does and then feeds-
I think, I think knowledge graph or knowledge base is a very, you know, important component of any-
Okay
... any company.
Yeah.
Uh, and I think, I don't think it's talked about enough.
And so, and, and you guys, with that, it sounds like you don't wanna give us too many details, but like, so you guys have a knowledge graph. I mean, I, that brings up also, I, I read somewhere that you guys have a really massive database of lean proofs that you've generated, so synthetic data in, in some sense.
Yeah.
But that the... And, and this may- maybe is a competitive advantage for you.
I think, I think everyone is trying to accumulate, like, a data, which is not a mode, it's just time and time mode.
Yeah.
It's, yeah, it's all, it's all, it's all about, like, you know, whether you can execute fast enough, um, to make sure that, like, you have, like, a certain buffer, um, because of, say, your data set, you know, accumulation, but that is only just a buffer.
Have you ever thought about doing something like an Alpha Zero for math, where you start from nothing and let it just make up axioms and see what happens?
Ah, this is a wonderful question. I think that's a very interesting approach, actually. Yeah, I think we, we believe in something which is that, like, you know, suppose, um, Axiom Prover can be a really strong mathematician, and-
Mm-hmm
... then really the, the thing that it is proving every day should hopefully help it improve, right?
Yes.
I think this sort of self-improvement design is extremely, uh, valuable.
Mm-hmm.
Um, and I think there are, um, other people in the AI for math community. I think, uh, Professor Gabriel Peyré's work is very interesting. Um, I think there are some of the, um, kind of more conjecturing type of exploration. Um, suppose we just kind of change, um, you know, a lot of the... Th-there are specific things you can do in, in certain, in certain ways that, that can try to see if your system can learn to conjecture and build theories.
I think that the, the topic is really interesting and important because it really... Y-you're claiming that the, t-to get to super intelligence-
Mm-hmm
... there's sort of this, like, it's just not gonna be possible. Maybe if you had infinite resources, you could just RL it, and it would work, maybe. But it, the reality is, is that you just can't be sample efficient enough or whatever it is to do that, so that you need some sort of verifier in the loop with the inference process rather than-- Because you do have verifiers in, like, sort of during the training process, and you just don't have them during the inference process. Is that-
Yeah, I think a lot of them are just secretly, like, trying to use this to ground their reasoning-
Yes
... as well.
I mean, I would. I was surprised that, that, like, when, when o1 was, you know, everyone knew o1 was coming, but it did- hadn't come out, I was sure they're gonna announce that they're using Lean to, to do, like, formal verification of proofs and ge- actually generate proofs and then verify them so that they're grounding and reasoning. I mean, that was my initial-
When Emil was there, there was GPTF. That was a great piece of work. There's also MiniF2F. These are all formal math work at OpenAI.
Okay. So presumably, those guys are doing something.
No, no, they all left.
Oh, they all left. I see.
So that's my point, which is that if you are, like, you know, an intern, I guess you can be an intern forever. So let's say you are like a junior, you know, like member of technical staff, and you want to work on something for, like, as long as it takes to solve it. Weirdly, people think about startup as this sort of your runway can just run out, and it can just, like, all fall apart thing. You might have a better chance of staying focused on the same problem-
Self-Improve1:06:02
Yeah
... for as long as it takes-
Mm-hmm
... at, say, a startup like Axiom or one of the other new labs.
Yeah. If you're aligned to the mission of the-
Than big tech
... company rather than, like, somebody decided that what you're doing is no longer-
Yeah, yeah
... of any interest to them.
It can be your VP lost some political fight and so, yeah.
Yeah. Absolutely. So-
Now, obviously, if we succeed, then they're all gonna, you know, start doing that again.
Yes.
And then, like, I guess as a talent, then there are more, like, you know, potential places to choose from as well.
Yeah. So th-then your job is to go fast so that they, they're, they're struggling. So actually, uh, um, you, um, we haven't talked about it, but you actually also just released an, um, an API for-
Yeah
... doing Lean verification.
Yeah.
And, um, I actually tried it with Cloud Code, um, because it's easier than setting up, uh, you know, your own Lean, um, tool chain.
Yeah.
Um, and, you know, like, tried to get Lean to prove some stuff.
Yeah.
Um, and The infrastructure is, is maybe non-trivial, um, especially at scale. So you wanna talk a little bit about-
Yeah, yeah. So we just released Axle, A-X-L-E, uh, stands for Axiom Lean Engine. And, uh, it's really a set of kind of proof validation and manipulation tools that are built for Lean in the language of Lean. So, uh, it's a bunch of meta programming tools. Now, meta programming talents are, are extremely, I think, like, you know, hard to find, and, uh, we're so grateful to have, like, a really crack team working on that. Uh, and we want to kind of like release it to the community for, uh, to use for free because we think that there are probably other people doing also, like, large scale Lean operations, and, uh, this tool is gonna make their stuff go a bu-- a lot more robust and faster and do so at scale. Um, and Axle is currently, I think, fourteen, uh, like, such tools, uh, starting from verify proof, which is the sort of to make sure that there's not-- nothing weird, um, you know, going on, like, no, no sort of cheating by, by Lean code. You don't axiom something out. You know, we don't, we don't assume weird things. If you axiom N plus N equals N, you can prove two, two-- prove two plus two equals two, which you're, you know, for sure not, that's not the right answer. Um, there are also like, you know, a lot of other kind of generation tools. Um, for example, you can try like different repair attempts.
So c-- you know, broken Lean in and then good Lean out. Uh, and you know, there are like currently, you know, other repair methods by LLM, so hopefully this, what we provide can be just a lot cheaper and, uh, more kind of, you know, straightforward. And it's just, you know, s-- I think strong, strong and better engineering can, can get you to, to a place that's quite far. A lot of the people from the Lean community has been using Axle, even if it's just been a week, to do all sorts of different interesting things. We have seen, uh, people from the kind of blockchain community use it to, to do interesting things, et cetera. And we have seen also, we have heard from a lot of the people that Claude plus Axle is kind of their go-to setup, um, for, for now. Um, we think that these are really interesting tools. I think famously, I think, uh, today there's this mathematician who said he formalized the Donald News, you know, using Claude to prove, I think, a result, um, a Ramsey, Ramsey result, and then to formalize the, the, the Lean proof, and then, um, that is also using Axle tool. So we are really glad to see people kind of already using it.
I mean, I feel like this is a great opportunity for the collaborations that Terence Tao was talking about as well, where once people have access to the common tools, then it becomes easy to do. And I mean-
Yeah
... like if, if you have a intuition, even, uh, y- not a strong ma-mathematician like myself, you might be able to participate in the, you know, sort of like an effort to prove a larger theorem or something like that.
Yeah, I think that's, that's very interesting, like, view, which is that like if you think about like mathematics has been not like as collaborative as software engineering. You don't have like hundreds and thousands of people working on something together. I think Polymath was an instance when, when that happened that was fantastic. So if you have a lot of really good sort of setup, indeed like commoditized kind of access, then people can all participate in. In fact, that's how I think some of the large formalization projects have been done. Uh, things are divided into sub-task. But really the blueprint writing process by, say, Terence Tao and Alex Kontorovich of assigning the task to different people and how things kind of fit together, that blueprint writing part is extremely important. And there has been, I think, a result about sphere packing, I think, by, um, one of the other companies out there. And the blueprint part for, um, the A dimension is still pretty much built on what the sphere packing, uh, community, the Lean community, the humans blueprint, and similar with some of their other results as well. The blueprint part has still been human-generated, and I think auto-generated blueprint is going to be a technical bottleneck that many people are trying to solve around the same time.
So is there value in me as a, you know, Cloud Code user trying t-to attempt part, like some small lemma or whatever, um, w-where I don't have a great understanding of the math? Maybe I have a high-level understanding.
Depends. Are you trying to formalize or are you trying to prove?
Uh-
To prove new things
... that's a good point. Yeah.
Yeah.
So maybe form-- You would obviously probably start with formalization, right?
Yeah, yeah.
You already know the proof and-
Yeah
... you just can't get-
Yeah
... nobody has been-
That's right
... able to get the formalization correct.
That's right.
Yeah.
I do actually have seen people use Lean in formalization, and they try to do it by hand, you know, not using any AI as a way to learn mathematics.
Mm.
No. It's, you know, it's all the formalization.
That sounds very hard.
You don't have that process. Well, it's interesting because I think a lot of the, uh, my friends who started, you know, working on Lean and Mathlib was because they are in PhD, and this problem is really hard, and we get stuck all the time, and we want to kind of review some of the undergrad classes, a time where we still understand what the math was about, and we do so by, you know, doing Lean. And I think that's, that's very beautiful.
So formalizing the material.
Yeah. But if you have, for example, like, you know, um, access to axiom prover that also can formalize all the formalized things, then you don't have-- you lose that part of the Lean learning process.
Yeah.
Yeah. But I do think that, you know, like for, for, for, you know, you and I, we can set up like Axle and try to like see, you know, what, what results we might be able to prove. And I think that's quite interesting. And thanks to Axle sort of making the speed a lot faster, you don't have to wait very long.
Yeah.
I remember the Putnam exam day. We were all like in the war room. It was a Saturday. We're all really excited, and we just got the exam paper from the official, um, like organizing the proctor of the Putnam exam. We just like were looking at like how, like how much workout Axle is like getting. And without it, we couldn't have solved it with, um, I think the eight problems was in the time limit. That definitely not, not within the time limit. And I think one thing about these tools is like it's very interesting in that- Potentially, you can have interesting reward for RL as well.
What do you mean by that?
Axle & Vision1:13:17
So for example, verify proof can be a, a reward for just basically a proof is completely correct and validated.
I see.
I think formal verification tooling can be interesting direction to pursue with RL.
Yeah. So you mean, um, for example-
Code grading and stuff
... you, so you formalize-- auto-formalize the informal proof and then verify what-- and then use that as a reward? Or do you mean-
No, as in like you pass like Lean programs-
Okay. Yeah
... in these formal tools.
Yeah.
Right? Like, and you will have some sort of score.
Okay. Yeah. I think if I were to build o1 or something, I would've, in my mind, I would've used what I just described. But you're saying just to learn how to do Lean itself.
So, so I-- the, the value proposition, which is interesting about Frontier Lab, is that suppose you are a two C business, then sure, you, you can just not do what we are doing. And we have seen, for example, DeepSeek, uh, like originally having a formal team and then later dissolve that team because of strategic direction change. That's all completely reasonable. Now, suppose you are focused on coding, right, and you have talent who want to work on what we are doing, it makes a lot more sense for you to do code generation, further your strength and moat.
Yeah.
You can partner with Axiom, um, just like how, for example, Frontier Labs, um, partner with, uh, startups that work on search, such as Ixra and Parallel, right?
Yeah.
Just call Ixra API for search and-
Yeah
... potentially, you know, if you're a Frontier Lab, I think you should call Axiom API for verification.
Yes.
Um, better composition versus-
Setting up your own infrastructure.
It doesn't make sense. I mean, it just, you know, potentially, um, I think the talent, the, the finickiness of Lean, um, the sort of data code, like, like, you know, there's, there's no reason to.
Yeah, I mean, it took me five minutes to set up. Why did you decide to start Axiom?
Right. Why did I decide to?
Like, you were a grad student at Stanford-
Yeah
... and, you know, in math.
Yeah.
So what made you decide to-
I wasn't in math for very long. I was-
Yeah
... I was, uh, m-- I think like almost as soon as I started the PhD-
Mm-hmm
... I just started fundraising. So it wasn't like-
Oh, really?
Yeah.
Okay.
Yeah.
Was that the plan? Or did you, did you start there, and you, like, almost immediately realized that this is-
Right, right
... the direction?
So, so the year of law school, right-
Mm-hmm
... it was very, very interesting to me, like-
Yeah
... on an intellectual level.
Mm-hmm.
But it's also the s- the first year where I had no science, technology or math-
Mm-hmm
... whatsoever in my life.
Mm-hmm.
It's a weird year, right? Like I'm, I'm reading a lot. I'm, I'm practicing... Well, I'm learning how to write, I'm learning how to read, like.
Mm-hmm.
And but, like, I'm, I'm just kind of, I want to like be obsessed about something in technology. Like that was-
Mm-hmm
... also what's going on that year. So yeah, the year of law school, right?
Yes.
And, and it was very, very interesting to me because it's like, okay, like I just, I need to be obsessed with like a technical thing, 'cause otherwise I get too... I don't think I'm bored, 'cause I really love like everything about, about law. I really, really loved it. It was, it was something that's incredibly interesting to study. Um, but I just, I, I mean, I've been basically like, you know, very excited about like the progress of reasoning. I was looking at a lot of the post-training kind of papers. I was, I was learning all of these, like just by myself. Um, and then at one point, it got to a point where I'm like, "I think this is for sure happening." And like, I think talking to Shubho, right, at, at Verve, like every weekend also, like it didn't help, like soothing this thought. So I got more and more obsessed. And at a point I'm like, "Okay, if I'm doing this like literally every minute and I can't think about something else, like what, what..." You know, I need to do something about it. I mean, it's like you, you... I, I fall madly in love with the idea that AI is gonna do math. And like, okay, now do I, do I do math? Like I... It, it's really, really crazy, like at a time where I remember the obsession was quite, I just couldn't get out of it. And then, um, I
went to this Nye Hennessy event, uh, this Nye Hennessy Scholar dining house. It's like, hosts all sorts of like free lunch events, and those are great because you get free food and you get interesting intellectual exposure to things. And I remember Julie Zhuo, who was I think a Facebook, um, first Facebook PM, came to speak. And then after that, I just like, basically walked up to her and I said like, "Uh, like what do you do if you want to do a startup and you really wanted to do academia because you, you kind of love math?" And then she's like, "Well, you know, what's your time spent on these two different things?" And I'm like, "A hundred percent, zero percent." And then she's like, "Well, you kind of have to follow your energy."
Yeah, I mean, if you're, if you are, um, completely obsessed with it.
Yeah, I was completely obsessed with it. I thought it's gonna be big.
Mm-hmm.
And I thought like, it, it, it just, it just has to be a for-profit startup because like, it, it's so much broader than making mathematical breakthroughs.
Hmm.
If you think about like recursive self-improvement and like really the, the kind of more high level like concept of like you really want to have just AI, AI scientists, like the math reasoning is gonna be, is gonna be a pretty big part of it.
Yeah.
And now trying, like I think the, the, the sort of belief by, by Cursor and Claude and other folks is like, okay, like just like math transfer to coding code, coding transfer to math as well. I think that's true. It's just that like, you know, why, why not push it directly? I don't-
Mm-hmm
... I don't get it. You need to push that directly. And then there's this other like, you know, thought which is that, and maybe kind of going back to the collaboration point, right? Um, verification has traditionally been thought of as, okay, well, there are some industry where there's a lot of guardrails, so if you're working in defense, military use, okay, you need to like basically satisfy a lot of barriers to entry to meet those stringent, like requirements. So it's, it's something that's... Verification is for the industries that are closed. But it's for the first time now I think verified AI is to open up collaboration, either as human-AI collaboration. Well, before blueprinting, that's human, human collaboration, and Lean was the grounding, was the verification, formal language. And then human-AI collaboration like we're seeing now, future AI agent, agent, agent like collaboration. So, like, I think verified AI is for openness. It's not for meeting the requirements of closed industries. And I think, just like I think verification should not be about, oh, I remember, like, you know, there was this article, like, chatbots make stuff up. Is AI the solution to-- Sorry, is math solution to hallucination? Uh, verification to me is not about lousiness. Verification to me is about scaling brilliance, compounding brilliance. It's like just kind of going back to the collaboration point. It's about Ramanujan being a much stronger mathematician. He was already a really strong one, but verification helps him extend
the brilliance, like both kind of like scale up and scale out. So verification-
It's more rigorous.
Verification to me is not about, you know, like erasing the mistakes and the lousiness, about scaling brilliance. And, and the third point is that, like, verification to me is, um, not about, like, the sort of, you know, just talking about rigor. It's actually about performance gain, right? It's not just about the stringent requirements, the hurdles that you need to overcome. It is about, like, actual verified generation's gonna make it so much better. And I think, like, kind of these three points, I think the last point is that a lot of the people think that you work on verification because of your distrust for technology. Like, it sells really well to, I think, the general pub-public, including, like, my parents. Like, "Oh, well, we're doing verification because, like, you know, technology make mistakes. It's..." No, we don't think verification is based on-- is because of the distrust for technology. It's because that's what, like, um, expected rapid exponential scale up and, and, um, the deployment and the creation of technology and technological progress is what that compels and demands.
It's a very mathematical perspective, right?
Yeah.
Because you're saying proofs are, proofs are drive math, right? A lot of math is based-
Yeah
...is, is about proofs.
Yeah.
And math drives a lot of science and innovation in the world, and the innovations in math drive innovation in the world.
Yeah.
And so that, that-
But it doesn't need to even go through, like in, in terms of, you know, the solve-- math solve everything thing like obviously stands. Like, my point is like transfer learning doesn't like... Transfer learning is about like pushing math reas-math reasoning. It just... So, so there are kind of, I guess, like there are a, a couple narratives here. Like for some people it's like you, you solve math, and then math are the, you know, fundamentals of sciences. So that's actually the, from AI for math, like take this radical layer of AI for science, it's that narrative. We actually believe in just like general tr-transfer learning.
Yeah.
Like, I think Axiom is, Axiom is on the infrastructure stack.
And you think that this is just a first step to, you know, basically unlocking capabilities in many domains, in science and law, for example.
Yes. I, I think it's... So, so again, there are like, you know, multiple, multiple kind of like beliefs. One belief is that there's math and there is like, you know, formal, the power of formal verification.
Mm-hmm.
Suppose we actually, you know, solve math and have a really strong informal math reasoning engine, we do not expect that term to be as large as solving math through the formal way.
Why?
I mean, code as, as it is language-
Yeah
...but it is indeed on the more structured end.
Yes.
It bridges informal and formal.
Yes.
What we are doing is it's not informal versus formal, right? We're not taking the sort of like completely formal of a proof approach. Like it's im- it's bridging be- between informal and formal. It is bridging between high level and low level. It is a direct, it's sort of like a, a direct improvement to reasoning, um, through transfer, like transfer learning. And it's also an indirect in that like, okay, well, like math is gonna unlock a little science and sure.
I see.
Um, and that is really what we're seeing.
So you think that it enables transfer learning?
Yeah.
I see.
I think that is, that is pretty much a consensus. I think it is a consensus, and this is a bet that has been pretty much kind of overlooked by others because math sounds pure, and it doesn't sound like there's any commercial value.
Mm.
Well, I do ob-obviously understand the opportun- like the opportunity cost if you're like a really like a frontier lab of, of solving this problem. But I definitely think this is a problem that if you're like a well-resourced startup, you should be doing.
That's an interesting-
Yeah
...perspective. Did you get everything out that you wanted to say?
Yeah, I think, I think it's, um, like, you know-
Yeah
...like the question of like, is Axiom Math or is Axiom Ver-Verification? The DNA of the company is math. We think that's, uh, verification's the best first market.
Yeah.
And we think that sort of like solving math, and especially like formal math, um, i-is going to like help us like tackle the really, uh, ambitious quest of verified AI.
Yeah, yeah.
Now, when we are done with that, we might have other, the second markets, including AI for science that we just talked about. But, but on the theoretical layer, right? Like I think real world testing is important, and potentially we can stay in the digital world and, and soft-software stuff and for other things to be, to be, to be getting real world, like physical world signals.
But do you think that, that, that the, the sort of, the capability of doing really powerful reasoning-
Mm
...once you have that powerful verified reasoning engine-
Yeah
...that that's the moment when, okay, now we, we've unlocked that for, you know, software verification and hardware or whatever.
Yeah.
But now, okay, so now what about biology? What about chemistry?
So that could be one.
Yeah.
The other one is then, like, really how far are you to recursive self-improvement?
Okay, so just AGI.
Yeah. I think there is this sort of question, and different people, because of their probably different backgrounds, have different, um... It's, it's really where your energy and your passion leads you. Like, for some people, actually, I have heard this actually, you know, with my friends, they want to work on AGI because they believe solve AGI, solve death. There are other people who come from a more, like, medicine background, they really believe they can solve death, and they don't solve AGI and then solve death.
Yeah.
They just solve, like, AI for science.
Yeah.
Now, which way is correct? I don't know.
And so the recursive self-improvement angle, it sounds to me like you're saying that the combination of verification plus the sort of like language, which is informal, it's that combination that enables really good recursive self-improvement.
I think recursive self-improvement is going to happen anyways. We're trying to have, like, formal verification earn its place. So we, we, like, a-again, the, whether, um, formal verification can be welcomed and deployed and become a consensus depends on how well we execute. And I think when you boil down that problem into an execution problem, you should just go for it.
What-- Looking forward, what's the biggest bottleneck that you see in the field for both Axiom and maybe just the field abroad in terms of-
Uh, fragmentation. So, um, I think we're in a market where people like to start like, you know, like, a thousand people, they don't join force, they start a thousand things. I think that's actually the biggest, like, kind of bubble indicator. I think there are categorical bubble, and there are, like, other categories where there are moonshots. It's not bubble, it just looks a little bubbly. In the field, if people who are, like, really, like, of really legit, you know, backgrounds decide to join force and work in a team for the mission rather than for ego, for kind of the status neo founder, I think that category is-- I'm, I'm really bullish other... And vice versa. So I think the bottleneck actually is about potentially... I think, I think it's, it's, it's annoying because it's like we are in a-- If, if you believe we are in a, in an age of, um, research, if you believe in, like, deep tasks are the interesting directions we go after, um, the market sort of conditions currently is good and bad, and that good, it enables these sort of long-term, long-horizon bets to be funded. Bad, because there's too much noise in the market and some other, like, irrational players. Um, you know, we, we try to work with really incredible venture firms. Like, they are the partners, they are our intellectual partners, and there's a lot of alignment, and we really bounce, like, very, uh, cool ideas, technical
and t-tech- non-technical each other, like, for long, long hours. And, and, you know, we spend, like, a lot of time off work and weekend together to really intensely build the company. But there are also other people who just want to, like, park, like, capital somewhere. And, you know, while we don't work with them, these encourage, these are market conditions that encourage fragmentation. And when things get fragmented, like, no one gets there. Like, I think every category, regardless of how, how right the idea is, is pretty much in a sort of earning the right to exist stage. And if that is the case, then, um, for example, great deep tech company, SpaceX, and people do actually join force to work on that dream, and potentially, in that case, also a very charismatic founder. I think a really kind of concerning thing for me personally is that for other probably some categories that I'm personally quite bullish about the direction about and just, like, looking at things generally, fragmentation is a problem. Like, the sort of, you know, we see start pooling professors from university to, um, work on something when i-it-it really, it really is a really interesting kind of situation.
Maybe this is a naive, naive question, but, like, right now, when you were talking about players in, let's say, AI for math-
Yeah
... where, you know, you, Harmonic-
Yeah
... and then, you know, the big labs, right?
Yeah.
Am I missing someone? Is, like, is that actually fragmented really?
I guess fragmentation, I think, is a bottleneck for the entire, like, AI landscape.
Okay. Yeah.
I think AI for math is a category that is actually not a bubble-
Mm-hmm
... because it is not fragmented, because people who are really amazing talents do like to join force.
Mm-hmm.
So, for example, the fact to get Ken Ono and François Charton on one team-
Mm-hmm
... like, this is fantastic. Like, you have someone who's a core contributor of frontier math tier four-
Mm-hmm
... really great benchmark-setter, François, who's on the AI for math discovery, proving and discovery. They work together. Then you are suddenly a player with both proving capability and construction capability, and that's fantastic. And I believe, you know, as you said, like, Harmonic probably also have some really great talents, like joining force together. I think AI for math is a, is a good category because of the absence of fragmentation. But even, you know, from, from our perspective-
Mm-hmm
... the sort of, for example, um, you know, RL, right, b-being-- I don't, I don't think that's, like, a category per se, but, you know, RL talents currently, um, it's quite hard to, to attract and retain, right, for, for literally everyone. And there are a lot of, um, companies being started, and then sold like three months later, and, and just... The, the-- Each month where you could have worked on, like, a technical problem, and you're instead working on deals, it's a month that is wasted. And I say that, like, you know, also with some amount of pain and suffering-
Yeah. Really
... having, having gone through two fundraisers. Yes, yes, yes.
Yeah.
Yeah.
So, so what's the biggest bottleneck-
For me
... in AI for math?
For, for Axiom, for AI for math.
Not Axiom, but just the community as a whole.
The community of AI for math.
Yeah. Where is it going? What is the thing that everyone just really wants to break?
I expect fragmentation to start to happen-
Mm-hmm
... as Axiom and Harmonic establish category leadership.
Mm-hmm.
So I expect people kind of, you know, that, that's one thing. But I also think that another bottleneck could be the pressure of short term versus long term. I think that we are doing things in a very sort of fast-paced manner, but that does not mean we can always, or it does not mean it is always correct to do things in the most fast-paced manner.
Mm-hmm.
Like, we did things in a fast-paced manner because, well, we were founded on the day of the International Math Olympiad, so we couldn't have competed in that anyway. The next Math Olympiad is Putnam, and we're quite excited because it's, I mean, it's undergraduate exam. And this year's IMO, twenty twenty-five IMO, was easy on the MOHS scale, and Putnam could be hard. And in fact, it was harder than the IMO and the MOHS scale, if you look at AI, um, you know, how much, how many scores the AI, uh, has, has retained on average and on the max, you know, difficulty of the problem. Putnam is harder in both, both axes. Um, so we want, want to try, and so there's only a gap of four months. But it doesn't, it doesn't mean I'm always gonna set four-month goals. If I build a company only setting four-month goals, I, I might build a really short-sighted company. So there are, like, I think, longer horizon problem. I think, for example, market forces could force other players into chip verification. Well, it is possible that co-verification is a holy grail. It's possible that if you solve that, then you also naturally solve chip verification with some amount of, like, epsilon caveat of, like, distribution shift. But I strongly believe that, like, a bottleneck, like, could be the pressure. But I think that Axiom is fortunate that, one, we're early enough, two, we are, like, a team of just
incredibly, like, um, high-agency people, that our execution generally surpasses expectation. But I think, like, what I think could be a bottleneck for the entire AI for maths field is that potentially trying to prove commercial value is going to distract significantly from the core capability improvement.
Yeah. That makes sense.
Okay.
Cool.
Thank you for driving up and coming to see us.
Thank you so much. Yeah.
I know the traffic was n- was horrible.
Yeah. Thank you.
And, and, um, it's been really a pleasure speaking with you, and-
Yeah
... um, we look forward to-
It's awesome
... to seeing how things develop.
Yeah. Thank you so much.
Thank you.
Awesome.
Thank you.
Thank you. Yeah.
Yeah.