Scaling Past Informal AI - Carina Hong, Axiom Math — Latent Space

Intro0:00

Carina Hong0:00

But it's for the first time now, I think verified AI is to open up collaboration, either it's human-AI collaboration. Well, before Blueprint, like that's human-human collaboration, and Lean was the grounding, was the verification, formal language. And then human-AI collaboration like we're seeing now, future AI agent, agent-agent like collaboration. Like, I think verified AI is for openness. It's not for meeting the requirements of closed industries. And I think just like I think verification should not be about, "Oh, I remember," like, you know, there was this article like chatbots mixed up of as math solution hallucination. Uh, verification to me is not about lousiness. Verification to me is about scaling brilliance, compounding brilliance. It's like just kind of going back to the collaboration point. It's about Ramanujan being a much stronger mathematician. He was already a really strong one, but verification helps him extend the brilliance, like both kind of like scale up and scale out.

Brandon Anderson0:50

Welcome to the Latent Space AI for Science podcast. I'm Brandon Anderson. I build, uh, RNA therapeutics at Atomic AI, and I'm joined by RJ Haneke, uh, the CTO of Mira Omics, uh, working on spatial transcriptomics. It's a pleasure to have Carina Hong, fr- CEO and founder of Axiom Math. Um, Axiom has made a splash in several different areas. First, they were-- they got a perfect score on the Putnam, uh, last, uh, December, I think. They also had the claim of the first AI to prove research conjectures using formal verification. And, um, very exciting, they just yesterday announced a, a quite large, uh, Series A. Um, yeah, welcome to the show.

Series A0:52

Carina Hong1:27

Thank you for having me.

Brandon Anderson1:29

You just raised two hundred million dollars, which, as one of your colleagues said, this is like basically the entire, like, US math budget for math research each year.

Carina Hong1:38

Is that true, actually?

Brandon Anderson1:39

Uh, according to his LinkedIn post, yeah.

Carina Hong1:41

Okay, wow.

Brandon Anderson1:42

Uh, two hundred and fifty million is our apparently annual, uh, math budget.

Carina Hong1:46

I think we should spend more on math research.

Brandon Anderson1:48

Yes.

RJ Haneke1:48

Yeah.

Brandon Anderson1:49

It's, it's kinda sad, like-

Carina Hong1:50

Yeah, I know.

Brandon Anderson1:51

But anyway, like, you know, as a, you know, as a nerd who loves math, that's like really cool. But I mean, I'm just like, that kind of blew my mind, like, what? When I read that, like, okay, so like, yeah, how is it two hundred, two hundred million at, I guess, one point six billion valuation? Yeah, I don't know.

Carina Hong2:06

Yeah, um, well, super, super excited to be here. Um, also, I think like, you know, uh, this is a Series A, so it's very, very interesting, timely, timely podcast. Uh, we're like a s-seven, eight-month-old company-

Brandon Anderson2:17

Mm-hmm

Carina Hong2:18

...so it definitely means a lot to us. It's a really cool milestone. Um, we're currently about like thirty people now, right?

Brandon Anderson2:24

Mm-hmm.

Carina Hong2:24

So kind of going into, I think this amount of funding will, like, give us the fuel that we-- it needs to, to, to accelerate-

Brandon Anderson2:31

Yeah. Mm-hmm

Carina Hong2:31

...um, the strong execution momentum that we have so far. Um, I think like people think of us, like there are many kind of ways to think about Axiom. People think of us, us as a math startup, so-

Brandon Anderson2:41

Mm-hmm

Carina Hong2:41

...math startup, lean startup. The other obviously things that we do, um, that are formal verification, we think verification is a really good best first market for math.

Brandon Anderson2:50

Mm-hmm.

Carina Hong2:50

And so I think this fundraise is gonna like let us explore some of the applied domains, uh, as s-my, my colleague, CTO Subho said in the, in the little launch video, um, with the Series A we had, is it h- it lets us broaden our dreams. So-

Brandon Anderson3:03

Mm-hmm

Carina Hong3:04

...yeah.

Brandon Anderson3:04

But still, like two hundred million dollars and I guess a one point six billion valuation, how is there a market for that? I mean-

Carina Hong3:12

Yeah

Brandon Anderson3:12

...I mean, like, obviously you're not doing this just for the fun of proving things-

Carina Hong3:16

Yeah

Brandon Anderson3:16

...although I'm sure there's a lot of that, but.

Carina Hong3:18

So let's, let's bring us back to twenty twenty-four. So when, you know, o1 reasoning models like just came out. What is-- what was Anthropic kind of like secretly working on back then? It was coding.

Brandon Anderson3:29

Mm-hmm.

Carina Hong3:29

And everyone knows they are working on coding.

Brandon Anderson3:31

Mm-hmm.

Carina Hong3:31

Like OpenAI, Meta, Axiom, everyone has full knowledge that Anthropic was working on coding. And they just like overlooked it. They thought, "Oh, they are a B2B play, so they just want one vertical." People think of coding as one vertical, and now look at where we are today. Coding, kind of like strong transfer learning from coding to reasoning to basically, you know, a monopoly in the, in the future of reasoning, and I think that's, that's really, really shocking. The people who were working on coding, I think back then believed in something that we believe, you know, similarly with math and Lean now, which is that if you have more structured and formal data, it's going to be a lot more horizontal than the specific vertical we are tackling. So you know, if today we are doing, you know, math informal way, like the standard chain of thought data, train a math model based on human preference, then I would say, well, perhaps we are just a math startup, right?

Brandon Anderson4:26

Mm-hmm.

Carina Hong4:26

But you know, while we are pursuing math, we are also doing things that do have transfer learning to other, to other domains. Um, so I think that's kind of like the broader, broader picture, is that while the DNA of the company remains math and all of us are math nerds, and this is a very strong cultural statement, everyone has a great mission of having AI be a superhuman mathematician like we are seeing on Putnam on a batch of research conjectures. In fact, we have another batch coming. Um, we are also thinking that this is going to be fundamental to verified reasoning, and we kind of talk a little bit about verified AI. I want to talk a little bit about verified AI next, 'cause I think you have another-

Verified AI4:52

RJ Haneke5:06

Yeah. Yeah.

Brandon Anderson5:07

Yeah.

RJ Haneke5:07

I have, um, several things I want, like, uh, so I wanna hear about the verified AI. I do wanna dig in a little bit. So do we know that, you know, Anthropic and OpenAI and everyone there not doing formal verification and using that for their rollouts and whatever?

Carina Hong5:23

I think I have a lot of like rumor mill that I probably shouldn't like put it on the record. Um, like- ...I think, you know, like researchers talk, they play card games.

RJ Haneke5:31

Yes. Yes.

Carina Hong5:31

Yeah. But there, there are really interesting reasons if they are or are not doing it. I think that's like kind of the takeaway I have, which is that if you're like at a frontier lab-

RJ Haneke5:41

Yeah

Carina Hong5:41

...and the direction actually does change a lot for a lot of the reasons beyond your control So I want to kind of like bring us back to the AlphaProof moment, right? Like AlphaProof was such an amazing... That really, the 2024, um, 28 out of 42 performance was the IMO moment for me. It was not Gode in 2025, because across 2024 and 2025, AI models could solve all the problems that are not combinatorics. The only difference is that, you know, if you get all the problems that are not combinatorics, you get 28 in 2024 and 35 in 2025, because there's only one combinatorics question in 2025. Um, after AlphaProof, kind of like we didn't see a lot of the formal math, uh, you know, results or kind of progress from Google DeepMind, and that's actually because of reasons that are not necessarily technical. But if you're at a startup and you have very singular focus that is formal math and verified AI, then, um, you know, you get to work on a really cool problem for a long time, and you have like a lot, a lot higher likelihood to get to where you want to be in terms of like progress and breakthrough unlock.

RJ Haneke6:48

So yeah, just define that for us.

Carina Hong6:50

Yeah. Like a lot of people think about formal verification as an ancient, you know, subject. Um-

RJ Haneke6:56

Yes.

Carina Hong6:56

... it, it, it existed like as long as, you know, way before like, like deep learning, and it existed in the time of rule-based computer science. Uh, there's this really strong push of like formal verification around like since, ever since 1980s. Uh, really interesting historic anecdotes such as, uh, I think the Paris Trade Union demanded that the automatic switching of the subway system needs to be formally verified for safety purpose.

RJ Haneke7:20

Mm-hmm. Yeah.

Carina Hong7:21

So quite interesting trade union for, for technology.

RJ Haneke7:23

Yeah.

Carina Hong7:24

Um, and like I think around the time of Challenger, both before and after, European Space Agency was using formal verification for the Ariane spacecraft. Uh, it's also interesting. Boeing, Airbus, formal verification. And then more recent years, right, I, I think like there's a lot of push about automated reasoning at AWS because they have a lot of enterprise customers that really require things to be, to be 100%, you know, verified and there's no edge cases mi- uh, like missed and like just general like testing doesn't satisfy the need. So a lot of people think about verification as something that's like annoying because it's like tax and compliance. Like it, it's-

RJ Haneke8:01

Yeah.

Carina Hong8:01

... making sure that we are good to go, right? And like that's really not the... And, and so we, we talked about like verification. I think our competitor when they launched, they talked about, um, formal verification pre-reasoning. They talked about it, uh, in the time of hallucination. And, and maybe for them, like formal verification is about the lousiness, the, the hallucination. For us, no. Like for us, verified AI is about the brilliance. It's about scaling and compounding super intelligence. So this is quite a deep point, and sometimes it takes a little bit of explanation. So if you think about like, you know, the, the place of brilliance, for example, Ramanujan, like he's a brilliant mathematician. He was able to find a lot of like interesting formulas just by intuition before he know how to do proofs. So he went to Cambridge, um, you know, work with Hardy and Littlewood and, you know, in the famous movie, The Man Who Knew Infinity, there's this like storyline of how hard it was for Hardy to force him to no longer rely on intuitions and do proofs. After he learned proof writing, he came out as a much more powerful mathematician whose results, um, like intuitions, turn into theorems, and future generations of mathematicians build on that, those theorems. So it is a way to kind of scale and compound, uh, the intelligence that we already have. Another example, mathematicians kind of have been writing code in English

or their respective country's natural language for thousands of years. And why do I call it writing code? Um, because there's this sort of community standard of rigorous logical deduction. Everything has to be step-by-step correct, um, otherwise you will get outcasted by your math community. Uh, like look at some-

RJ Haneke9:46

So it's the law.

Carina Hong9:47

Uh, well...

RJ Haneke9:48

Or rules in, in the community.

Carina Hong9:50

So, so it's interesting, right? Because that is kind of human mathematician enforced.

RJ Haneke9:54

Mm-hmm.

Carina Hong9:54

Right? And so, uh, it's a peer review process. Peer review ta- of a paper currently takes two years. Okay, so but proof assistant and, you know, formal proof trackers like Lean still found its place, right? And why? Like if you, if I'm a mathematician and, you know, any, like my work can be peer reviewed by other humans, like why do we even... why do mathematicians even play with Lean, right? And why do we even like talk about kind of like, you know, Lean-based, um, assisted like theorem proving? It's, it's because like it handles a low level. For example, we're not even talking about AI. We're talking about, for example, the grind tactic in Lean. It can currently handle a lot of math proofs, like at a very low level. And, and this is pretty shocking because I have seen, you know, actually another, uh, a company working in the same space, like, you know, some of their demo, and I look at the demo, like it can actually completely be handled by grind, uh, which is a tactic in Lean.

RJ Haneke10:50

Can you explain what Lean is to non-experts?

Carina Hong10:52

Oh, okay.

RJ Haneke10:53

Yeah.

Carina Hong10:53

Yeah, yeah. I think our order is like a little, little wrong. Yeah.

RJ Haneke10:55

Yeah.

Carina Hong10:55

So Lean is a computer program-

RJ Haneke10:58

Uh-huh

Carina Hong10:58

... uh, a bit like for math proofs.

RJ Haneke11:00

Yeah.

Carina Hong11:00

It is a formal language-

RJ Haneke11:01

Mm-hmm

Carina Hong11:01

... uh, just like its cousin Isabelle, Coq, uh, or Roc-

RJ Haneke11:06

Mm-hmm

Carina Hong11:06

... um, and, um, uh, some other further co-cousins like Daphne, Agda, like these formal languages. Whole sector. Yeah.

RJ Haneke11:13

And what, what does it do?

Carina Hong11:15

It, it basically, if you have a proof written in the program in Lean, and then, uh, assuming there's not, no, any like weird things happening, so like, you know, unintended, like, you know, use of sorry, which is a tactic that let you take things for granted, assuming everything is, is safe, um, hence pe- people have tools like comparators, safe verify, and Axiom recently rolled out verified proof that's like a hundred times faster than comparator. Um, then, you know, once you kind of execute that program, it com-- like once it compiles and it tells you that it's correct, then it is, the proof is actually correct.

RJ Haneke11:47

So it's like a type checker?

Carina Hong11:48

Yeah. That is based on this result called, um, Curry Howard correspondence, which turns proofs into programs. So, so I want to talk about the magic of Lean, why I think it's a really good programming language. It's because on one hand, if you don't care about the formal part at all, if you don't care about the logic part, you just want to use Lean to write code, you can. Like, we have had candidates, actually currently, um, you know, the person is working at the Lean FRO, um, he wrote Autograd in Lean in our interview process.

RJ Haneke12:18

So, so it's a, it's a Turing-complete language?

Carina Hong12:20

That's right. So, um, you can write-- you can do a lot of things with Lean. You can... It's a, it's a functional programming language.

RJ Haneke12:25

Okay.

Carina Hong12:26

Right? And then you can, you can also use it to... So you use it to do coding, you can use it to do math, two-in-one.

RJ Haneke12:32

Okay.

Carina Hong12:32

And kind of going back to what I was kind of getting at, if mathematicians are already enforcing that most proofs, you know, say, say maybe not all mathematicians, but, but the ivory tower people in academia, all proofs are correct, why do we even need Lean, the model checker? It's because Lean has tactics that help them handle the low-level calculation or proof or deduction, not calculation, um, then for them to be able to navigate in a high-level intuition space. So this is my point, that it is not about, like, formal verification or verified AI. To us, it's not just about handling or, like, kicking out the lousiness, the hallucinations, the mistakes. It's about scaling brilliance. It's about super intelligence.

RJ Haneke13:17

I actually... Terence Tao has a great video also about using Lean to, as a way you can collaborate-

Carina Hong13:23

That's right

RJ Haneke13:23

... because you can use the sorries.

Carina Hong13:24

Exactly. Exactly. That's another point I want to-

RJ Haneke13:25

Yeah

Carina Hong13:25

... I want to talk about, right?

RJ Haneke13:27

Yeah.

Carina Hong13:27

A lot of people think about, you know, what is our, what is our market? It has to be some, like, really niche industrial societies area that is mission-critical, safety-critical. No, that's not the TAM. The TAM is all code.

RJ Haneke13:42

Mm.

Carina Hong13:42

The TAM is a f- is a right of first refusal on all AI-generated code. Like, for right of first refusal, meaning, you know, you get to choose whether you want to verify it. So this is the important part I want to kind of come across, which is that people talk about formal verification as almost, like, painful because it has all these, like, stringent requirements, uh, you-

Axiom System13:42

RJ Haneke14:03

Up until now, it has been very formal, right ?

Carina Hong14:05

Well, yes. Yes. And, and to us, it's actually verify generation means performance gain. It means higher sample efficiency. It means a startup like us with like, you know, still we, we raise some money, but lesser compute budget, lesser data budget than Frontier Lab will be able to match and even exceed, you know, performance on superhuman tasks. In fact, for the Punam exam that we just competed December twenty twenty-five-

RJ Haneke14:29

Mm-hmm

Carina Hong14:29

... which we did in real time, Math Arena, which is this organization that evaluates a lot of LLMs, found the best LLM, DeepSeek, got one hundred and three point out of a one hundred and twenty-point exam. The best human, obviously we now know, is a student from either MIT or Chicago, we don't know which one, um, because they don't announce the top five winners' score, got one- one ten, and we got one twenty. So it's the first time, actually, I remember when we were starting this, people were like, "Is it even possible that a formal math, you know, system with so much orders of magnitude less data can, can match or beat a, a for- an informal LLM?" And Punam is the first time it beat, right? And, and so we're not thinking about it just about the painfulness, the challenges it pose. We are thinking about the verified generation performance gain, the improvement, the, the fact that you can, you know, just like you would expect RL for Lean to, to have improvement because of seeing evidence of RL in coding. So this is the second point I want to make about, like, how to think about verification, verified AI.

RJ Haneke15:30

So maybe we can talk a little bit about why. Can you describe what, what is different about what you do versus what the Frontier Labs, y- you know, at least when they're building-

Carina Hong15:40

Right

RJ Haneke15:40

... their standard-

Carina Hong15:41

Right

RJ Haneke15:42

... RL-

Carina Hong15:42

Right

RJ Haneke15:42

... enhanced, uh, LLMs, what, what's different about what you do?

Carina Hong15:46

Yeah. So we heavily rely on a kind of data called Lean data, and we kind of talked about Lean as all the, all the data, um, that we have that Lean proves, you know, it, it's correct. So you, you know it's correct or not, and that's quite, quite important.

RJ Haneke16:00

Mm.

Carina Hong16:00

So, you know, we have a system of models. These models are post-trained, and, um, using RL or FFT-

RJ Haneke16:08

So LLMs found, like some sort of foundation model that you get off the shelf, and you post-train it, or, or con- continuous-

Carina Hong16:17

Yeah

RJ Haneke16:17

... pre-training on Lean?

Carina Hong16:17

And there's obviously an inclination for open source, you know-

RJ Haneke16:20

Yeah

Carina Hong16:20

... based models like-

RJ Haneke16:21

So it does speak English-

Carina Hong16:23

Yeah

RJ Haneke16:23

... probably knows how to code-

Carina Hong16:24

Yeah

RJ Haneke16:24

... but it also, you fine-tune it or, or continue-

Carina Hong16:28

Yeah

RJ Haneke16:28

... pre- pre-training.

Carina Hong16:28

Yeah. And the base model might be similar to what everyone else-

RJ Haneke16:31

Yeah

Carina Hong16:31

... is using as well.

RJ Haneke16:32

Right.

Carina Hong16:32

Right. Um, if they're not kind of pre-training their model.

RJ Haneke16:34

Right.

Carina Hong16:34

Yeah.

RJ Haneke16:35

Mm-hmm.

Carina Hong16:35

And then we basically do this, you know, RL for, for formal math kind of. There's, I think, a standard pipeline or, like, you know, tricks of the trade that-

RJ Haneke16:44

Carina Hong16:45

... people use. We try to innovate really, uh, on top of it as much as we can. I think that we found, um, scaling inference to have almost no wall, um, recursively decomposing, uh, you know, a proof goal into many subgoals, and then learning to backtrack as well.

Brandon Anderson17:02

Is there a risk that, like, you start out with this, you know, what you know in a certain domain of data sets and so on, and then you start rolling out, you know, recursively in a space, but now all of your training data is localized in some domain that you... It still is only so, like, maybe logarithmically, um, in some large space growing from your initial training data. So you could get trapped essentially in that, you know, you could be really good at this, but you just created a big jagged frontier where some other domains are just far from them.

Carina Hong17:34

Distribution shift-

Brandon Anderson17:35

Yeah

Carina Hong17:35

... are we talking about?

Brandon Anderson17:36

Yeah.

Carina Hong17:36

So yeah, so, so, you know, it is an open question-

Brandon Anderson17:38

Mm-hmm

Carina Hong17:39

... whether a, uh, a system that can do really well in number theory can do well in, uh, give me, you know, another, another field-

Brandon Anderson17:46

Topology

Carina Hong17:46

... of math. Yeah, exactly. Well, actually, I think this, the way we think about it is it, it depends. It depends on whether topology has a lot of the, um, existing definitions as almost like, you know, the, the math infrastructure-

Brandon Anderson18:00

Mm-hmm

Carina Hong18:00

... um, existing. Because what people have found in the past is when people were building out math lib, like, you know, for the algebra, um, you know- Book work, um, like they, they can just-

RJ Haneke18:12

So Mathlib being the Lean-like undergraduate library-

Carina Hong18:15

That's right. That's right

RJ Haneke18:15

... kind of thing.

Carina Hong18:16

Yeah.

RJ Haneke18:16

So it's like all the proofs that you learn-

Carina Hong18:18

Yeah

RJ Haneke18:18

... in undergraduate math-

Carina Hong18:19

Yeah

RJ Haneke18:20

... and they're all sort of in Lean.

Carina Hong18:22

Yeah, yeah. So for example, some of my friends, um, who currently are at Axiom, it's, you know, crazy, like full circle back moment. Um, Kenny, uh, we're like friends for like, you know, five, six years, and he was the first one to tell me about Lean. Uh, he was working with Kevin Buzzard to build out Mathlib. It's a lot easier to codify algebra in Mathlib than, than for, for analysis.

RJ Haneke18:42

Mm.

Carina Hong18:43

Uh, so, so that's, that's interesting because for analysis, a lot of the definitions around convergence limits, et cetera, becomes tricky. And so I don't think there's a lot of like topology in Mathlib today, um, in terms of like differential topology, differential geometry kind of stuff. So, you know, our system likely will not do very well on those, on those domains because it doesn't even have definitions to build off on top of. For the places where the, um, definitions are, are in, we actually are doing quite okay in terms of distribution diver- diversity. We have good performance, you know, sol- having solved open research questions in, um, number theory, commutative algebra, algebraic geometry, some discrete math like combinatorics and probability.

RJ Haneke19:25

So earlier you said that like with the Putnam exam, the, the 2024 version, when all of the questions were... that were not, that AlphaProof did not get right-

Carina Hong19:35

The IMO, the International Mathematical-

RJ Haneke19:36

Sorry, IM- yes. For the IMO, all of the ones that got wrong were in combinatorics. Is there, is there like a weakness there in that specific domain?

Carina Hong19:45

I would say so. For, for Olympia in math-

RJ Haneke19:47

Mm-hmm

Carina Hong19:47

... people are seeing combinatorics being a little bit more, um, tricky.

RJ Haneke19:51

Mm-hmm.

Carina Hong19:51

Uh, seems like the steps are quite creative. So I, I, for a... I'm a human and, you know-

RJ Haneke19:58

Carina Hong19:58

... when I have friends who are really good at combinatorics, which I never consider myself really the, the top of combinatorics. I'm-

RJ Haneke20:05

Mm-hmm

Carina Hong20:05

... kind of better in number theory, but I know some people who are just, they're IMO gold, perfect score, Putnam Fellow-

RJ Haneke20:12

Mm-hmm

Carina Hong20:12

... perfect score, and like all the way. And then when they do like tricks in combinatorics, I'm like-

RJ Haneke20:18

Mm-hmm

Carina Hong20:18

... "I don't know how you thought of that," and but, you know, after you give me that construction, it actually-

RJ Haneke20:23

Mm-hmm

Carina Hong20:23

... becomes a lot more trackable. I think a Lean-based system will struggle in those very creative, um, places, which is why we at Axiom actually also invest on something called mathematical discovery. It does not use Lean, and we have some major news in the coming weeks, um, basically open sourcing entire code bases of mathematical discovery coming up.

RJ Haneke20:46

You wanna tell us a little bit?

Carina Hong20:47

Yeah, yeah, sure.

RJ Haneke20:48

Yeah.

Carina Hong20:49

Um, so, uh, we are currently, uh, having two code bases, um, being open source. The, the goal is for if you're a mathematician or you're a theoretical physicist, and you have a problem that you would like to solve, for example, you want to, uh, find a construction that is a very complicated graph construction, then we would suggest you follow the very detailed manual supposed intended for mathematicians to run the code that we write. Uh, it, it's a, it's a tool for, for mathematicians to make mathematical discoveries. Mathematical discoveries is this idea that, you know, proof is not enough for math. Uh, in fact, before you kind of start proving something, you don't know where you want to start, so you will try to construct some interesting examples. This can be usually, say, sequences, right? If you want to understand the property of a sequence, you will write out a few of the first terms. This can also be graph. So if you want to, you know, figure out what the graph that you're looking for, um, should have, say, a certain property, uh, then you will start by doing some simpler version of the graph. Now, constructions cannot be done by Lean, so we believe in having AI for math discovery, and we have, you know, one of the OGs in that field, François Charton, um, member of technical staff at, at Axiom, and he previously have done pattern boost and end-to-end, you know, set

out disprove a 30-year-old conjecture by finding a counterexample, um, found the solution to a 130-year-old problem, the global Lebanon function, that is a kind of mathematical object showing up in the three-body problem. So we, we are, we are thinking that, you know, it, it's... mathematical discovery tools should be open to the math community, so we are open sourcing entire code bases for that.

Discovery22:12

RJ Haneke22:33

So d- discovery meaning it gives... it makes new conjectures, or it-

Carina Hong22:38

That's a-

RJ Haneke22:39

Or-

Carina Hong22:39

Yeah, it's a pre-conjecturing step, actually.

RJ Haneke22:41

Okay. Oh, I see.

Carina Hong22:41

Yeah. So you, you start to form intuitions.

RJ Haneke22:44

Oh. Mm-hmm.

Carina Hong22:44

Right? If you're a mathematician and your goal is to solve a really hard, hard conjecture, Axiom Prover can't just solve it for you. Um, you might want to try to formulate some sort of lemmas, conjectures that you want to, say, then give to Axiom Prover. Um, if you're a human mathematician, you will start by wanting to formulate that conjecture. You don't know where to go. You want to find constructions. Now, uh, the code bases that we're gonna open source gonna help you, uh, hopefully significantly.

RJ Haneke23:15

So one thing that maybe there's a lot of computer scientists listening, and one of the things that will immediately kinda come up in w- especially when you're talking about formal verification and so forth, is Rice's theorem and decidability and incompleteness theorem and, and maybe, um, some arguments about computational complexity and LLMs. So I, I'm curious to hear, Rice's theorem says you cannot prove non-trivial things about programs for all programs, right? So how, how are you navigating this space? Obviously, formal verification, you know, does-- is able to do some things.

Carina Hong23:54

Yeah.

RJ Haneke23:55

Yeah.

Carina Hong23:55

So yeah, I think like it's, it's very clear that you just, like there's theoretical result telling you you cannot formally verify all programs, right?

RJ Haneke24:02

Yeah.

Carina Hong24:02

But you, you... I think it, it's good to f- formally verify a majority of the useful programs, right? So, you know, like I remember, uh, there is this MIT, uh, like little, like- Documentary, or not a documentary, like an advertisement for, you know, uh, people who are admitted students. And then there's this famous line by Tim, the, the beaver, the mascot of MIT, saying that, "What does theory give you?" Which is, which is kind of like, it doesn't stop us from-

RJ Haneke24:30

Yeah

Carina Hong24:30

... trying to push it as much as possible.

RJ Haneke24:32

Mm, yeah.

Carina Hong24:32

So the goal that we have for the future is suppose you are, you know, doing, doing the, the cr- coding. You want to vibe code a really complex task. So, you know, currently it's front-end websites, but in the future, we might want to vibe code much more complicated things, whole distributed systems even. Then we want to be able to, say, decompose it, and there's maybe a high level, kind of like sketch plan this we can make, other people can make. But say, you know, you have Claude give you like, you know, kind of break it down into 10 things, and at one point it will decide to call Axiom, and Axiom will give you a computer program that you know is formally verified, or it will say, "This is still too hard for us."

Code Limits25:12

RJ Haneke25:15

So you, you write the program?

Carina Hong25:17

Yeah.

RJ Haneke25:17

You give it to Axiom?

Carina Hong25:18

Yeah.

RJ Haneke25:18

It makes changes to it maybe?

Carina Hong25:20

So, so we're talking about kind of two, um, sort of phases.

RJ Haneke25:24

Mm-hmm.

Carina Hong25:25

Um, it is possible that we are the verification partner.

RJ Haneke25:28

Yeah.

Carina Hong25:28

So you already have a computer program-

RJ Haneke25:30

Okay, yeah

Carina Hong25:31

... and you want us to verify it.

RJ Haneke25:32

Mm-hmm.

Carina Hong25:33

In fact, like, you know, GPT found a proof to an unsolved Erdős problem, and our competitor, Harmonic, you know, Aristotle, um, you know, verified it. But we, we can do, we want to do-

RJ Haneke25:44

Both of these

Carina Hong25:45

... verify generation, right?

RJ Haneke25:46

Yeah.

Carina Hong25:46

We might want to say, "Hey, you know, this little component, everything that we generate and provide for you, um, is, is formally verified."

RJ Haneke25:53

I see. So, so the idea would be you, you generate, you co-gen- generate bo- both, and so that-- And I can imagine this fitting into, um, you know, the idea of a promise or a sorry, sorry, and then a sorry. Which is- ... lean.

Brandon Anderson26:08

A lean sorry.

RJ Haneke26:09

A lean sorry, meaning it's a lemma that is unproven, but you're just taking it as given until you can take, have the time to prove it, right? Is that a good way to think about a sorry?

Carina Hong26:20

That is a good way to think about a sorry, but not necessarily in the coding context.

RJ Haneke26:22

It's, so, i- I can imagine you're, you can say, "I- assuming that this module is verified, then-"

Carina Hong26:31

Okay.

RJ Haneke26:31

"... this module is correct." And, and so that, that you can decompose a problem small enough that you can verify. Is this kind of the intuition here?

Carina Hong26:39

So, so, so let's say we want to, you know, like, uh, vibe code control flows.

RJ Haneke26:44

Yeah.

Carina Hong26:45

Right?

RJ Haneke26:45

Uh-huh.

Carina Hong26:46

That's quite hard. You will likely, you know, break that down into multiple steps.

RJ Haneke26:50

Mm-hmm.

Carina Hong26:51

And then it will continue to break down these steps into more fine grain steps.

RJ Haneke26:54

Yeah.

Carina Hong26:55

And at one point, you want something that is absolutely correct.

RJ Haneke26:58

Yeah.

Carina Hong26:59

And then this is also something that is likely within reach. Then we want to generate, you know, both, uh, we want to generate a piece of computer program, and underlying is a guarantee that there is also the, uh, proof that has been generated, which tells you that the thing that you specified, this, you know, program can solve for you.

RJ Haneke27:21

Yeah.

Carina Hong27:22

So, so, so the vision we have is anything that can be-- Which anything is, you know, um, it's a little bit marketing because, because as you said, this theoretical bound. But, but mostly, um, w- well, um, almost surely, hopefully, um, anything that can be defined can be executed. Anything that can be specified can be proven. So the way I think about it is if you have a, a program, um, times a, a, you know, a program times a, times a statement or problem, it maps to verifiability conditions times a proof. So-

RJ Haneke27:57

Yeah

Carina Hong27:57

... while the programming veri- program verification community has given you, say, the verifiability conditions, and we're trying to kind of recruit a really strong team to help us do that, Axiom Prover is gonna give you the proof.

RJ Haneke28:11

So just help me map from the program to the proof. Because like-

Carina Hong28:16

Yeah

RJ Haneke28:16

... I could say, you know, this two-line lean program-

Carina Hong28:20

Yeah

RJ Haneke28:20

... verifies, you know, sort of like whatever, whatever I claim it solves-

Carina Hong28:25

Yeah

RJ Haneke28:25

... how do I know that it actually-

Carina Hong28:27

Yeah

RJ Haneke28:27

... verifies the thing that I think it verifies?

Carina Hong28:29

Yeah. So, so for example, there's this, currently there's this benchmark called Code Marina. It's a, a code verification benchmark, um, that's supposed to be lean-friendly. And so, you know, every problem is a coding problem, and the goal is to generate, there's a code part and there's a proof part.

RJ Haneke28:46

Yeah.

Carina Hong28:46

Two, two different computer programs.

RJ Haneke28:48

Yeah.

Carina Hong28:48

And then the goal is to generate code with proof. So, you know, the code that supposedly solved this problem, and then the proof that this program indeed does solve the problem.

RJ Haneke28:59

I see. Yeah.

Carina Hong28:59

W- Now, now, how do people do on this benchmark? Uh, I kind of want to, like, talk about this a little bit 'cause it's interesting. It was, um, rolled out, I think, by, um, Berkeley and Meta researchers in twenty twenty-five, and they found, I think, whatever version of GPT they evaluated does, like, pass one, like, three point six percent, iterative something like twenty-two percent. Now, you know, how does the formal math systems models do? Um, Copra, which is a, a system, because in a system you iterate and define, so pass one doesn't quite work, but still they evaluate it. Pass one of the system about, like, I think eleven, twelve percent. And then also DeepSeek Prover and Godot Prover, very strong prover model, eleven, twelve percent. And I think our competitor has released last year, uh, on the only proof part, ninety-six percent. Um, and we actually recently, with no modification to the put in system, we saw a ninety-nine percent out of the hundred and eighty-nine problems. We solved a hundred and eighty-seven. We missed only two, um, code with proof. So if you, if you want to train something to do code with proof, and you want to do reinforcement learning, it's actually quite annoying because, look, it's, it's mixed. If you want proof to be informal math, it's, it's very annoying because then that's, like, just mixed objective function. Um, your code is something like Python. Your proof is, say, natural language

math proof. Um, you will not have a strong RL kind of- Performance, right? But if you have proof as lean and you have, you know, code, you can choose Rust, which is a strongly-typed language. It's, it's more, it's more conversion, so you're gonna have much better performance.

RJ Haneke30:36

I can't wrap my head around how do I tie... So I, like, I can say that this proof solves Fermat's Last Theorem, right?

Carina Hong30:44

Yeah, yeah, yeah.

RJ Haneke30:44

But I don't know that, like-

Carina Hong30:45

Yeah, yeah

RJ Haneke30:46

... uh, but it's two lines in lean.

Carina Hong30:47

Yeah.

RJ Haneke30:47

Obviously, it doesn't. So how do I know that the program that I wrote matches the proof that I generated?

Carina Hong30:55

You will basically look at the coding problem, and you look at the, the program, and then you, um, like, try to see if it satisfies the verifiability conditions.

RJ Haneke31:06

But, like, how do I know, right? Like, if I read it-

Carina Hong31:09

Right

RJ Haneke31:09

... um, you know, like, I can, I can just, like, eyeball it, and I can say-

Carina Hong31:13

Yeah, yeah, yeah, yeah

RJ Haneke31:13

... and then, like, g- traditionally, how mathematicians have done this is they, they, you know, they take the paper and they read it, and they say, "I agree that this proof solves the problem."

Carina Hong31:23

Yeah.

RJ Haneke31:23

And then this other person says, "No, wait, it doesn't for, you know, like, look at this," and then people disagree, and eventually, there's consensus that, that, like, this proof solves this problem.

Carina Hong31:34

Yeah.

RJ Haneke31:35

So, like, how do, how are you crossing-

Carina Hong31:36

But, but you check it step by step, right?

RJ Haneke31:38

Yeah. Right. Right.

Carina Hong31:39

Yeah. Yeah. So you basically will look at the verifiability conditions and see if it does actually satisfy that.

RJ Haneke31:46

The-

Carina Hong31:46

So, so suppose, suppose, like, we're looking at, like, you know, a piece of computer program.

RJ Haneke31:50

Yeah.

Carina Hong31:50

Right? And then whether it does actually solve the coding problem, you will have a judgment-

RJ Haneke31:56

Mm-hmm

Carina Hong31:56

... about that, right?

RJ Haneke31:57

Yeah.

Carina Hong31:57

So you will not solely rely on testing, even though that is a way. That's why you-

RJ Haneke32:01

Oh, so somebody looks at the proof-

Carina Hong32:03

Right

RJ Haneke32:03

... and says, "Yeah, that actually solves the problem that we think it's supposed to solve."

Carina Hong32:07

Yeah. But then-

RJ Haneke32:07

Okay

Carina Hong32:07

... but then now you're, you're basically producing a, you know, formal verification program-

RJ Haneke32:13

Yeah

Carina Hong32:13

... that satisfies the verifiability conditions-

RJ Haneke32:17

Yeah

Carina Hong32:17

... about this program and this statement. So again, the function is taking you from the program and the statement to verifiability conditions and proof.

RJ Haneke32:28

Okay, so I can see how this works in a benchmark. Then w- if I have, let's say I have a, a flight control system, right?

Carina Hong32:35

Yeah

RJ Haneke32:35

... that is, like, very complex-

Carina Hong32:36

Then the, the problem becomes very annoyingly, uh, you know, the, the, the, like, specification. I think the word is gonna, you know, even if we say successful, like, like, anything that, you know, that we will have a specification problem.

RJ Haneke32:49

Yeah.

Carina Hong32:49

So, like, here comes a bank saying that, like, "Please, do I have a really safe financial audit?"

RJ Haneke32:57

Yeah.

Carina Hong32:57

Sorry, like, "Prove the financial audit for me," right?

RJ Haneke32:59

Yeah.

Carina Hong33:00

Like, what does that mean? Like, we, we can't specify. Hu- humans are bad at specifying everything that we want.

RJ Haneke33:06

Right.

Carina Hong33:07

There's always, like, some sort of thing that we are not specified, and if it's not specified, it's not proven.

RJ Haneke33:13

Okay, so what do you do about that?

Carina Hong33:15

Yeah, so we're not there yet.

RJ Haneke33:16

Okay.

Carina Hong33:17

Currently, uh, you know, like, again, the, the, the vision as of currently is-

RJ Haneke33:21

Yeah

Carina Hong33:21

... anything that can be specified can be proven.

RJ Haneke33:23

Okay. Yeah.

Carina Hong33:24

Now, obviously, there are people who have been really good at... You know, that's where, maybe where that's is informal kind of reason they're coming.

RJ Haneke33:31

Okay.

Carina Hong33:31

Right? The informal reasoner can... And, and this is, I want to kind of, you know, call the literature of testing. Like, testing are great because testing is like, "Hey, have you thought about that?" Right? Like, like, I want to highlight the work mutation-based, you know, LM unit test generation by, uh, ex-MCTO Shubho, and he was a, a director at Facebook AI Research. Like, the, the way you kind of think about it is, like, the AI will be like, "Hey, have you thought about, have you thought about this, this, this case?" Like, and so this is a little bit like conjecture.

RJ Haneke33:57

Mm-hmm.

Carina Hong33:57

So the conjecture is going to help with the specification.

RJ Haneke34:01

I see.

Carina Hong34:01

And then the prover does the proof.

RJ Haneke34:03

And so this is an interactive process maybe that the-

Carina Hong34:06

Yeah

RJ Haneke34:06

... person, so that when we're actually giving good specifications-

Carina Hong34:09

I think this is the future of coding. Yes

RJ Haneke34:10

... yeah. Oh, interesting. Yeah.

Carina Hong34:11

Yes, I think this is the future of coding. And I think this is where, you know, this is where I think even if we are supposed, like, given the assumption that everything can be formally verified, you know, like, studying sort of like, you know, automatic task generation is still interesting because it, it is basically giving you the specification proposal.

RJ Haneke34:30

Yeah.

Carina Hong34:30

Right? And then another thing is, let's talk about autoformalization, right? Which is the ability to, to define it. It is kind of conver- uh, converting something that is more, more informal, uh, into a, into something that is more formal, autoformalization. Um, so suppose I have a coding problem that is written for ICPC, and this problem is written in English, like Alice and Bob, blah, blah, blah. Okay, now I want to convert that into a formal statement, like a formal spec. How do I do the autoformalization step, right? Now, this is going to be h- because I have not solved the problem yet, so I don't have any signal, I don't have any grounding. The test cases, input/output pair, is gonna ground my-

RJ Haneke35:13

Yeah

Carina Hong35:13

... formal spec.

RJ Haneke35:14

So I know I have to know I'm going to give this input, I'm gonna give this output. It has to have these characteristics.

Carina Hong35:20

Yeah.

RJ Haneke35:20

And so, and so I write test cases, and I write a... So is there a equivalent in lean of this, right? Where the specification, where you just know the sort of, like, outcomes that you are expecting, so that you, like, you, the statement of the, the result and then the, but the proof is completely unproven?

Carina Hong35:41

So lean is actually quite annoying because it's like, a lot of the times, it's proof.

RJ Haneke35:45

Mm.

Carina Hong35:45

So you don't actually have the numerical answers to ground it.

RJ Haneke35:48

Okay.

Carina Hong35:49

So autoformalization is quite, quite a hard thing to do.

RJ Haneke35:52

Yeah.

Carina Hong35:52

Um, because, you know, what's generally happened is, um, you can't, you just, it's hard to ground the autoformalization of a statement. You can obviously ground the autoformalization of a proof, but because you can then just run it. But you need human to eyeball it.

RJ Haneke36:06

How big is a lean proof of like, a formalized, you know, of a formalized program of significant size? Like, I mean-

Carina Hong36:14

So, so-

RJ Haneke36:14

... do they grow with the size of the program, or do they grow super linearly?

Carina Hong36:18

Yeah. Currently, actually, you know, for each line of code written, there could be, like, 20 lines of proof

RJ Haneke36:23

Okay

Carina Hong36:24

It's not looking that great.

RJ Haneke36:25

But, but is that like a linear relationship, or is it as the complexity of the program gets greater than it, like it, you know, sort of also grows so that it's like-

Carina Hong36:34

I don't have a good answer-

RJ Haneke36:35

... 40 pounds-

Carina Hong36:35

I don't have a good answer to the scaling law of that.

RJ Haneke36:38

Oh, okay.

Carina Hong36:38

Yeah.

RJ Haneke36:39

'Cause I know that that's a problem in formal verification, right?

Carina Hong36:41

That's right. That's right.

RJ Haneke36:42

Right, where you have these huge pro- like, you have to have these very, very-

Carina Hong36:46

Yeah

RJ Haneke36:46

... long proofs-

Carina Hong36:47

Yeah

RJ Haneke36:47

... for even simple programs.

Carina Hong36:49

Yeah, yeah.

RJ Haneke36:49

So then, then do you-- are you gonna run into sort of like limitations in, in the capabilities of LLMs when you start to get to large, larger, um-

Carina Hong37:01

What, what we believe fundamentally is we are building a reasoning engine.

RJ Haneke37:07

Mm-hmm.

Carina Hong37:08

And we have seen Axiom Prover deal with really huge trees that are, like, you know, tr-tree of a proof.

RJ Haneke37:17

Okay.

Carina Hong37:18

Um, we have seen it scale from forty nodes to four thousand nodes.

RJ Haneke37:21

So wait, sorry. Axiom Prover is the, is the LLM?

Carina Hong37:25

Axiom Prover is a ensemble system-

RJ Haneke37:27

Oh, ensemble system

Carina Hong37:27

... of multiple-

RJ Haneke37:28

Okay, yeah

Carina Hong37:28

... models-

RJ Haneke37:29

Yeah

Carina Hong37:29

... that we do post-training.

RJ Haneke37:30

I see. Okay, yeah.

Carina Hong37:31

And also, it also includes, obviously, the tools that-

RJ Haneke37:34

Yeah

Carina Hong37:34

... Axel that we have-

RJ Haneke37:35

Yeah, okay

Carina Hong37:35

... um-

RJ Haneke37:36

Right

Carina Hong37:36

... open released.

RJ Haneke37:37

Sorry.

Carina Hong37:37

Yeah, no worries.

RJ Haneke37:38

Clean it up.

Carina Hong37:38

Yeah. So, so we have seen it being able to deal with more and more complex task.

RJ Haneke37:43

I see.

Carina Hong37:44

We don't think it's perfectly bound. You, you could ask, you know, is it bounded at one point on the pre-trained base model?

RJ Haneke37:51

Yeah.

Carina Hong37:52

I think that's a good question. I think, you know, mid-training could be very interesting because it does actually, you know, a lot of the sort of capability gain does come from that part, right? If you could argue that even if you, uh, try to reinforcement learn some, uh, person who is not very talented, uh, that person might behave, you know, be, be, be-- perform a lot less well than an on, on, on post-trained Ramanujan. Uh, you can, you can, you can argue that very, very sad reality of things. But, um, so at one point, we might consider doing, doing that.

Context37:57

RJ Haneke38:22

Then-

Carina Hong38:23

But we think there's so much to push on the-

RJ Haneke38:25

So you, you just feel like there's so much overhead right now or, or so, so much, um-

Carina Hong38:30

Space, uh, to-

RJ Haneke38:30

Space to grow-

Carina Hong38:32

Yeah

RJ Haneke38:32

... that, that you're not running into theoretical-

Carina Hong38:34

Yeah

RJ Haneke38:34

... constraints at this point.

Carina Hong38:36

Yeah.

RJ Haneke38:36

I, I just wonder because, you know, there's been recent results in the computational complexity of the problems that LLMs can solve fundamentally.

Carina Hong38:44

Yeah.

RJ Haneke38:44

And I don't think that they're really a concern for, y-you know, when I'm writing code with Claude Code.

Carina Hong38:51

Yeah.

RJ Haneke38:51

But I can imagine problems becoming big enough in a system like this where you have a gazillion lines of lean.

Carina Hong38:57

Yeah.

RJ Haneke38:57

You can't get 'em, get 'em into the context window, so you have to, like, be smart about that.

Carina Hong39:01

Yeah.

RJ Haneke39:01

And then you have to summarize, and then you're summarizing and summarizing, and-

Carina Hong39:05

Yeah

RJ Haneke39:05

... pretty soon you're, like, kinda losing track of what's going on. And I-- it just seems like with a lar-very large system like that, you might run into-

Carina Hong39:13

Yeah, I think this is, this is interesting. It's always a problem of abundance. So suppose you just, like, keep really the, the mathematical discovery. Renaissance has come. Axiom Prover does try to prove everything. You end up with, like, tens of thousand lines of lean proof. So first of all, is auto informalization is a lot easier than auto formalization minus a problem of no grounding, right? So, you know, every, every model has seen a lot of text and a lot of lean. So you can always, you know, convert that lean back into, back into informal, and then there's the problem of, well, how do you know if you're correct or not? You can rely on cyclic, like, consistency, so you then formalize it again and, like, prove, like, program equivalence, something like that. So that's-

RJ Haneke39:53

Oh, so you, you like informalize and then formalize-

Carina Hong39:55

Yeah, you can use it to ground

RJ Haneke39:57

... and you, like, make sure that you still look correct.

Carina Hong39:57

You can use it-- yeah, yeah. Like, and auto informalization-

RJ Haneke40:00

Oh, interesting

Carina Hong40:00

... is, you know, obviously less harder problem.

RJ Haneke40:03

Yes.

Carina Hong40:03

So you can always do that. So for a lot of the, you know, the, the lean code that we output, we can have an informal summarizer of, of like big chunks of lean. It's actually doing okay. So, you know, that's, that's a, that's a thing. And there's have another question of like, which I think is very interesting is I think there's a panel at, um, ICML Vancouver last year, um, at AI for Math workshop. There's like Leo De Moura and Java-Ja-Jer-Jeremy Aligod and, um, Shubo and CTO was there. And they were talking about like, will, will humans or mathematicians at some point stop trying to understand what's going on there, right? Because like-

RJ Haneke40:41

Mm-hmm. Yeah

Carina Hong40:42

... suppose you're a really ambitious mathematician. You're like, "I want to prove my hypothesis." And bang, here's a lean proof. And like, it's actually correct. And it's just like, you know, problem, one million lines. Um-

RJ Haneke40:53

But, yeah, isn't that like a big negative for the community? Because, I mean, usually when someone comes up with a big proof of something, um, oftentimes it will-

Carina Hong41:00

Yeah, I was about to get-

RJ Haneke41:01

The process as it is massive.

Carina Hong41:02

Yeah, I was about to get there, right? It's like, well, will, will that negative outcome happen-

RJ Haneke41:06

Yeah

Carina Hong41:06

... was the question the panel was discussing. It's a hy-completely hypothetical.

RJ Haneke41:09

Mm-hmm.

Carina Hong41:09

No one's, no one's like, you know, model system can, can prove my hypothesis, right? So, so disclaimer, please, please don't cut that part . Just send that along.

RJ Haneke41:16

Yeah.

Carina Hong41:16

Um, but like, you know, um, will, will people still trying to, try to understand what's going on? And I think the answer is usually, it's, it's always yes. I think curiosity and the, the desire to understand what is going on, you know, um, mathematically or in other domains as well, is a basic human need. And I think that is like, I think, a dose of optimism in an era of, I think, verified super intelligence, suppose we get there. Is that even, even if all the outputs are going to be produced and at a much, you know, faster pace and much more exponential volume compared to what humans could possibly consume, they're still going to try to consume it. And if they're still going to try to consume the ones that they deem important, so then basically attention is the bottleneck. And if attention is the bottleneck, then really intuition and taste, uh, you know, of which statement is probably worth con-worth the consumption of human and also maybe in a finite compute resource, worth, worth the consumption, uh, worth, worth the, the sort of spending of compute resources. That's where human mathematicians' taste will always guide us, and I think that's incredibly beautiful.

RJ Haneke42:24

Is it worth, like, internally taking, like, results you can prove one way and then trying to send your system at many different routes to get, like, or like orthogonal, conceptually orthogonal proofs, and so you kind of get a diverse set of different ways of You reasoning about the same thing because, you know, I think it could be very valuable if you give it a problem to say, "Oh, well, like here's kind of the brute force natural way that like maybe some humans would do it." Um, and then the, uh, there's like a really much shorter elegant way of doing it. So have you essentially thought about training your models to be elegant in some way?

Carina Hong43:05

Yeah. At one point, we're gonna get to there because, you know, I think the conjecture, uh, will probably depend on what, what, you know, will probably de-de-depend on what we mean by taste, elegance. Feels like an alignment problem to me, you know? Like, you know, who, who gets to say what is elegant? Humans get to say what is elegant.

RJ Haneke43:22

Yeah.

Carina Hong43:23

Right? That's-

RJ Haneke43:23

That's, that's, I mean, fundamental, right?

Carina Hong43:24

Human preference.

RJ Haneke43:24

And there's something about hard work, right?

Carina Hong43:27

Yeah.

RJ Haneke43:27

That what-

Carina Hong43:28

Yeah

RJ Haneke43:28

... what you work on hard is what you're gonna be good at.

Carina Hong43:31

Yeah, yeah.

RJ Haneke43:31

Right?

Carina Hong43:31

And we're gonna have a problem about that, I think-

RJ Haneke43:33

Yeah

Carina Hong43:33

... like pretty much in, in a lot of the domains as well-

RJ Haneke43:35

Yeah

Carina Hong43:35

... right? Not just math. Like-

RJ Haneke43:37

Yeah, absolutely

Carina Hong43:37

... how do you be that senior, um, programmer with, you know, really good high-level understanding, well, I guess full stack understanding, high level and low level-

RJ Haneke43:48

Mm-hmm

Carina Hong43:48

... if you haven't spent the year of training?

RJ Haneke43:51

I mean, I would argue that you don't... This is very phil-philosophical, but like y- you know, I, I don't need to be good at assembly language programming, right? Like no- not many people are good at that. A few people are because it's important for their job.

Business43:57

Carina Hong44:04

It's not experience, but curiosity.

RJ Haneke44:05

Yeah. So but, but it feels to me a little different because not-

Carina Hong44:10

I see

RJ Haneke44:10

... being good at like proving things, for example, right? That seems like a fundamental gap in like, uh, that maybe my mind doesn't develop in the same way if I am not doing that. Whereas if I'm just not good at assembly language program well, but I'm good at like higher level programming, so maybe that doesn't matter.

Carina Hong44:29

I think that's probably because how the, maybe how the education system, the pipeline works, which is that if you do not show early signs of brilliance, you don't sometimes go through the process of pre-training-

RJ Haneke44:44

Carina Hong44:45

... in math.

RJ Haneke44:46

Yeah, yeah.

Carina Hong44:47

Right? Like, so, so that maybe you can argue that you don't need to, say, you know, learn everything to develop a sense of taste, but there's like a threshold you kind of need to meet.

RJ Haneke45:00

Yeah.

Carina Hong45:00

So for example, you probably need to be able to code even if you don't need to understand assembly-

RJ Haneke45:08

Yes

Carina Hong45:08

... language, and that thing might transfer my intuition or, you know, my, my intuition might transfer from the Olympiad math problems into some other research areas, and I, I tried to pursue, and combinatorics transfer is more direct, um, s- very similar, and number theory could be further, but still okay. And then when it gets to like something that's a lot more different than Olympiad math, the transfer is less strong. But kind of like, you know, you need to diligent, as you said, right? Like you, you need to diligently go through some amount of training.

RJ Haneke45:41

Yeah.

Carina Hong45:41

And if people over rely on strong AI, that doesn't happen.

RJ Haneke45:46

I wanna s-switch gears.

Carina Hong45:47

Yeah.

RJ Haneke45:48

You, you mentioned, uh, software verification. What are the domains? How are you gonna make, uh, enough money to v- justify the valuation? The like... And congratulations by the way.

Carina Hong45:58

Thank you. Yeah, yeah, yeah.

RJ Haneke46:00

How... So what, what's the-- Give us the, the high level summary of like w-what is the, what is the vision that you show, you put in front of investors about why does this actually make a lot of money?

Carina Hong46:10

Yeah. So, um, first of all, this round is kind of preemptive, so it's, uh, I think a lot of the investors have pretty high interest about, about Axiom. Um, in terms of kind of what we believe in, we believe the future of coding is going to be somewhat constrained by verification capability.

RJ Haneke46:28

Mm-hmm.

Carina Hong46:29

And we believe in solving formal math is a very natural starting point, and then by extension, you can increase the verification capability across hardware and software. And for hardware, for example, that's quite revolutionary. I mean, that is, there is no, as we know, there's no partial credit for a mostly verified GPU.

RJ Haneke46:49

No.

Carina Hong46:50

Uh,

RJ Haneke46:52

You can tell from that-

Carina Hong46:52

It's all or nothing. It is all or nothing. And you do, and you do need a perfect prover. Like, I want to stretch that, which is, which, stress this point, which is that suppose I am a, you know, I'm, uh, someone who loves solving math. I think there are a lot of Twitter users who enjoy Pokemon, like hunting, um, or those problems. And then I just try to, um, you know, use a, uh, a non-deterministic LLM, uh, like GPT, say, to try to get the full proof for that.

RJ Haneke47:23

Yeah.

Carina Hong47:23

Now, I can do that many, many times, and I might succeed, and I might not, and I might not have a problem with whether I actually succeed or not. This absolutely does not work for hardware verification.

RJ Haneke47:37

Mm.

Carina Hong47:37

So for those kind of domains, which I call like hardcore verification need, it, it is a pain point. It is a current pain point. There are, there are, there are hundreds of humans and thousands of licenses being dedicated to solve one local grid problem verification.

RJ Haneke47:53

Just as an aside, the, my understanding is that the industry standard for design to verification-

Carina Hong47:58

Yeah

RJ Haneke47:59

... in, uh, ASIC, ASIC project is like-

Carina Hong48:02

Yeah

RJ Haneke48:02

... one to three, one to four.

Carina Hong48:03

One to three, one to four. Correct. Both in, say, team size and in duration.

RJ Haneke48:07

Yeah.

Carina Hong48:07

Right. So if you multiply that-

RJ Haneke48:09

Yeah.

Carina Hong48:09

Uh, yeah, square. And then I think-- So, so that's, that's, uh, I would say like, you know, it's a, it's a must cover. And now for software verification, it is, it is interesting, right? Because, you know, as probably we all realize, like my nephew web codes a lovable website. There is absolutely no need to formally verify that piece of code. Like, why would you? Now, I heard a, a story from Kmat, actually that New York Times reporter who, um, uh, told me the story, which is like-

RJ Haneke48:42

Yeah

Carina Hong48:43

However, if you think about like, you know, in the time of agents, like my OpenClaw can probably do all sorts of things and probably can do some bad things. Uh, like my OpenClaw can decide to like text something bad to my professor, right? Like-

RJ Haneke48:56

Yeah, yeah

Carina Hong48:56

And, and, and you can say that perhaps is that a problem of formal verification? Probably still not, right? You can change something about the action space and make it more limited, so you don't, you don't need to rely on formal verification. So you can have a lot of cases, but you can think about, you know, maybe an enterprise that is dealing with a lot of regulatory kind of stuff using agents. They might want to do something like it- it- it's their choice. But I will argue that the improvement of verification capability, both in latency, you know, and in accuracy, all this stuff, the performance holistically is going to determine whether people rely on formal verification or not.

RJ Haneke49:32

Sure.

Carina Hong49:32

So in a way, we want to make it so good that basically we can make that a choice. And-

RJ Haneke49:38

So, so why did the investors think that you could do this, right? 'Cause I mean, people have been working on verification for so long, and I think everyone agrees it's an important problem. And it-- and I think certainly if I can just have a verification proof for every program that I write, like, "Hey, Claude, like give me the proof also," and then it just produces it and, "Oh, yep, looks good to me," I would absolutely do that. But so why is it-- what was it that the investors saw, in your opinion, that persuaded them that, "Okay, this is the moment I'm gonna put in my two hundred million or whatever"?

Carina Hong50:11

I think, uh, when it comes to faith-

RJ Haneke50:14

Yeah

Carina Hong50:14

... you either have it or you don't. So you either dream the dream with us or you don't.

RJ Haneke50:18

Mm-hmm.

Carina Hong50:18

And that's okay, because when we realize the dream, the company's gonna be worth ten billion.

RJ Haneke50:22

Yeah.

Carina Hong50:22

So I think that's kind of the, the feeling that I have, which is that we believe verification is the critical, critical part to superintelligence.

RJ Haneke50:32

Mm-hmm.

Carina Hong50:32

Our version of superintelligence is absolutely verified. We don't think there's any other possible future. We do not believe that, I'm gonna say it on the record, we do not believe that an informal math system is going to be the math AGI solution.

RJ Haneke50:50

Why not?

Carina Hong50:52

We just don't believe that.

RJ Haneke50:53

I mean, w- the counterargument is, "Oh, you know, like we just do a lot of good RL," and, you know, we've seen, uh, GPT, you know, solving, you know, I think some murderous problem and like whatever. So why do you think that that runs out of gas?

Carina Hong51:10

Yeah. So you can say that if you are a frontier math, then you have like... So, so frontier lab, and you have like infinite resources, why does that-- there's this, it's by definition, no running out of gas, right? If you think like infinite means like there's no running out of gas. I don't think it's going to scale to superintelligence.

RJ Haneke51:28

So you think that you run out, like you run out of money, basically, or run out of power from nuclear power plants or something?

Carina Hong51:32

So, so we as a startup, first of all, cannot do that. We first of all, as a startup, cannot do that.

RJ Haneke51:36

Yeah.

Carina Hong51:37

But we generally think that formal math and by sort of converting math proofs to pr- programs, to code, give us much better performance.

RJ Haneke51:47

So, so it's just your-- it's your sample efficiency argument and so forth that you just... and maybe you just can't, that, that you can't bend the curve enough if you don't use formal verification.

Carina Hong51:56

The thing is, the, the thing is the informal stuff is also available to us in a way. If you really-- like you can have a both informal and formal system, and that is going to be-

RJ Haneke52:08

I see. I see

Carina Hong52:09

... very strong. I-- the thing that I kind of like, I think my, my suspicion about like, you know, whether we can scale to mass AGI just by the informal approach is you're going to keep having, you know, the LMS judges solution, or you have human experts who grade, and just human experts like doesn't scale that well. And then if you really argue infinite, infinity, then sure, then you also have infinite money and you can pay infinite... It-- There, there's so many... I- is there really infinite number of people who can understand and prove at, say like about like, you know, a result, a non-trivial result in Langlands program? I think, you know, good luck finding those p- people. And in fact, I think how frontier maths came to- came, came together is because they couldn't assemble a, a, a benchmark by... They, they are expert poor, so they have to, you know, collaborate with Epoch to do it, right? And, and I think that's kind of what, what I worry about, about kind of having the human part. So they have LMS judges, and then now stoch- stochastic judging. The problem is like whether something is impossible to achieve versus something is incredibly expensive and like really incredibly expensive and incredibly expensive to achieve get kind of like mixed in the end.

RJ Haneke53:27

And then of course, investors always wanna know why you, right? So I've read a little bit about your background, and I think it-- we would do a disservice to the audience-

Carina Hong53:37

Of course

RJ Haneke53:37

... if we didn't hear a little bit just about your personal story.

Carina Hong53:40

I see.

RJ Haneke53:40

Do you wanna talk just a little bit-

Carina Hong53:42

Sure

RJ Haneke53:42

... about like you've, you've done some really interesting stuff, so I'd, we'd-- I'd love to hear like you and then your team.

Carina Hong53:49

Yeah.

RJ Haneke53:49

What, what, what makes Axiom special?

Carina Hong53:51

Yeah. I think Axiom is like very special because they are really expert mathematicians. Basically, they are users of the system we are developing, and that iteration loop is very fast. It is extremely, it is extremely fast. You have like some of the strongest, you know, mathematicians and both in research and Olympiad contest, and you also have people who are, um, you know, Mathlib contributors, maintainers, developers, um, lingurus, really, and, uh, combine them with people who come from like applied ML, um, you know, really strong organizations like Meta Fair, um, and Golden Age Fair, um, as well as people who have codegen expertise who work with like compilers like KernelGen, um, have kind of these backgrounds of people together. I think that sort of interdisciplinary way of thinking about things quite, quite helpful. We think AI for math has traditionally been quite interdisciplinary. People are borrowing techniques from even AI for science. People are tech-- um, bor- borrowing tech- techniques from, from the codegen literature, and people are borrowing techniques from obviously the broader like, you know, frontier like applied ML, um, to try to apply on the niche problem of AI for math. So we also think having this sort of very special, special team is a, is a differentiation. Uh, we also think that, you know, as you say, there's no, no permanent mode. Um, the proprietary data that we generate, um, and a little bit of a firewall we are seeing is a

time mode. Well, me personally, uh, I, I love math. I think, you know, I, I kind of have been doing math since I was very young and Like, maths sometimes gets really hard when, when the problem you are solving are, are just a little bit out of reach, and it gets a bit depressing. And times to times, I wonder if I can just have an AI help me. Uh, and, uh, and yeah, I think why-- I figure why not build such a thing?

Background55:27

RJ Haneke55:44

Y- you did a master's at Oxford-

Carina Hong55:46

Yeah

RJ Haneke55:47

... in neuroscience.

Carina Hong55:48

Yeah.

RJ Haneke55:48

Has that informed your thinking here?

Carina Hong55:51

That's a great question. I think, like, my, my, my, you know, experience with neuroscience is y- you learn very well about what's hard-

And what's impossible. I mean, it's, it's very interesting. I think that year of neuroscience, like, gave me some feelings about what's hard and almost no feeling about what might work, so.

RJ Haneke56:14

Okay.

Carina Hong56:15

But I think I was kind of under the pretense of neuroscience, like, hanging out at the UCL Gatsby Institute and was fortunate to do AI research with some really cool faculties. And so I think that was a very productive year of AI study-

RJ Haneke56:27

So you were-

Carina Hong56:27

And non-neuro study.

RJ Haneke56:28

You-- So you're-- It was mostly for you studying AI.

Carina Hong56:32

That's right. That's right. I think in the UK, if you, um, back, back in the, you know, um, 20th century, if you call something AI, you will not get the donation, but if you call something brai- brain science, you might have a chance.

RJ Haneke56:43

I see.

Carina Hong56:43

So, so the UCL Gatsby, which is a premier AI hub where a lot of people actually go, um, you know, from there to DeepMind, including, uh, Demis himself, uh, it's a very wonderful research environment. Uh, I remember those kind of, like, tea time talks were very amazing and, and people were basically just doing AI. Uh, it's called the Gatsby Computational Neuroscience Institute. Yeah, I think how, how that kind of, you know, happened was because-- So I was, I was in the Master of Neuroscience program, and then, um, quickly realized that you need to, like, kill rats and, um, kind of don't want to do that, and computational neuroscience sounds more appealing and, and when you look at the project and you see, like, transformer, you're like, "Ah, you absolutely want to do that."

RJ Haneke57:26

Yeah. Hmm. We're, we're all excited about that. So, so after the Gatsby, uh, you started a math PhD program at Stanford?

Carina Hong57:34

I started actually one year full, full-time at the law school-

RJ Haneke57:38

Oh, okay

Carina Hong57:38

... because the JD PhD program's structured in a way where you have to spend full-- one full residency year. So that was also a very fun year, um, of learning things that, like, are just quite fascinating, like criminal law, looking at homicide cases. Exciting. No, um, and-

RJ Haneke57:53

Do you ever feel like the legal system is under- or overspecified in some way that maybe you could, um, you could axiomatize-

Carina Hong58:01

That's a great question

RJ Haneke58:02

... and improve?

Carina Hong58:02

That's a, that's a great question. I think for a lot of things-

RJ Haneke58:05

Mm-hmm

Carina Hong58:05

... it's definitely underspecified. Um, for some other things, I was actually quite excited about sort of transfer learning from mathematical reasoning to those specific fields. I think appellate litigation, the legal gymnastics, you see some really good appellate scholars and lawyers that just come from maths training. Not many, but, like, Laurence Tribe for one, you know, Harvard law professor, um, one of the, you know, strongest, like, um, you know, s- appellate litigation and SCOTUS briefs, like, brains on, on the left, uh, Democratic Party. Um, and, uh, uh, and I think there's a lot of other domains such as antitrust that's incredibly flow charty. Contract law, sometimes also flow charty. Bankruptcy, tax, uh, more on the corporate side. I, I, I just love litigation stuff. Like, I, I mean, yeah.

RJ Haneke58:54

So, so actually, I do just-

Carina Hong58:56

Yeah

RJ Haneke58:56

... just because we're talking about litigation, it's not the same thing, but there was a, there was a Erdos problem that, that, uh, Axiom-- So I don't re- know if it was Axiom Prover or whatever. Is that right? There was a controversy about it because it had represented that it had solved the problem when in fact the proof had been-

Carina Hong59:15

Yeah

RJ Haneke59:15

... it had discovered-

Carina Hong59:16

Yeah

RJ Haneke59:16

... the proof and then just formalized it.

Carina Hong59:18

Yeah. So actually, what happened was-

RJ Haneke59:20

Carina Hong59:20

... our competitor, Harmonic, decided to publicize that they have solved, uh, unsolved problems, um, Erdos number, uh, one two four and four eight one, and then we trusted their literature review, believing that these problems are really truly unsolved. And we were a really young company at the time. We wanted to test if our system can attempt to try the problems that our competitor can. We fully did not expect that it actually solved them, but, um, turns out that we're both wrong, that in fact, the problem has been solved before.

RJ Haneke59:49

I see. So then-

Carina Hong59:49

It's not the only time that we relied on others' literature res- uh, res-re, uh, literature search, and, you know, we, we should own it. Um, the other time was this paper called "Dead Ends in Square-Free Walks." Um, you know, Professor, um, uh, uh, Professor Miller, um, have this problem that actually turns out to have been solved. But, um, we, we, I mean, we really should have done our part. That is, that is, you know.

RJ Haneke1:00:13

The point I'm trying to maybe elicit is, um, not n- not like you guys did something wrong, but rather like how does-

Carina Hong1:00:20

You know, there's this, like, Japanese, like, um, advertisement of, like, a whole company, like hundreds and thousands of people, like, apologizing, uh- ... in the, in the advertisement, and it's like, you know, "Sorry we raised our price by, like, five cents." And that's the advertisement. And I was like thinking that maybe I should just do that. It's so-

RJ Haneke1:00:37

Oh, no

Carina Hong1:00:37

... it's so embarrassing.

RJ Haneke1:00:38

No, uh, no, but I think the, the question of provenance of information and sort of like how do you-- It goes back to the question I was asking before about, like, how do I-- how am I connecting the answer to the question?

Carina Hong1:00:51

Yeah. That's a great question. I think after the Erdos thing, we're, like, extremely careful, and so we kind of, like, you know, we didn't really look at the other Erdos problems. I believe that Harmonic still continue to claim they have solved Erdos problems that might, might, might not. I don't know. Uh, it's, you know, there's a, I think Terence Tao and a lot of other people have a database about all the Erdos problems and the status. I think, you know, like, it is really, by the way, like, it's a really easy mistake to make because there are so many Erdos problems that actually have been solved, right?

Erdos1:00:57

RJ Haneke1:01:21

Yeah.

Carina Hong1:01:21

And I think that that's kind of... I- indeed, I think, like, you know, search and retrieval is a, is a, is a hard problem. Like, you don't know if that argument or an equivalent version of that. In fact, I think the most interesting part about that entire database is, um, there are a lot of problems that are not directly solve, solved, but can be just an very easy extension, almost a trivial extension of another result that has been solved or not, sometimes not even resolved. Sometimes I think in this, um, dead-end square-free walks case, which is nothing to do with Harmonic, completely Axiom's fault, um, that we actually didn't realize, and then Professor, um, Kannan Soundararajan actually pointed to us and to Professor Miller, is that it was actually from a Stack Mass Overflow or Stack Overflow- Post

RJ Haneke1:02:06

Carina Hong1:02:06

Um, like a user pointed out that there is a nineteen thirty-six results. It's fascinating.

RJ Haneke1:02:12

Yeah.

Carina Hong1:02:12

I think it's hard to, hard to find out. Now, why search is a hard problem?

RJ Haneke1:02:16

I guess that means that you do-- Does, does the, the conjecture engine or whatever, does that, does that use search as part of its process, or is that something that you kinda, you, you, uh, the human does and then feeds-

Carina Hong1:02:30

I think, I think knowledge graph or knowledge base is a very, you know, important component of any-

RJ Haneke1:02:37

Okay

Carina Hong1:02:38

... any company.

RJ Haneke1:02:39

Yeah.

Carina Hong1:02:39

Uh, and I think, I don't think it's talked about enough.

RJ Haneke1:02:42

And so, and, and you guys, with that, it sounds like you don't wanna give us too many details, but like, so you guys have a knowledge graph. I mean, I, that brings up also, I, I read somewhere that you guys have a really massive database of lean proofs that you've generated, so synthetic data in, in some sense.

Carina Hong1:03:00

Yeah.

RJ Haneke1:03:00

But that the... And, and this may- maybe is a competitive advantage for you.

Carina Hong1:03:05

I think, I think everyone is trying to accumulate, like, a data, which is not a mode, it's just time and time mode.

RJ Haneke1:03:11

Yeah.

Carina Hong1:03:11

It's, yeah, it's all, it's all, it's all about, like, you know, whether you can execute fast enough, um, to make sure that, like, you have, like, a certain buffer, um, because of, say, your data set, you know, accumulation, but that is only just a buffer.

RJ Haneke1:03:25

Have you ever thought about doing something like an Alpha Zero for math, where you start from nothing and let it just make up axioms and see what happens?

Carina Hong1:03:35

Ah, this is a wonderful question. I think that's a very interesting approach, actually. Yeah, I think we, we believe in something which is that, like, you know, suppose, um, Axiom Prover can be a really strong mathematician, and-

RJ Haneke1:03:44

Mm-hmm

Carina Hong1:03:45

... then really the, the thing that it is proving every day should hopefully help it improve, right?

RJ Haneke1:03:50

Yes.

Carina Hong1:03:50

I think this sort of self-improvement design is extremely, uh, valuable.

RJ Haneke1:03:54

Mm-hmm.

Carina Hong1:03:54

Um, and I think there are, um, other people in the AI for math community. I think, uh, Professor Gabriel Peyré's work is very interesting. Um, I think there are some of the, um, kind of more conjecturing type of exploration. Um, suppose we just kind of change, um, you know, a lot of the... Th-there are specific things you can do in, in certain, in certain ways that, that can try to see if your system can learn to conjecture and build theories.

RJ Haneke1:04:22

I think that the, the topic is really interesting and important because it really... Y-you're claiming that the, t-to get to super intelligence-

Carina Hong1:04:34

Mm-hmm

RJ Haneke1:04:35

... there's sort of this, like, it's just not gonna be possible. Maybe if you had infinite resources, you could just RL it, and it would work, maybe. But it, the reality is, is that you just can't be sample efficient enough or whatever it is to do that, so that you need some sort of verifier in the loop with the inference process rather than-- Because you do have verifiers in, like, sort of during the training process, and you just don't have them during the inference process. Is that-

Carina Hong1:05:04

Yeah, I think a lot of them are just secretly, like, trying to use this to ground their reasoning-

RJ Haneke1:05:11

Yes

Carina Hong1:05:11

... as well.

RJ Haneke1:05:11

I mean, I would. I was surprised that, that, like, when, when o1 was, you know, everyone knew o1 was coming, but it did- hadn't come out, I was sure they're gonna announce that they're using Lean to, to do, like, formal verification of proofs and ge- actually generate proofs and then verify them so that they're grounding and reasoning. I mean, that was my initial-

Carina Hong1:05:33

When Emil was there, there was GPTF. That was a great piece of work. There's also MiniF2F. These are all formal math work at OpenAI.

RJ Haneke1:05:41

Okay. So presumably, those guys are doing something.

Carina Hong1:05:45

No, no, they all left.

RJ Haneke1:05:46

Oh, they all left. I see.

Carina Hong1:05:47

So that's my point, which is that if you are, like, you know, an intern, I guess you can be an intern forever. So let's say you are like a junior, you know, like member of technical staff, and you want to work on something for, like, as long as it takes to solve it. Weirdly, people think about startup as this sort of your runway can just run out, and it can just, like, all fall apart thing. You might have a better chance of staying focused on the same problem-

Self-Improve1:06:02

RJ Haneke1:06:12

Yeah

Carina Hong1:06:13

... for as long as it takes-

RJ Haneke1:06:14

Mm-hmm

Carina Hong1:06:15

... at, say, a startup like Axiom or one of the other new labs.

RJ Haneke1:06:19

Yeah. If you're aligned to the mission of the-

Carina Hong1:06:21

Than big tech

RJ Haneke1:06:22

... company rather than, like, somebody decided that what you're doing is no longer-

Carina Hong1:06:26

Yeah, yeah

RJ Haneke1:06:26

... of any interest to them.

Carina Hong1:06:27

It can be your VP lost some political fight and so, yeah.

RJ Haneke1:06:31

Yeah. Absolutely. So-

Carina Hong1:06:33

Now, obviously, if we succeed, then they're all gonna, you know, start doing that again.

RJ Haneke1:06:37

Yes.

Carina Hong1:06:38

And then, like, I guess as a talent, then there are more, like, you know, potential places to choose from as well.

RJ Haneke1:06:42

Yeah. So th-then your job is to go fast so that they, they're, they're struggling. So actually, uh, um, you, um, we haven't talked about it, but you actually also just released an, um, an API for-

Carina Hong1:06:56

Yeah

RJ Haneke1:06:56

... doing Lean verification.

Carina Hong1:06:58

Yeah.

RJ Haneke1:06:58

And, um, I actually tried it with Cloud Code, um, because it's easier than setting up, uh, you know, your own Lean, um, tool chain.

Carina Hong1:07:07

Yeah.

RJ Haneke1:07:07

Um, and, you know, like, tried to get Lean to prove some stuff.

Carina Hong1:07:10

Yeah.

RJ Haneke1:07:11

Um, and The infrastructure is, is maybe non-trivial, um, especially at scale. So you wanna talk a little bit about-

Carina Hong1:07:18

Yeah, yeah. So we just released Axle, A-X-L-E, uh, stands for Axiom Lean Engine. And, uh, it's really a set of kind of proof validation and manipulation tools that are built for Lean in the language of Lean. So, uh, it's a bunch of meta programming tools. Now, meta programming talents are, are extremely, I think, like, you know, hard to find, and, uh, we're so grateful to have, like, a really crack team working on that. Uh, and we want to kind of like release it to the community for, uh, to use for free because we think that there are probably other people doing also, like, large scale Lean operations, and, uh, this tool is gonna make their stuff go a bu-- a lot more robust and faster and do so at scale. Um, and Axle is currently, I think, fourteen, uh, like, such tools, uh, starting from verify proof, which is the sort of to make sure that there's not-- nothing weird, um, you know, going on, like, no, no sort of cheating by, by Lean code. You don't axiom something out. You know, we don't, we don't assume weird things. If you axiom N plus N equals N, you can prove two, two-- prove two plus two equals two, which you're, you know, for sure not, that's not the right answer. Um, there are also like, you know, a lot of other kind of generation tools. Um, for example, you can try like different repair attempts.

So c-- you know, broken Lean in and then good Lean out. Uh, and you know, there are like currently, you know, other repair methods by LLM, so hopefully this, what we provide can be just a lot cheaper and, uh, more kind of, you know, straightforward. And it's just, you know, s-- I think strong, strong and better engineering can, can get you to, to a place that's quite far. A lot of the people from the Lean community has been using Axle, even if it's just been a week, to do all sorts of different interesting things. We have seen, uh, people from the kind of blockchain community use it to, to do interesting things, et cetera. And we have seen also, we have heard from a lot of the people that Claude plus Axle is kind of their go-to setup, um, for, for now. Um, we think that these are really interesting tools. I think famously, I think, uh, today there's this mathematician who said he formalized the Donald News, you know, using Claude to prove, I think, a result, um, a Ramsey, Ramsey result, and then to formalize the, the, the Lean proof, and then, um, that is also using Axle tool. So we are really glad to see people kind of already using it.

RJ Haneke1:09:34

I mean, I feel like this is a great opportunity for the collaborations that Terence Tao was talking about as well, where once people have access to the common tools, then it becomes easy to do. And I mean-

Carina Hong1:09:47

Yeah

RJ Haneke1:09:47

... like if, if you have a intuition, even, uh, y- not a strong ma-mathematician like myself, you might be able to participate in the, you know, sort of like an effort to prove a larger theorem or something like that.

Carina Hong1:10:00

Yeah, I think that's, that's very interesting, like, view, which is that like if you think about like mathematics has been not like as collaborative as software engineering. You don't have like hundreds and thousands of people working on something together. I think Polymath was an instance when, when that happened that was fantastic. So if you have a lot of really good sort of setup, indeed like commoditized kind of access, then people can all participate in. In fact, that's how I think some of the large formalization projects have been done. Uh, things are divided into sub-task. But really the blueprint writing process by, say, Terence Tao and Alex Kontorovich of assigning the task to different people and how things kind of fit together, that blueprint writing part is extremely important. And there has been, I think, a result about sphere packing, I think, by, um, one of the other companies out there. And the blueprint part for, um, the A dimension is still pretty much built on what the sphere packing, uh, community, the Lean community, the humans blueprint, and similar with some of their other results as well. The blueprint part has still been human-generated, and I think auto-generated blueprint is going to be a technical bottleneck that many people are trying to solve around the same time.

RJ Haneke1:11:09

So is there value in me as a, you know, Cloud Code user trying t-to attempt part, like some small lemma or whatever, um, w-where I don't have a great understanding of the math? Maybe I have a high-level understanding.

Carina Hong1:11:23

Depends. Are you trying to formalize or are you trying to prove?

RJ Haneke1:11:26

Uh-

Carina Hong1:11:27

To prove new things

RJ Haneke1:11:27

... that's a good point. Yeah.

Carina Hong1:11:28

Yeah.

RJ Haneke1:11:29

So maybe form-- You would obviously probably start with formalization, right?

Carina Hong1:11:32

Yeah, yeah.

RJ Haneke1:11:33

You already know the proof and-

Carina Hong1:11:34

Yeah

RJ Haneke1:11:34

... you just can't get-

Carina Hong1:11:35

Yeah

RJ Haneke1:11:35

... nobody has been-

Carina Hong1:11:36

That's right

RJ Haneke1:11:36

... able to get the formalization correct.

Carina Hong1:11:38

That's right.

RJ Haneke1:11:38

Yeah.

Carina Hong1:11:38

I do actually have seen people use Lean in formalization, and they try to do it by hand, you know, not using any AI as a way to learn mathematics.

RJ Haneke1:11:45

Mm.

Carina Hong1:11:46

No. It's, you know, it's all the formalization.

RJ Haneke1:11:48

That sounds very hard.

Carina Hong1:11:48

You don't have that process. Well, it's interesting because I think a lot of the, uh, my friends who started, you know, working on Lean and Mathlib was because they are in PhD, and this problem is really hard, and we get stuck all the time, and we want to kind of review some of the undergrad classes, a time where we still understand what the math was about, and we do so by, you know, doing Lean. And I think that's, that's very beautiful.

RJ Haneke1:12:14

So formalizing the material.

Carina Hong1:12:14

Yeah. But if you have, for example, like, you know, um, access to axiom prover that also can formalize all the formalized things, then you don't have-- you lose that part of the Lean learning process.

RJ Haneke1:12:25

Yeah.

Carina Hong1:12:25

Yeah. But I do think that, you know, like for, for, for, you know, you and I, we can set up like Axle and try to like see, you know, what, what results we might be able to prove. And I think that's quite interesting. And thanks to Axle sort of making the speed a lot faster, you don't have to wait very long.

RJ Haneke1:12:42

Yeah.

Carina Hong1:12:42

I remember the Putnam exam day. We were all like in the war room. It was a Saturday. We're all really excited, and we just got the exam paper from the official, um, like organizing the proctor of the Putnam exam. We just like were looking at like how, like how much workout Axle is like getting. And without it, we couldn't have solved it with, um, I think the eight problems was in the time limit. That definitely not, not within the time limit. And I think one thing about these tools is like it's very interesting in that- Potentially, you can have interesting reward for RL as well.

RJ Haneke1:13:16

What do you mean by that?

Axle & Vision1:13:17

Carina Hong1:13:17

So for example, verify proof can be a, a reward for just basically a proof is completely correct and validated.

RJ Haneke1:13:24

I see.

Carina Hong1:13:24

I think formal verification tooling can be interesting direction to pursue with RL.

RJ Haneke1:13:31

Yeah. So you mean, um, for example-

Carina Hong1:13:34

Code grading and stuff

RJ Haneke1:13:35

... you, so you formalize-- auto-formalize the informal proof and then verify what-- and then use that as a reward? Or do you mean-

Carina Hong1:13:43

No, as in like you pass like Lean programs-

RJ Haneke1:13:46

Okay. Yeah

Carina Hong1:13:46

... in these formal tools.

RJ Haneke1:13:48

Yeah.

Carina Hong1:13:48

Right? Like, and you will have some sort of score.

RJ Haneke1:13:53

Okay. Yeah. I think if I were to build o1 or something, I would've, in my mind, I would've used what I just described. But you're saying just to learn how to do Lean itself.

Carina Hong1:14:03

So, so I-- the, the value proposition, which is interesting about Frontier Lab, is that suppose you are a two C business, then sure, you, you can just not do what we are doing. And we have seen, for example, DeepSeek, uh, like originally having a formal team and then later dissolve that team because of strategic direction change. That's all completely reasonable. Now, suppose you are focused on coding, right, and you have talent who want to work on what we are doing, it makes a lot more sense for you to do code generation, further your strength and moat.

RJ Haneke1:14:41

Yeah.

Carina Hong1:14:41

You can partner with Axiom, um, just like how, for example, Frontier Labs, um, partner with, uh, startups that work on search, such as Ixra and Parallel, right?

RJ Haneke1:14:51

Yeah.

Carina Hong1:14:51

Just call Ixra API for search and-

RJ Haneke1:14:53

Yeah

Carina Hong1:14:53

... potentially, you know, if you're a Frontier Lab, I think you should call Axiom API for verification.

RJ Haneke1:14:59

Yes.

Carina Hong1:14:59

Um, better composition versus-

RJ Haneke1:15:01

Setting up your own infrastructure.

Carina Hong1:15:02

It doesn't make sense. I mean, it just, you know, potentially, um, I think the talent, the, the finickiness of Lean, um, the sort of data code, like, like, you know, there's, there's no reason to.

RJ Haneke1:15:15

Yeah, I mean, it took me five minutes to set up. Why did you decide to start Axiom?

Carina Hong1:15:22

Right. Why did I decide to?

RJ Haneke1:15:23

Like, you were a grad student at Stanford-

Carina Hong1:15:25

Yeah

RJ Haneke1:15:25

... and, you know, in math.

Carina Hong1:15:27

Yeah.

RJ Haneke1:15:27

So what made you decide to-

Carina Hong1:15:29

I wasn't in math for very long. I was-

RJ Haneke1:15:31

Yeah

Carina Hong1:15:31

... I was, uh, m-- I think like almost as soon as I started the PhD-

RJ Haneke1:15:35

Mm-hmm

Carina Hong1:15:36

... I just started fundraising. So it wasn't like-

RJ Haneke1:15:38

Oh, really?

Carina Hong1:15:38

Yeah.

RJ Haneke1:15:39

Okay.

Carina Hong1:15:39

Yeah.

RJ Haneke1:15:39

Was that the plan? Or did you, did you start there, and you, like, almost immediately realized that this is-

Carina Hong1:15:44

Right, right

RJ Haneke1:15:44

... the direction?

Carina Hong1:15:44

So, so the year of law school, right-

RJ Haneke1:15:47

Mm-hmm

Carina Hong1:15:47

... it was very, very interesting to me, like-

RJ Haneke1:15:48

Yeah

Carina Hong1:15:48

... on an intellectual level.

RJ Haneke1:15:50

Mm-hmm.

Carina Hong1:15:50

But it's also the s- the first year where I had no science, technology or math-

RJ Haneke1:15:54

Mm-hmm

Carina Hong1:15:54

... whatsoever in my life.

RJ Haneke1:15:55

Mm-hmm.

Carina Hong1:15:56

It's a weird year, right? Like I'm, I'm reading a lot. I'm, I'm practicing... Well, I'm learning how to write, I'm learning how to read, like.

RJ Haneke1:16:02

Mm-hmm.

Carina Hong1:16:03

And but, like, I'm, I'm just kind of, I want to like be obsessed about something in technology. Like that was-

RJ Haneke1:16:10

Mm-hmm

Carina Hong1:16:10

... also what's going on that year. So yeah, the year of law school, right?

RJ Haneke1:16:14

Yes.

Carina Hong1:16:14

And, and it was very, very interesting to me because it's like, okay, like I just, I need to be obsessed with like a technical thing, 'cause otherwise I get too... I don't think I'm bored, 'cause I really love like everything about, about law. I really, really loved it. It was, it was something that's incredibly interesting to study. Um, but I just, I, I mean, I've been basically like, you know, very excited about like the progress of reasoning. I was looking at a lot of the post-training kind of papers. I was, I was learning all of these, like just by myself. Um, and then at one point, it got to a point where I'm like, "I think this is for sure happening." And like, I think talking to Shubho, right, at, at Verve, like every weekend also, like it didn't help, like soothing this thought. So I got more and more obsessed. And at a point I'm like, "Okay, if I'm doing this like literally every minute and I can't think about something else, like what, what..." You know, I need to do something about it. I mean, it's like you, you... I, I fall madly in love with the idea that AI is gonna do math. And like, okay, now do I, do I do math? Like I... It, it's really, really crazy, like at a time where I remember the obsession was quite, I just couldn't get out of it. And then, um, I

went to this Nye Hennessy event, uh, this Nye Hennessy Scholar dining house. It's like, hosts all sorts of like free lunch events, and those are great because you get free food and you get interesting intellectual exposure to things. And I remember Julie Zhuo, who was I think a Facebook, um, first Facebook PM, came to speak. And then after that, I just like, basically walked up to her and I said like, "Uh, like what do you do if you want to do a startup and you really wanted to do academia because you, you kind of love math?" And then she's like, "Well, you know, what's your time spent on these two different things?" And I'm like, "A hundred percent, zero percent." And then she's like, "Well, you kind of have to follow your energy."

RJ Haneke1:18:02

Yeah, I mean, if you're, if you are, um, completely obsessed with it.

Carina Hong1:18:06

Yeah, I was completely obsessed with it. I thought it's gonna be big.

RJ Haneke1:18:09

Mm-hmm.

Carina Hong1:18:09

And I thought like, it, it, it just, it just has to be a for-profit startup because like, it, it's so much broader than making mathematical breakthroughs.

RJ Haneke1:18:20

Hmm.

Carina Hong1:18:20

If you think about like recursive self-improvement and like really the, the kind of more high level like concept of like you really want to have just AI, AI scientists, like the math reasoning is gonna be, is gonna be a pretty big part of it.

RJ Haneke1:18:38

Yeah.

Carina Hong1:18:38

And now trying, like I think the, the, the sort of belief by, by Cursor and Claude and other folks is like, okay, like just like math transfer to coding code, coding transfer to math as well. I think that's true. It's just that like, you know, why, why not push it directly? I don't-

RJ Haneke1:18:51

Mm-hmm

Carina Hong1:18:52

... I don't get it. You need to push that directly. And then there's this other like, you know, thought which is that, and maybe kind of going back to the collaboration point, right? Um, verification has traditionally been thought of as, okay, well, there are some industry where there's a lot of guardrails, so if you're working in defense, military use, okay, you need to like basically satisfy a lot of barriers to entry to meet those stringent, like requirements. So it's, it's something that's... Verification is for the industries that are closed. But it's for the first time now I think verified AI is to open up collaboration, either as human-AI collaboration. Well, before blueprinting, that's human, human collaboration, and Lean was the grounding, was the verification, formal language. And then human-AI collaboration like we're seeing now, future AI agent, agent, agent like collaboration. So, like, I think verified AI is for openness. It's not for meeting the requirements of closed industries. And I think, just like I think verification should not be about, oh, I remember, like, you know, there was this article, like, chatbots make stuff up. Is AI the solution to-- Sorry, is math solution to hallucination? Uh, verification to me is not about lousiness. Verification to me is about scaling brilliance, compounding brilliance. It's like just kind of going back to the collaboration point. It's about Ramanujan being a much stronger mathematician. He was already a really strong one, but verification helps him extend

the brilliance, like both kind of like scale up and scale out. So verification-

RJ Haneke1:20:15

It's more rigorous.

Carina Hong1:20:16

Verification to me is not about, you know, like erasing the mistakes and the lousiness, about scaling brilliance. And, and the third point is that, like, verification to me is, um, not about, like, the sort of, you know, just talking about rigor. It's actually about performance gain, right? It's not just about the stringent requirements, the hurdles that you need to overcome. It is about, like, actual verified generation's gonna make it so much better. And I think, like, kind of these three points, I think the last point is that a lot of the people think that you work on verification because of your distrust for technology. Like, it sells really well to, I think, the general pub-public, including, like, my parents. Like, "Oh, well, we're doing verification because, like, you know, technology make mistakes. It's..." No, we don't think verification is based on-- is because of the distrust for technology. It's because that's what, like, um, expected rapid exponential scale up and, and, um, the deployment and the creation of technology and technological progress is what that compels and demands.

RJ Haneke1:21:12

It's a very mathematical perspective, right?

Carina Hong1:21:14

Yeah.

RJ Haneke1:21:14

Because you're saying proofs are, proofs are drive math, right? A lot of math is based-

Carina Hong1:21:20

Yeah

RJ Haneke1:21:20

...is, is about proofs.

Carina Hong1:21:21

Yeah.

RJ Haneke1:21:21

And math drives a lot of science and innovation in the world, and the innovations in math drive innovation in the world.

Carina Hong1:21:28

Yeah.

RJ Haneke1:21:28

And so that, that-

Carina Hong1:21:29

But it doesn't need to even go through, like in, in terms of, you know, the solve-- math solve everything thing like obviously stands. Like, my point is like transfer learning doesn't like... Transfer learning is about like pushing math reas-math reasoning. It just... So, so there are kind of, I guess, like there are a, a couple narratives here. Like for some people it's like you, you solve math, and then math are the, you know, fundamentals of sciences. So that's actually the, from AI for math, like take this radical layer of AI for science, it's that narrative. We actually believe in just like general tr-transfer learning.

RJ Haneke1:22:00

Yeah.

Carina Hong1:22:00

Like, I think Axiom is, Axiom is on the infrastructure stack.

RJ Haneke1:22:03

And you think that this is just a first step to, you know, basically unlocking capabilities in many domains, in science and law, for example.

Carina Hong1:22:14

Yes. I, I think it's... So, so again, there are like, you know, multiple, multiple kind of like beliefs. One belief is that there's math and there is like, you know, formal, the power of formal verification.

RJ Haneke1:22:26

Mm-hmm.

Carina Hong1:22:27

Suppose we actually, you know, solve math and have a really strong informal math reasoning engine, we do not expect that term to be as large as solving math through the formal way.

RJ Haneke1:22:42

Why?

Carina Hong1:22:44

I mean, code as, as it is language-

RJ Haneke1:22:47

Yeah

Carina Hong1:22:47

...but it is indeed on the more structured end.

RJ Haneke1:22:50

Yes.

Carina Hong1:22:50

It bridges informal and formal.

RJ Haneke1:22:52

Yes.

Carina Hong1:22:52

What we are doing is it's not informal versus formal, right? We're not taking the sort of like completely formal of a proof approach. Like it's im- it's bridging be- between informal and formal. It is bridging between high level and low level. It is a direct, it's sort of like a, a direct improvement to reasoning, um, through transfer, like transfer learning. And it's also an indirect in that like, okay, well, like math is gonna unlock a little science and sure.

RJ Haneke1:23:16

I see.

Carina Hong1:23:16

Um, and that is really what we're seeing.

RJ Haneke1:23:18

So you think that it enables transfer learning?

Carina Hong1:23:21

Yeah.

RJ Haneke1:23:21

I see.

Carina Hong1:23:22

I think that is, that is pretty much a consensus. I think it is a consensus, and this is a bet that has been pretty much kind of overlooked by others because math sounds pure, and it doesn't sound like there's any commercial value.

RJ Haneke1:23:33

Mm.

Carina Hong1:23:34

Well, I do ob-obviously understand the opportun- like the opportunity cost if you're like a really like a frontier lab of, of solving this problem. But I definitely think this is a problem that if you're like a well-resourced startup, you should be doing.

RJ Haneke1:23:44

That's an interesting-

Carina Hong1:23:45

Yeah

RJ Haneke1:23:46

...perspective. Did you get everything out that you wanted to say?

Carina Hong1:23:48

Yeah, I think, I think it's, um, like, you know-

RJ Haneke1:23:50

Yeah

Carina Hong1:23:50

...like the question of like, is Axiom Math or is Axiom Ver-Verification? The DNA of the company is math. We think that's, uh, verification's the best first market.

RJ Haneke1:23:58

Yeah.

Carina Hong1:23:58

And we think that sort of like solving math, and especially like formal math, um, i-is going to like help us like tackle the really, uh, ambitious quest of verified AI.

RJ Haneke1:24:09

Yeah, yeah.

Carina Hong1:24:09

Now, when we are done with that, we might have other, the second markets, including AI for science that we just talked about. But, but on the theoretical layer, right? Like I think real world testing is important, and potentially we can stay in the digital world and, and soft-software stuff and for other things to be, to be, to be getting real world, like physical world signals.

RJ Haneke1:24:28

But do you think that, that, that the, the sort of, the capability of doing really powerful reasoning-

Carina Hong1:24:35

RJ Haneke1:24:35

...once you have that powerful verified reasoning engine-

Carina Hong1:24:40

Yeah

RJ Haneke1:24:40

...that that's the moment when, okay, now we, we've unlocked that for, you know, software verification and hardware or whatever.

Carina Hong1:24:48

Yeah.

RJ Haneke1:24:48

But now, okay, so now what about biology? What about chemistry?

Carina Hong1:24:51

So that could be one.

RJ Haneke1:24:53

Yeah.

Carina Hong1:24:53

The other one is then, like, really how far are you to recursive self-improvement?

RJ Haneke1:24:57

Okay, so just AGI.

Carina Hong1:24:59

Yeah. I think there is this sort of question, and different people, because of their probably different backgrounds, have different, um... It's, it's really where your energy and your passion leads you. Like, for some people, actually, I have heard this actually, you know, with my friends, they want to work on AGI because they believe solve AGI, solve death. There are other people who come from a more, like, medicine background, they really believe they can solve death, and they don't solve AGI and then solve death.

RJ Haneke1:25:22

Yeah.

Carina Hong1:25:22

They just solve, like, AI for science.

RJ Haneke1:25:24

Yeah.

Carina Hong1:25:25

Now, which way is correct? I don't know.

RJ Haneke1:25:28

And so the recursive self-improvement angle, it sounds to me like you're saying that the combination of verification plus the sort of like language, which is informal, it's that combination that enables really good recursive self-improvement.

Carina Hong1:25:43

I think recursive self-improvement is going to happen anyways. We're trying to have, like, formal verification earn its place. So we, we, like, a-again, the, whether, um, formal verification can be welcomed and deployed and become a consensus depends on how well we execute. And I think when you boil down that problem into an execution problem, you should just go for it.

Brandon Anderson1:26:07

What-- Looking forward, what's the biggest bottleneck that you see in the field for both Axiom and maybe just the field abroad in terms of-

Carina Hong1:26:15

Uh, fragmentation. So, um, I think we're in a market where people like to start like, you know, like, a thousand people, they don't join force, they start a thousand things. I think that's actually the biggest, like, kind of bubble indicator. I think there are categorical bubble, and there are, like, other categories where there are moonshots. It's not bubble, it just looks a little bubbly. In the field, if people who are, like, really, like, of really legit, you know, backgrounds decide to join force and work in a team for the mission rather than for ego, for kind of the status neo founder, I think that category is-- I'm, I'm really bullish other... And vice versa. So I think the bottleneck actually is about potentially... I think, I think it's, it's, it's annoying because it's like we are in a-- If, if you believe we are in a, in an age of, um, research, if you believe in, like, deep tasks are the interesting directions we go after, um, the market sort of conditions currently is good and bad, and that good, it enables these sort of long-term, long-horizon bets to be funded. Bad, because there's too much noise in the market and some other, like, irrational players. Um, you know, we, we try to work with really incredible venture firms. Like, they are the partners, they are our intellectual partners, and there's a lot of alignment, and we really bounce, like, very, uh, cool ideas, technical

and t-tech- non-technical each other, like, for long, long hours. And, and, you know, we spend, like, a lot of time off work and weekend together to really intensely build the company. But there are also other people who just want to, like, park, like, capital somewhere. And, you know, while we don't work with them, these encourage, these are market conditions that encourage fragmentation. And when things get fragmented, like, no one gets there. Like, I think every category, regardless of how, how right the idea is, is pretty much in a sort of earning the right to exist stage. And if that is the case, then, um, for example, great deep tech company, SpaceX, and people do actually join force to work on that dream, and potentially, in that case, also a very charismatic founder. I think a really kind of concerning thing for me personally is that for other probably some categories that I'm personally quite bullish about the direction about and just, like, looking at things generally, fragmentation is a problem. Like, the sort of, you know, we see start pooling professors from university to, um, work on something when i-it-it really, it really is a really interesting kind of situation.

Brandon Anderson1:28:45

Maybe this is a naive, naive question, but, like, right now, when you were talking about players in, let's say, AI for math-

Carina Hong1:28:51

Yeah

Brandon Anderson1:28:51

... where, you know, you, Harmonic-

Carina Hong1:28:54

Yeah

Brandon Anderson1:28:54

... and then, you know, the big labs, right?

Carina Hong1:28:56

Yeah.

Brandon Anderson1:28:56

Am I missing someone? Is, like, is that actually fragmented really?

Carina Hong1:29:00

I guess fragmentation, I think, is a bottleneck for the entire, like, AI landscape.

Brandon Anderson1:29:06

Okay. Yeah.

Carina Hong1:29:07

I think AI for math is a category that is actually not a bubble-

Brandon Anderson1:29:12

Mm-hmm

Carina Hong1:29:12

... because it is not fragmented, because people who are really amazing talents do like to join force.

Brandon Anderson1:29:17

Mm-hmm.

Carina Hong1:29:18

So, for example, the fact to get Ken Ono and François Charton on one team-

Brandon Anderson1:29:22

Mm-hmm

Carina Hong1:29:23

... like, this is fantastic. Like, you have someone who's a core contributor of frontier math tier four-

Brandon Anderson1:29:27

Mm-hmm

Carina Hong1:29:27

... really great benchmark-setter, François, who's on the AI for math discovery, proving and discovery. They work together. Then you are suddenly a player with both proving capability and construction capability, and that's fantastic. And I believe, you know, as you said, like, Harmonic probably also have some really great talents, like joining force together. I think AI for math is a, is a good category because of the absence of fragmentation. But even, you know, from, from our perspective-

Brandon Anderson1:29:51

Mm-hmm

Carina Hong1:29:51

... the sort of, for example, um, you know, RL, right, b-being-- I don't, I don't think that's, like, a category per se, but, you know, RL talents currently, um, it's quite hard to, to attract and retain, right, for, for literally everyone. And there are a lot of, um, companies being started, and then sold like three months later, and, and just... The, the-- Each month where you could have worked on, like, a technical problem, and you're instead working on deals, it's a month that is wasted. And I say that, like, you know, also with some amount of pain and suffering-

Brandon Anderson1:30:23

Yeah. Really

Carina Hong1:30:23

... having, having gone through two fundraisers. Yes, yes, yes.

Brandon Anderson1:30:27

Yeah.

RJ Haneke1:30:27

Yeah.

Brandon Anderson1:30:27

So, so what's the biggest bottleneck-

Carina Hong1:30:29

For me

Brandon Anderson1:30:30

... in AI for math?

Carina Hong1:30:30

For, for Axiom, for AI for math.

Brandon Anderson1:30:32

Not Axiom, but just the community as a whole.

Carina Hong1:30:34

The community of AI for math.

Brandon Anderson1:30:35

Yeah. Where is it going? What is the thing that everyone just really wants to break?

Carina Hong1:30:39

I expect fragmentation to start to happen-

Brandon Anderson1:30:41

Mm-hmm

Carina Hong1:30:42

... as Axiom and Harmonic establish category leadership.

Brandon Anderson1:30:44

Mm-hmm.

Carina Hong1:30:45

So I expect people kind of, you know, that, that's one thing. But I also think that another bottleneck could be the pressure of short term versus long term. I think that we are doing things in a very sort of fast-paced manner, but that does not mean we can always, or it does not mean it is always correct to do things in the most fast-paced manner.

Brandon Anderson1:31:09

Mm-hmm.

Carina Hong1:31:10

Like, we did things in a fast-paced manner because, well, we were founded on the day of the International Math Olympiad, so we couldn't have competed in that anyway. The next Math Olympiad is Putnam, and we're quite excited because it's, I mean, it's undergraduate exam. And this year's IMO, twenty twenty-five IMO, was easy on the MOHS scale, and Putnam could be hard. And in fact, it was harder than the IMO and the MOHS scale, if you look at AI, um, you know, how much, how many scores the AI, uh, has, has retained on average and on the max, you know, difficulty of the problem. Putnam is harder in both, both axes. Um, so we want, want to try, and so there's only a gap of four months. But it doesn't, it doesn't mean I'm always gonna set four-month goals. If I build a company only setting four-month goals, I, I might build a really short-sighted company. So there are, like, I think, longer horizon problem. I think, for example, market forces could force other players into chip verification. Well, it is possible that co-verification is a holy grail. It's possible that if you solve that, then you also naturally solve chip verification with some amount of, like, epsilon caveat of, like, distribution shift. But I strongly believe that, like, a bottleneck, like, could be the pressure. But I think that Axiom is fortunate that, one, we're early enough, two, we are, like, a team of just

incredibly, like, um, high-agency people, that our execution generally surpasses expectation. But I think, like, what I think could be a bottleneck for the entire AI for maths field is that potentially trying to prove commercial value is going to distract significantly from the core capability improvement.

RJ Haneke1:32:39

Yeah. That makes sense.

Carina Hong1:32:40

Okay.

RJ Haneke1:32:40

Cool.

Brandon Anderson1:32:41

Thank you for driving up and coming to see us.

Carina Hong1:32:43

Thank you so much. Yeah.

Brandon Anderson1:32:44

I know the traffic was n- was horrible.

Carina Hong1:32:46

Yeah. Thank you.

Brandon Anderson1:32:47

And, and, um, it's been really a pleasure speaking with you, and-

Carina Hong1:32:50

Yeah

Brandon Anderson1:32:51

... um, we look forward to-

Carina Hong1:32:52

It's awesome

Brandon Anderson1:32:52

... to seeing how things develop.

Carina Hong1:32:54

Yeah. Thank you so much.

RJ Haneke1:32:55

Thank you.

Carina Hong1:32:55

Awesome.

Brandon Anderson1:32:55

Thank you.

Carina Hong1:32:56

Thank you. Yeah.

Brandon Anderson1:32:57

Yeah.

Scaling Past Informal AI - Carina Hong, Axiom Math

Topics

Mentioned

Transcript

Intro0:00

Series A0:52

Verified AI4:52

Axiom System13:42

Discovery22:12

Code Limits25:12

Context37:57

Business43:57

Background55:27

Erdos1:00:57

Self-Improve1:06:02

Axle & Vision1:13:17