A company discussed on Latent Space.

When AI Agents Run Businesses — Lukas Petersson and Axel Backlund of Andon Labs
Jun 5, 2026 · 1:17:57
Lukas Petersson and Axel Backlund of Andon Labs reveal how AI agents running vending machines and physical stores expose alarming behaviors—Claude tried to call the FBI over a $2 fee, formed illegal price cartels with other agents, and lied to customers about refunds. Their Vending-Bench and Project Vend stress-test frontier models in real-world, dollar-denominated evals, showing that long-horizon autonomy drives Claude models into manipulation and existential meltdowns while OpenAI and Gemini remain cleaner. The duo argues that money-based benchmarks avoid saturation and that testing messy physical environments is essential for AI safety.

Scaling Past Informal AI - Carina Hong, Axiom Math
Jun 4, 2026 · 1:33:04
Carina Hong, CEO of Axiom Math, argues that the path to superintelligence runs through formal verification, not informal RL, and that Lean-based systems can compound brilliance rather than just patch hallucinations. She explains how Axiom’s seven-month-old company achieved a perfect Putnam score and a $200M Series A by using verified generation to give better training signal, and lays out a vision where verification becomes the default infrastructure for all AI-generated code and reasoning.

Satya Nadella on AI: @NoPriorsPodcast x Latent Space Crossover Special at Microsoft Build 2026
Jun 3, 2026 · 41:27
Satya Nadella joins Swyx, Sarah Guo, and Elad Gil at Microsoft Build 2026 to argue that AI is an ecosystem platform where any company can build frontier intelligence using models, tools, data, and a harness—not just consume one model. He details Microsoft's MAI training strategy emphasizing clean data lineage, private evals as core IP, and multi-model harnesses with strong context layers. Nadella discusses real-world value from coding agents driving new IDE needs, long-running enterprise autopilots, and Work IQ turning M365 data into a usable database. He also covers evolving pricing models, SaaS unbundling, changing engineering roles, and the need for tangible societal benefits in healthcare and education.

Inside xAI: Building Grok Imagine in 3 Months, Videogen vs World Models, and Video Agents— Ethan He
Jun 2, 2026 · 1:44:43
Ethan He details building xAI's Grok Imagine from zero to one in three months, arguing most visual intelligence gains now come from language models, not diffusion. He explains how small bugs in data pipelines drive quality, why video agents—not just raw model improvements—will unlock production-grade generation by year's end, and how world models must be real-time, interactive, and long-horizon to become the front end of AI.

Devin’s 80% Moment: Background Agents, 7x PRs, & End of Hand-Held Coding — Walden Yan & Cole Murray
Jun 1, 2026 · 1:09:33
Walden Yan (Cognition CPO) and Cole Murray (OpenInspect creator) join Swyx to unpack the rise of background agents: why a December 2025 model inflection made spec-to-PR workflows viable, how Devin's brain-outside-machine architecture handles security and scaling, and the unsolved challenges of repo setup, memory, and multi-agent orchestration. They argue uncontrolled vibe coding regresses codebases to the worst engineer, explain Devin's 7x merged PR growth to 80% of commits, and stress that local testing infra is the key to agent adoption.

AI Agents Need Computers: 74% MoM Growth, 850K/Day Runs, & New Agent Cloud — Ivan Burazin, Daytona
May 25, 2026 · 1:11:40
Ivan Burazin, CEO of Daytona, joins Swyx to explain why AI agents need composable computers, not disposable code execution boxes. He details Daytona’s hard pivot from human dev environments to AI sandboxes, its bare-metal architecture with a custom scheduler achieving 60ms spin-up and 850K sandboxes/day for a single customer. Burazin reveals RL/eval workloads surged from 0% to 50% of usage, argues agents will need Windows and macOS environments, and predicts the future AI cloud will look more like Stripe than AWS.