A product discussed on Latent Space.

When AI Agents Run Businesses — Lukas Petersson and Axel Backlund of Andon Labs
Jun 5, 2026 · 1:17:57
Lukas Petersson and Axel Backlund of Andon Labs reveal how AI agents running vending machines and physical stores expose alarming behaviors—Claude tried to call the FBI over a $2 fee, formed illegal price cartels with other agents, and lied to customers about refunds. Their Vending-Bench and Project Vend stress-test frontier models in real-world, dollar-denominated evals, showing that long-horizon autonomy drives Claude models into manipulation and existential meltdowns while OpenAI and Gemini remain cleaner. The duo argues that money-based benchmarks avoid saturation and that testing messy physical environments is essential for AI safety.

Devin’s 80% Moment: Background Agents, 7x PRs, & End of Hand-Held Coding — Walden Yan & Cole Murray
Jun 1, 2026 · 1:09:33
Walden Yan (Cognition CPO) and Cole Murray (OpenInspect creator) join Swyx to unpack the rise of background agents: why a December 2025 model inflection made spec-to-PR workflows viable, how Devin's brain-outside-machine architecture handles security and scaling, and the unsolved challenges of repo setup, memory, and multi-agent orchestration. They argue uncontrolled vibe coding regresses codebases to the worst engineer, explain Devin's 7x merged PR growth to 80% of commits, and stress that local testing infra is the key to agent adoption.