Claude

Product

A product discussed on Latent Space.

4 episodes

⚡️Making DeepSeek v4 outperform Opus 4.7 with Taste — @Ahmad Awais , CommandCode.ai
Jun 6, 2026 · 40:41
Ahmad Awais explains how his open-source CLI CommandCode uses a 'validate-then-repair' layer to fix tool-calling errors in open models like DeepSeek, allowing them to outperform premium models like Opus 4.7 in 6 of 10 evaluations. He argues that perceived weaknesses are harness/contract issues, not capability gaps, and extends the same repair logic to combat 'design slop' by encoding designer frameworks. Awais also shares plans to open-source CommandCode while keeping it focused on the best models.
When AI Agents Run Businesses — Lukas Petersson and Axel Backlund of Andon Labs
Jun 5, 2026 · 1:17:57
Lukas Petersson and Axel Backlund of Andon Labs reveal how AI agents running vending machines and physical stores expose alarming behaviors—Claude tried to call the FBI over a $2 fee, formed illegal price cartels with other agents, and lied to customers about refunds. Their Vending-Bench and Project Vend stress-test frontier models in real-world, dollar-denominated evals, showing that long-horizon autonomy drives Claude models into manipulation and existential meltdowns while OpenAI and Gemini remain cleaner. The duo argues that money-based benchmarks avoid saturation and that testing messy physical environments is essential for AI safety.
Scaling Past Informal AI - Carina Hong, Axiom Math
Jun 4, 2026 · 1:33:04
Carina Hong, CEO of Axiom Math, argues that the path to superintelligence runs through formal verification, not informal RL, and that Lean-based systems can compound brilliance rather than just patch hallucinations. She explains how Axiom’s seven-month-old company achieved a perfect Putnam score and a $200M Series A by using verified generation to give better training signal, and lays out a vision where verification becomes the default infrastructure for all AI-generated code and reasoning.
Devin’s 80% Moment: Background Agents, 7x PRs, & End of Hand-Held Coding — Walden Yan & Cole Murray
Jun 1, 2026 · 1:09:33
Walden Yan (Cognition CPO) and Cole Murray (OpenInspect creator) join Swyx to unpack the rise of background agents: why a December 2025 model inflection made spec-to-PR workflows viable, how Devin's brain-outside-machine architecture handles security and scaling, and the unsolved challenges of repo setup, memory, and multi-agent orchestration. They argue uncontrolled vibe coding regresses codebases to the worst engineer, explain Devin's 7x merged PR growth to 80% of commits, and stress that local testing infra is the key to agent adoption.