Inside xAI: Building Grok Imagine in 3 Months, Videogen vs World Models, and Video Agents - Ethan He
Jun 2, 2026 · 1:44:43
Ethan He details building xAI's Grok Imagine from zero to one in three months, arguing most visual intelligence gains now come from language models, not diffusion. He explains how small bugs in data pipelines drive quality, why video agents—not just raw model improvements—will unlock production-grade generation by year's end, and how world models must be real-time, interactive, and long-horizon to become the front end of AI.