GPT Image 2 topped the LM Arena leaderboard by 242 points, the largest margin on record. The episode leads with this benchmark result but immediately pivots to the more consequential question: what does a top-tier image model actually unlock inside an agentic pipeline?

The practical answer is image-to-code workflows. The episode walks through how developers are using GPT Image 2 to convert screenshots, mockups, and visual assets directly into functional code, and where the model still fails at reasoning over image content rather than just reproducing it. That gap between generation and reasoning is the technical tension worth reading for.

Three news items round out the episode: SpaceX signed a deal with Cursor, an unauthorized group gained access to Claude Mythos, and Google upgraded Deep Research. Each story connects back to the same underlying pressure: the agentic stack is moving faster than the access controls, safety reviews, and tooling built around it.
