Anthropic has released Claude Opus 4.8, with three concrete upgrades: improved honesty behaviors, stronger self-verification capabilities, and multi-agent dynamic workflows built for large-scale code tasks. Benchmark scores are closing the gap against OpenAI's GPT-5.5, but the episode digs into why that comparison is more complicated than the leaderboard suggests, with active debate over harness quality and what those numbers actually measure in production.

The business signals surrounding this release matter as much as the model itself. Cognition closed a $1 billion funding round. Kirkland and Ellis, one of the largest law firms in the world, built a half-billion-dollar internal AI platform. Anthropic's own valuation continues to climb alongside a Mythos preview. OpenAI pushed a GPT-5.5 Instant update in the same window. The capital and institutional adoption are accelerating simultaneously.

The full episode is worth watching for the harness quality argument alone. Benchmark gaps narrowing does not mean capability gaps narrowing, and the show makes that case with specifics. The multi-agent workflow details for Opus 4.8 also deserve close attention if you are evaluating models for complex engineering pipelines, where self-verification at scale is the actual bottleneck.

[WATCH ON YOUTUBE →]