OpenAI's Build Hour session on Codex and the Agents API marks a concrete shift from AI pair programming to full task delegation. Charlie Guo, Ryan Lopopolo, and Basis cofounder Mitch Troyanovsky walk through how Codex now handles entire engineering tasks, from planning through execution, using new API primitives: hosted shell environments, skills, and WebSocket connections for real-time agent communication.

The session introduces two frameworks worth understanding in detail. The Agent Legibility Score gives teams a measurable way to evaluate whether an agent's reasoning and actions are interpretable and trustworthy, not just whether the output is correct. Harness Engineering, covered at the 21-minute mark, addresses production reliability: the specific techniques that keep agentic workflows from becoming fragile one-off demos. GPT-5.4 is the recommended model for large-context reasoning and computer-use tasks within these workflows.

The Basis customer spotlight at 47 minutes is the most grounded section: a real startup showing how it ships features faster by converting complex product logic into reusable agent-driven flows, with a measurable reduction in manual overhead. If you build production software and have dismissed agents as experimental, this session is the counterargument. The code repo and Harness Engineering blog linked in the video description are the logical next steps after watching.
