OpenAI's updated Agents SDK introduces a model-native harness for long-running, multi-step agents that work across files, codebases, and external systems. The session, presented by Steve Coffey and Nish Singaraju, centers on a live demo starting at 14:13 that builds a task tracker from core primitives including MCP, skills, AGENTS.md, shell execution, and apply patch. The SDK is available in both Python and TypeScript.

The harness matters because it stabilizes the agent loop, the part of an agentic system most likely to degrade or hallucinate over extended tasks. Agents run inside controlled sandbox environments with scoped dependencies and tools, which addresses the reliability problem in production deployments by limiting what each run can reach. These are not toy demos: the primitives covered handle real file inspection, command execution, and code editing.
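To make the idea of a bounded loop with scoped tools concrete, here is a minimal sketch in plain Python. It is not the Agents SDK API: `Harness`, `resolve`, `run`, and `scripted_policy` are hypothetical names invented for illustration, and the scripted policy stands in for a model. It only shows the general pattern described above, assuming a step cap, a tool allowlist, and a file root the agent cannot escape.

```python
from dataclasses import dataclass, field
from pathlib import Path
from typing import Callable


@dataclass
class Harness:
    """Hypothetical harness: not the Agents SDK, just the general pattern."""
    root: Path                                   # sandbox directory
    tools: dict[str, Callable[[str], str]] = field(default_factory=dict)
    max_steps: int = 8                           # hard cap on loop iterations

    def resolve(self, relative: str) -> Path:
        """Resolve a path and refuse anything that escapes the sandbox root."""
        root = self.root.resolve()
        target = (root / relative).resolve()
        if not target.is_relative_to(root):
            raise PermissionError(f"{relative} is outside the sandbox")
        return target

    def run(self, policy) -> list[str]:
        """Call policy(history) until it returns None or the step cap is hit.

        policy returns a (tool_name, argument) pair for the next action.
        """
        history: list[str] = []
        for _ in range(self.max_steps):
            action = policy(history)
            if action is None:
                break
            name, arg = action
            if name not in self.tools:
                # Unknown tools are reported back instead of executed.
                history.append(f"error: tool {name!r} is not allowed")
                continue
            history.append(self.tools[name](arg))
        return history


def scripted_policy(history: list[str]):
    """Stand-in for a model: a fixed plan, one action per step."""
    plan = [("read_file", "notes.txt"), ("delete_all", "/")]
    return plan[len(history)] if len(history) < len(plan) else None
```

A caller would register only the tools the task needs, for example a `read_file` tool that goes through `resolve`, and then call `run(scripted_policy)`. The second planned action names a tool that was never registered, so it comes back as an error entry in the history rather than being executed.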

The full session is worth watching for the Q&A at 37:50, where questions about edge cases and stack integration tend to surface details the demo skips. The code repo at github.com/openai/build-hours lets you follow along directly. If you are building anything beyond single-turn tool calls, this SDK update is the baseline to understand.
