An AI researcher at GitHub automated the analysis of coding agent trajectories (hundreds of thousands of lines of JSON logs per day) by building a tool called eval-agents with GitHub Copilot, using Claude Opus 4.6 as the underlying model. Five team members then joined the project. In under three days, they produced 11 new agents, 4 new skills, and a net change of +28,858 and -2,884 lines of code across 345 files.
Three principles drove that velocity. First, prompting: conversational, verbose prompts in planning mode outperform terse instructions. Second, architecture: readable code, updated docs, and frequent refactors are not optional maintenance but the core investment that makes agentic contribution fast. Third, iteration: when agents fail, the process is wrong, not the agent. The author also leaned on the Copilot SDK rather than rebuilding tools, MCP servers, and agentic scaffolding from scratch.
The full article is worth reading for two specific things: the concrete prompt example that produced contract-style regression tests Copilot cannot modify, and the explanation of how designing for agent contribution also made the project easier for human collaborators. That overlap is the real finding, and the author earns it through specifics, not claims.
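The article's actual prompt and tests aren't reproduced here, but the idea behind contract-style regression tests — tests that pin current behavior and that the repo's agent instructions forbid the agent from editing — can be sketched roughly as follows. Everything below is hypothetical: the parser, its behavior, and the file layout are invented for illustration, not taken from eval-agents.

```python
# Hypothetical sketch of a "contract" regression test. The convention
# assumed here: files under tests/contracts/ pin current behavior, and
# the repo's agent instructions forbid agents from modifying them.
import json

def parse_trajectory(raw: str) -> dict:
    """Toy stand-in for a trajectory parser: count events and tool calls
    in a newline-delimited JSON log."""
    events = [json.loads(line) for line in raw.splitlines() if line.strip()]
    return {
        "steps": len(events),
        "tool_calls": sum(1 for e in events if e.get("type") == "tool_call"),
    }

# CONTRACT: do not modify. If this fails, fix the parser, not the test.
def test_parse_trajectory_contract():
    raw = "\n".join([
        '{"type": "message", "text": "plan"}',
        '{"type": "tool_call", "name": "read_file"}',
        '{"type": "tool_call", "name": "run_tests"}',
    ])
    assert parse_trajectory(raw) == {"steps": 3, "tool_calls": 2}
```

The design point is that the test encodes the contract, while the prohibition on editing it lives in the agent's instructions — so a failing run forces the agent to change the implementation rather than quietly rewriting the expectation.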