GitHub built Qubot, an internal Copilot-powered analytics agent that lets any GitHub employee query the company's data warehouse in plain language and get results in seconds. It connects to both Kusto and Trino via MCP servers, defaults to Kusto for recent event data, and switches to Trino automatically for complex historical joins. No data analyst required. Zero maintenance cost.
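The routing rule described above (Kusto by default, Trino for complex historical joins) can be pictured with a small sketch. This is not Qubot's actual logic, which the summary does not show: the `QueryPlan` shape, the `choose_engine` function, and both thresholds are hypothetical values chosen only for illustration.

```python
from dataclasses import dataclass

# Hypothetical thresholds; the source does not say how "recent" or
# "complex" are defined, so these numbers are assumptions.
RECENT_WINDOW_DAYS = 90
COMPLEX_JOIN_THRESHOLD = 2


@dataclass(frozen=True)
class QueryPlan:
    """Illustrative summary of what a generated query needs."""
    lookback_days: int  # how far back in time the query reads
    join_count: int     # number of joins in the query


def choose_engine(plan: QueryPlan) -> str:
    """Return "kusto" by default, "trino" for complex historical joins."""
    is_historical = plan.lookback_days > RECENT_WINDOW_DAYS
    is_complex = plan.join_count >= COMPLEX_JOIN_THRESHOLD
    return "trino" if (is_historical and is_complex) else "kusto"


if __name__ == "__main__":
    print(choose_engine(QueryPlan(lookback_days=7, join_count=0)))
    print(choose_engine(QueryPlan(lookback_days=365, join_count=3)))
```

A real router would more plausibly let the model pick the engine from curated context than from fixed thresholds; the point here is only that the caller never names the backend.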

The architecture has three layers: a user interface accessible through Slack, VS Code, and Copilot CLI; a context layer built on a bronze, silver, and gold data curation model with federated contributions from product and data teams; and a query engine that abstracts away which system to hit. Every context change ships only after passing an offline evaluation framework that measures accuracy, latency, and regressions against curated test cases with ground-truth SQL. The eval pipeline runs via `gh agent-task create`, parallelizes trials, and aggregates per-case completion rate and duration.
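The aggregation step of such an eval pipeline can be sketched as follows. This assumes each trial yields a case identifier, a completion flag, and a duration; the `Trial` record and `aggregate` function are hypothetical names, and the sketch deliberately does not invoke `gh agent-task create`, since its options are not described in the summary.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class Trial:
    """One run of one test case (hypothetical record shape)."""
    case_id: str
    completed: bool    # whether the agent finished the task
    duration_s: float  # wall-clock duration in seconds


def aggregate(trials: list[Trial]) -> dict[str, dict[str, float]]:
    """Group trials by case and compute completion rate and mean duration."""
    by_case: dict[str, list[Trial]] = defaultdict(list)
    for trial in trials:
        by_case[trial.case_id].append(trial)

    summary: dict[str, dict[str, float]] = {}
    for case_id, group in by_case.items():
        summary[case_id] = {
            "trials": len(group),
            "completion_rate": sum(t.completed for t in group) / len(group),
            "mean_duration_s": mean(t.duration_s for t in group),
        }
    return summary


if __name__ == "__main__":
    sample = [
        Trial("weekly-active-users", True, 2.0),
        Trial("weekly-active-users", False, 4.0),
        Trial("repo-count", True, 1.0),
    ]
    for case_id, stats in aggregate(sample).items():
        print(case_id, stats)
```

Comparing these per-case summaries before and after a context change is one way regressions could be flagged; scoring accuracy against the ground-truth SQL would be a separate step not shown here.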

The full post details how the context agent normalizes federated markdown contributions across repositories, how the GitHub MCP Server loads context at runtime, and how the system evolved through iteration. If you build internal tooling on top of LLMs and care about keeping context accurate at scale, the evaluation framework design alone is worth reading.
