GitHub monolith traffic on Azure jumped from 8% to 40% between February and May 2026. Git traffic hit 30% on Azure, repository replication reached 99%, and effective capacity more than doubled in four months. The new users service is fully cut over and handles double the traffic at lower database cost. Stateless authentication tokens are rolling out, eliminating per-request database lookups that amplified past outages. The governing principle is explicit: availability, then capacity, then features.

May brought nine incidents. On May 4, a routine schema migration against a large production table saturated database connections during peak traffic, causing 1.3% of requests to return 5xx errors at peak, averaging 0.46% across 66 minutes. Pull requests went red. Codespaces, Copilot, Pages, and OAuth all degraded through shared data dependencies. Detection took three minutes, mitigation 33 minutes. On May 5 and 6, two linked GitHub Actions incidents hit East US hosted runners: 13.5% of standard runner jobs failed on May 5, and a configuration change made during that remediation blocked allocations the next morning, failing 17.1% of standard runner jobs. Around 8,500 Copilot code review requests timed out across the two days.

The remediation plans are worth reading in full. GitHub is adding automated circuit breakers to pause in-flight migrations when database latency or connection utilization crosses safe thresholds, dynamic throttling tied to live cluster load, and monitoring that fires before customer impact starts. For Actions, they are fixing throttle response logic that failed to trigger existing backoff, hardening allocation filters against abnormal data shapes, and auditing rate limits end-to-end. These are not postmortem boilerplate. They are structural fixes to the exact mechanisms that failed, and tracking whether GitHub ships them is the actual story.

[READ ORIGINAL →]