GitHub logged four outages in March 2026. The worst hit on March 3, driving a 40% failure rate on github.com requests, 43% on the API, and 21% on Copilot. Root cause: a deployment bug expired every user's settings cache simultaneously, triggering a replication cascade. The fix was a rollback. GitHub is now isolating the cache to a dedicated host and adding a killswitch, because the same cache mechanism already caused an incident in February.

The other three incidents reveal a pattern of infrastructure change risk. On March 5, a Redis load balancer misconfiguration during a resiliency upgrade broke GitHub Actions for nearly three hours, with 95% of workflow runs delayed and 10% failing outright. On March 19 and 20, the Copilot Coding Agent hit back-to-back authentication failures against its own datastore, reaching a 99% error rate on the second occurrence because the first remediation was incomplete. On March 24, a Microsoft Teams integration dependency failed externally, hitting a 90.1% peak error rate with no GitHub-side fix available.

The full report is worth reading for the specific failure chain mechanics, especially how a single cache expiry propagated across github.com, the API, Actions, and Copilot simultaneously. GitHub acknowledges the February-to-March repeat is a signal of incomplete fixes, and the architectural work it references is not yet done.

[READ ORIGINAL →]