Concise Cyber

The Fastly Outage Explained: How a Software Bug Broke the Internet

On June 8, 2021, a significant portion of the internet went offline. High-profile websites including The New York Times, Reddit, Twitch, and the UK government’s official site suddenly became inaccessible, displaying connection errors to users worldwide. The root cause was not a cyberattack but an outage at the content delivery network (CDN) provider Fastly. A latent software bug, once triggered, escalated into a global service disruption.

The bug responsible for the massive outage was introduced in a software deployment on May 12, 2021. For weeks, it lay dormant within Fastly’s systems, causing no issues until a single, specific event occurred.

The Trigger: A Valid Change Exposes a Latent Bug

The sequence of events began when one of Fastly’s customers pushed a valid configuration change to their services. This legitimate action exposed the previously undiscovered flaw in the May 12 software deployment. The bug resided in Fastly’s Edge Cloud software, and the specific configuration created a condition that caused processes across the company’s network of servers to crash. According to Fastly’s official incident report, this single valid customer change triggered the bug that resulted in 85% of its network returning errors almost instantaneously.

Timeline of the Global Disruption

The outage unfolded and was resolved quickly. At 09:47 UTC on June 8, the customer’s configuration change was made, triggering the bug. Within one minute, at 09:48 UTC, a global outage began as the failure cascaded through Fastly’s network. Fastly’s engineering and operations teams detected the issue and began working to resolve it. At 10:27 UTC, the team identified the problematic configuration and disabled the feature responsible. By 10:36 UTC, 49 minutes after the triggering change, the vast majority of the network was recovering and services were coming back online for users. Later that day, at 17:25 UTC, Fastly began deploying a permanent software fix for the bug across its global network.
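
The intervals in this timeline can be checked with a few lines of Python. This is a minimal sketch: the timestamps are the ones quoted above, while the event labels and the function names (`parse_utc`, `minutes_since_trigger`) are illustrative assumptions, not Fastly's wording or tooling. Elapsed time is measured from the 09:47 UTC configuration change.

```python
from datetime import datetime

# Timestamps (UTC, June 8, 2021) as quoted in the timeline above.
# The labels are illustrative shorthand, not taken from Fastly's report.
EVENTS = [
    ("customer configuration change", "09:47"),
    ("global outage begins", "09:48"),
    ("problematic configuration identified", "10:27"),
    ("network recovering", "10:36"),
    ("permanent fix rollout", "17:25"),
]


def parse_utc(hhmm: str) -> datetime:
    """Turn an HH:MM string into a datetime on the day of the outage."""
    return datetime.strptime(f"2021-06-08 {hhmm}", "%Y-%m-%d %H:%M")


def minutes_since_trigger(events=EVENTS) -> dict[str, int]:
    """Map each event label to whole minutes elapsed since the first event."""
    start = parse_utc(events[0][1])
    return {
        label: int((parse_utc(hhmm) - start).total_seconds() // 60)
        for label, hhmm in events
    }


if __name__ == "__main__":
    for label, mins in minutes_since_trigger().items():
        print(f"+{mins:>3} min  {label}")
```

Run as a script, this lists each event with its offset from the trigger; recovery at 10:36 UTC comes out at 49 minutes after the 09:47 change, or 48 minutes after the 09:48 onset.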

Source: https://biztoc.com/x/b73515925b891f2f