Concise Cyber

The Bug That Broke the Internet: A Factual Look at the June 2021 Fastly Outage

On June 8, 2021, a significant portion of the internet abruptly became inaccessible. High-traffic websites, including Amazon.com, Reddit, Twitch, the New York Times, and the United Kingdom’s primary government site, went offline and showed error messages to users around the world. The disruption was traced back to a single company: Fastly Inc., which operates a content delivery network (CDN) that many major websites rely on to deliver their content quickly.

The outage lasted for approximately one hour, but its root cause was a software bug that had been introduced nearly a month earlier. This event highlighted the interconnected nature of internet infrastructure, where a single flaw in a widely used service can have cascading effects across the web.

Anatomy of the Outage

The problem originated with a software update Fastly deployed on May 12, 2021. This update contained a latent bug that did not immediately cause issues. The bug remained dormant within Fastly’s systems for nearly four weeks until a specific and unanticipated set of conditions occurred. The trigger for the global outage was a valid change that a single, unnamed Fastly customer made to their service configuration. That change exposed the hidden bug, causing an immediate and catastrophic failure.

Once triggered, the bug caused 85% of Fastly’s network to return errors instead of serving website content. The flaw was an “edge case,” a problem that only manifests under a very specific and rare set of circumstances, which is why it was not caught during testing or its initial deployment.
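
To make the idea of a latent edge-case bug concrete, here is a minimal Python sketch. It is purely illustrative and is not Fastly’s code or the actual defect: the function names, the "shards" setting, and the validation rule are all hypothetical. It shows how a code path can pass every ordinary test and still fail when a customer submits a configuration that the validator accepts but the new code never anticipated.

```python
# Hypothetical illustration only: NOT Fastly's code or the real bug.
# All names and settings below are invented for this sketch.

def is_valid(config: dict) -> bool:
    """Hypothetical validator: accepts any non-negative integer 'shards',
    including 0, so the triggering configuration counts as valid."""
    shards = config.get("shards", 1)
    return isinstance(shards, int) and shards >= 0


def render_response(config: dict) -> str:
    """Build a cache header from the service configuration."""
    ttl = config.get("cache_ttl", 3600)
    shards = config.get("shards", 1)
    # Latent bug: this code path assumes 'shards' is never 0,
    # even though the validator above allows it.
    per_shard_ttl = ttl // shards
    return f"Cache-Control: max-age={per_shard_ttl}"


def serve(config: dict) -> int:
    """Return an HTTP-style status: 200 on success, 503 on internal failure."""
    try:
        render_response(config)
        return 200
    except ZeroDivisionError:
        return 503


# Typical configurations, the kind exercised in testing, work fine.
print(serve({"cache_ttl": 60, "shards": 4}))

# A valid but unanticipated configuration exposes the dormant flaw.
edge_case = {"cache_ttl": 60, "shards": 0}
print(is_valid(edge_case), serve(edge_case))
```

In this toy version the defect ships unnoticed because no test uses the rare value, which mirrors the general pattern described above: the code was deployed first, and the failure appeared only when a customer later supplied the triggering input.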

Detection and Resolution

Fastly’s internal monitoring systems detected the widespread service disruption within one minute of its onset. According to Fastly’s public account, engineers then traced the failure to the customer configuration change that had triggered the bug and disabled that configuration, which allowed the network to recover. Within 49 minutes, 95% of Fastly’s network was operating normally and the affected websites came back online. A permanent fix for the underlying bug, which had shipped in the May 12 deployment, began rolling out later the same day.
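
Fastly has not published how its monitoring works, so the following Python sketch only illustrates one common way such fast detection can be achieved: tracking the share of error responses over a sliding time window and alerting when it crosses a threshold. The class name, the 60-second window, and the 50% threshold are assumptions chosen for this example.

```python
from collections import deque


class ErrorRateMonitor:
    """Sliding-window error-rate alert (hypothetical sketch, not Fastly's system)."""

    def __init__(self, window_seconds: float = 60, threshold: float = 0.5):
        self.window_seconds = window_seconds  # assumed window length
        self.threshold = threshold            # assumed alert threshold
        self.samples = deque()                # (timestamp, is_error) pairs

    def record(self, timestamp: float, status: int) -> bool:
        """Record one response and return True if the alert should fire."""
        self.samples.append((timestamp, status >= 500))
        # Discard samples that have fallen out of the time window.
        cutoff = timestamp - self.window_seconds
        while self.samples and self.samples[0][0] <= cutoff:
            self.samples.popleft()
        errors = sum(1 for _, is_error in self.samples if is_error)
        return errors / len(self.samples) >= self.threshold


monitor = ErrorRateMonitor()

# One healthy response per second for the first minute: no alert.
for t in range(60):
    assert not monitor.record(t, 200)

# Errors begin at t=60; find the first second at which the alert fires.
first_alert = next(t for t in range(60, 200) if monitor.record(t, 503))
print(f"Errors began at t=60s, alert fired at t={first_alert}s")
```

With one sample per second, the alert in this sketch fires once errors make up half of the trailing window, which happens well inside a minute of the first failure. A shorter window or lower threshold would alert sooner at the cost of more false alarms.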

In a blog post following the incident, Nick Rockwell, then Fastly’s senior vice president of engineering and infrastructure, stated, “This outage was broad and severe, and we’re truly sorry for the impact to our customers and everyone who relies on them.” The company provided a public account of the bug, the trigger, and the steps taken to resolve the issue.

Source: https://biztoc.com/x/b73515925b891f2f