A significant portion of Cloudflare customers experienced a service disruption resulting in widespread HTTP 5xx errors. The company confirmed that the outage was not the result of an attack, but was instead caused by a change it deployed to its Web Application Firewall (WAF) service.
The disruption began at approximately 13:40 UTC and was resolved by 16:00 UTC, lasting roughly two hours and 20 minutes. The root cause was an emergency WAF rule change intended to mitigate a recently disclosed vulnerability.
The React2Shell Vulnerability Mitigation
The WAF rule in question was deployed to protect Cloudflare customers from potential exploits of the React2Shell vulnerability. This critical vulnerability, tracked as CVE-2025-55182, is a remote code execution (RCE) flaw in React Server Components, affecting applications built with React and with frameworks that rely on it, such as Next.js.
In response to the public disclosure of React2Shell, Cloudflare’s security team developed a new WAF rule and deployed it to block exploitation attempts against customers. This is a standard procedure for mitigating emerging threats across its network.
CPU Exhaustion and Service Restoration
The newly deployed WAF rule contained a flaw that led to a substantial increase in CPU utilization across Cloudflare’s global network. This excessive CPU consumption, referred to as CPU exhaustion, degraded the performance of the servers responsible for handling WAF-protected traffic.
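To make the mechanism concrete, here is a minimal back-of-the-envelope sketch in Python of how extra per-request CPU cost from a single rule can saturate a server. Every number and name in it (the request rate, the per-request CPU cost, the core count, and the function `cpu_utilization`) is a hypothetical assumption chosen for illustration. The article does not give Cloudflare’s actual figures, nor does it describe what the flaw in the rule was.

```python
def cpu_utilization(requests_per_second: int,
                    cpu_us_per_request: int,
                    cores: int) -> float:
    """Fraction of total CPU capacity consumed by request handling.

    One core supplies 1,000,000 microseconds of CPU time per second,
    so demand is compared against cores * 1_000_000.
    """
    demand_us = requests_per_second * cpu_us_per_request
    capacity_us = cores * 1_000_000
    return demand_us / capacity_us


# Hypothetical server: 16 cores handling 20,000 requests per second.
RPS = 20_000
CORES = 16

# Assumed baseline cost of WAF evaluation: 200 microseconds per request.
baseline = cpu_utilization(RPS, 200, CORES)

# Assumed cost after a flawed rule adds 800 microseconds per request.
with_flawed_rule = cpu_utilization(RPS, 1_000, CORES)

print(f"baseline utilization:         {baseline:.0%}")
print(f"utilization with flawed rule: {with_flawed_rule:.0%}")
```

Under these assumed numbers, utilization rises from 25% to 125%. Once demand exceeds the available cores, requests queue faster than they can be served and eventually time out, even though the added cost per request is under a millisecond.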
As a result, a large volume of legitimate user requests timed out, manifesting as HTTP 5xx server errors for end users. Cloudflare’s engineering teams identified the problematic WAF rule as the source of the issue and initiated a rollback of the change at 15:00 UTC. Reverting the rule globally took approximately one hour, with services fully restored by 16:00 UTC. Cloudflare issued an apology for the service interruption caused by this internal change.
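The intervals implied by the reported timeline can be checked with a short Python sketch. It uses only the three times stated in this article. Because no date is given, the helper `utc` (a hypothetical name) parses the time of day only and leaves the date as a placeholder.

```python
from datetime import datetime, timedelta


def utc(hhmm: str) -> datetime:
    # Only the time of day matters here; strptime fills in a placeholder date.
    return datetime.strptime(hhmm, "%H:%M")


outage_start = utc("13:40")      # disruption begins (approximate)
rollback_started = utc("15:00")  # rollback of the WAF rule initiated
restored = utc("16:00")          # services fully restored

total_outage = restored - outage_start
time_to_rollback = rollback_started - outage_start
rollback_duration = restored - rollback_started

print("total outage:      ", total_outage)
print("time to rollback:  ", time_to_rollback)
print("rollback duration: ", rollback_duration)
```

By these figures, about 1 hour 20 minutes passed between the start of the disruption and the start of the rollback, and the global revert accounted for the remaining hour of the 2 hour 20 minute outage.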
Source: https://www.securityweek.com/cloudflare-outage-caused-by-react2shell-mitigations/