Originally published on September 29, 2023 | Updated October 10, 2023
By: Dave Andrews, Marcus Hildum, Sergio Ruiz
Update: HTTP/2 Rapid Reset Attack – CVE-2023-44487
Following the brief initial blog below, Edgio engaged with peers from across the industry on the definition and responsible disclosure of CVE-2023-44487 – HTTP/2 Rapid Reset Attack.
The underlying issue affects many implementations of HTTP/2 servers, which makes the implications of the attack much wider than Edgio previously realized. Edgio advises all customers that run public-facing infrastructure to upgrade to patched versions of their servers as soon as they become available, and/or disable HTTP/2 temporarily.
Edgio is also available to help mitigate the risk to our customers by performing the HTTP/2 termination and proxying HTTP/1.1 back to customers’ infrastructure. Please reach out to us to initiate this process.
On August 28th, 2023, at 6:43 p.m. PST, Edgio engineers observed a rise in memory utilization on our edge servers, in request rates to several large web properties, and in the volume of logs being generated at the edge.
The traffic, soon identified as an attack, was novel because it was only observable in the logs of our layer 7 load balancer. Edgio runs our custom HTTP caching and proxying engine, Sailfish, as both our layer 7 load balancer (which we call a “frontend”), and our caching and proxying layer (the “backend”). This enables common instrumentation and logging at both layers, making comparisons across them trivial.
When we dug into the frontend logs, we observed some interesting behaviors indicating an attack:
- The request count for single clients was far higher than usual: during the attack we saw instances of over 20,000 requests on a single socket.
- No bytes were being sent to clients.
- The total request time, from start to finish, was between 1 and 2 milliseconds, all spent initiating a new proxy connection to the backends.
- All the connections exhibiting the behavior were HTTP/2 connections.
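As a rough illustration, the four behaviors above can be combined into a single check over per-socket log summaries. This is only a sketch: the `ConnectionLogSummary` record, its field names, and the default thresholds are hypothetical and do not reflect Sailfish's actual log schema.

```python
from dataclasses import dataclass


@dataclass
class ConnectionLogSummary:
    """Hypothetical per-socket roll-up of frontend log lines."""
    protocol: str              # e.g. "HTTP/2" or "HTTP/1.1"
    request_count: int         # requests seen on this socket
    bytes_sent: int            # total response bytes written to the client
    median_request_ms: float   # typical start-to-finish request time


def looks_like_rapid_reset(conn, min_requests=20_000, max_request_ms=2.0):
    """Return True only when a socket shows all four behaviors at once.

    The default thresholds are illustrative assumptions, loosely based on
    the figures quoted in the list above.
    """
    return (
        conn.protocol == "HTTP/2"
        and conn.request_count >= min_requests
        and conn.bytes_sent == 0
        and conn.median_request_ms <= max_request_ms
    )


# Made-up sample data: one suspicious socket and one ordinary browser session.
suspect = ConnectionLogSummary("HTTP/2", 25_000, 0, 1.4)
browser = ConnectionLogSummary("HTTP/2", 120, 4_800_000, 85.0)
print(looks_like_rapid_reset(suspect))  # True
print(looks_like_rapid_reset(browser))  # False
```

Requiring all four signals together keeps the check conservative: a busy but legitimate HTTP/2 client still receives response bytes, so it would not match.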
Based on these initial observations, we theorized that the attacker was abandoning the requests using HTTP/2’s RST_STREAM frame and starting new requests on the same socket, very quickly.
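For readers unfamiliar with the frame in question, the sketch below encodes an RST_STREAM frame and counts such frames in a buffer. The frame layout (a 9-byte header followed by a 4-byte error code), the frame type 0x3, and the CANCEL error code 0x8 come from the HTTP/2 specification; the helper function names are ours, and the parser assumes the buffer holds only complete frames.

```python
import struct

RST_STREAM = 0x3   # frame type defined by the HTTP/2 specification
CANCEL = 0x8       # error code meaning the stream is no longer needed


def build_rst_stream(stream_id, error_code=CANCEL):
    """Encode one RST_STREAM frame: 9-byte header plus 4-byte error code."""
    payload = struct.pack(">I", error_code)
    header = len(payload).to_bytes(3, "big")              # 24-bit payload length
    header += bytes([RST_STREAM, 0x0])                    # type, then flags (none)
    header += struct.pack(">I", stream_id & 0x7FFFFFFF)   # reserved bit cleared
    return header + payload


def count_rst_streams(data):
    """Walk a buffer of complete frames and count the RST_STREAM frames."""
    count, offset = 0, 0
    while offset + 9 <= len(data):
        length = int.from_bytes(data[offset:offset + 3], "big")
        if data[offset + 3] == RST_STREAM:
            count += 1
        offset += 9 + length   # skip the header and the payload
    return count


# Client-initiated streams use odd identifiers: 1, 3, 5, ...
buffer = b"".join(build_rst_stream(sid) for sid in (1, 3, 5))
print(len(build_rst_stream(1)))   # 13
print(count_rst_streams(buffer))  # 3
```

At 13 bytes per frame, cancelling a request costs the client almost nothing, which is what makes a fast request-then-reset loop on a single socket so cheap for an attacker.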
After this, we split up our efforts into three distinct workstreams:
- Investigating any potential issues impacting the HTTP/2 library we use, nghttp2, that might prove to be the root cause.
- Building Sailfish variables to expose the fundamentals of this behavior to enable mitigations.
- Building new metrics, dashboards and alerting to identify this type of attack more quickly.
1. Envoy… but really nghttp2
After a brief search, we found this issue in Envoy, a service proxy which Edgio doesn’t utilize on the edge, and the corresponding CVE. Upon deeper review of the diff, we realized the issue was not only in Envoy, but in nghttp2 itself, which we do use.
A pull request and point release for nghttp2 were published shortly after the disclosure, addressing the underlying issue. Because no specific CVE had been allocated against nghttp2, our automated CVE scanning system, which we use to track vulnerabilities in key software we utilize, originally missed the issue.
We immediately started the process to upgrade this dependency and deploy it, which was completed a number of weeks ago.
2. Request reset percent
In parallel, we worked to identify the attack behavior programmatically, within Sailfish itself, so that we could take action immediately to prevent performance or reliability issues. We decided to implement a config variable (h2_remote_reset_percent) inside Sailfish that would track the percentage of requests on a given connection that have been reset by the client.
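A minimal sketch of how such a per-connection percentage could be maintained is shown here. Only the name h2_remote_reset_percent comes from this post; the `ConnectionResetTracker` class and its methods are hypothetical, since Sailfish's internals are not public.

```python
class ConnectionResetTracker:
    """Hypothetical per-connection counters for requests and client resets."""

    def __init__(self):
        self.request_count = 0
        self.remote_reset_count = 0

    def on_request(self):
        """Call once for every request opened on the connection."""
        self.request_count += 1

    def on_remote_reset(self):
        """Call once for every request the client resets."""
        self.remote_reset_count += 1

    @property
    def h2_remote_reset_percent(self):
        """Percentage of requests on this connection reset by the client."""
        if self.request_count == 0:
            return 0.0   # no requests yet, so nothing can have been reset
        return 100.0 * self.remote_reset_count / self.request_count


# Simulate 1,000 requests of which the client resets 995.
tracker = ConnectionResetTracker()
for i in range(1000):
    tracker.on_request()
    if i < 995:
        tracker.on_remote_reset()
print(tracker.h2_remote_reset_percent)  # 99.5
```

Keeping two plain counters per connection is cheap enough to update on every request, and the percentage is only computed when a rule actually reads it.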
The addition, in conjunction with an existing variable for the request count on a single connection, allowed us to craft a rule that would immediately close a connection to a client that had exceeded a request threshold and had reset more than a configured percentage of requests. We wrapped this configuration in normal operational fail-safes, which allow us to disable it for specific locations or customers.
In pseudocode this looks like:
if request_count > 1000
   and h2_remote_reset_percent > 99
   and pop ~ ".*"
   and customer_id not in ()
then
    connection.silent_close();
fi
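The same rule can be written as a small runnable function. This is a sketch under assumptions: the function name and parameters are ours, the thresholds simply mirror the pseudocode, and we assume that `pop ~ ".*"` means the whole location name must match the regular expression.

```python
import re


def should_silent_close(request_count, reset_percent, pop, customer_id, *,
                        request_threshold=1000, reset_threshold=99.0,
                        pop_pattern=".*", exempt_customers=frozenset()):
    """Decide whether a connection should be closed under the rule above.

    pop_pattern and exempt_customers model the operational fail-safes:
    narrowing the pattern limits the rule to certain locations, and adding
    a customer ID to exempt_customers switches it off for that customer.
    """
    if request_count <= request_threshold:
        return False
    if reset_percent <= reset_threshold:
        return False
    if re.fullmatch(pop_pattern, pop) is None:
        return False   # rule is not enabled for this location
    return customer_id not in exempt_customers


# "lax" and customer 42 are made-up identifiers.
print(should_silent_close(20_000, 99.9, "lax", 42))                         # True
print(should_silent_close(20_000, 99.9, "lax", 42, exempt_customers={42}))  # False
print(should_silent_close(1_000, 99.9, "lax", 42))                          # False
```

Both thresholds are strict comparisons, as in the pseudocode, so a connection with exactly 1,000 requests is left alone.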
After careful validation to avoid any unintended impact to our customers’ traffic, the new rule was deployed and Edgio engineers continued to monitor for any further anomalies.
3. Counts and Ratios
In order to more quickly identify when attacks of this sort are occurring, we configured a new dashboard and alert based on the count of HTTP/2 RST_STREAM frames received from clients, across a location. This, coupled with a singular view of memory availability and health-checks, gave us a clear signal of potential degradation due to this specific type of attack:
However, we remained concerned about other potential attack types that might affect only the frontends specifically. To provide visibility into this more general concern, we began tracking the ratio of the transaction rate between frontends and backends in a given location. The underlying data for this comparison has been a core part of our monitoring for a very long time.
Looking at normal behavior, you can see the strong banding around 1, the expected ratio, as each request that arrives at a frontend translates into a single backend request. Also visible is banding closer to 0.5 and 0.25, which occurs primarily in dormant test locations, where systems like purge and health-checking cause more internal transactions to be processed by backends:
During the initial attack, however, you can clearly see the effect on this ratio:
Our current alerting is configured to trigger when the ratio exceeds a certain value, creating an incident for Edgio support engineers to triage and start mitigation steps.
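A simplified version of this ratio check might look like the following. The location names, transaction rates, and the threshold of 2.0 are all invented for illustration; the value used in our production alerting is not stated in this post.

```python
def frontend_backend_ratio(frontend_tps, backend_tps):
    """Ratio of frontend to backend transaction rates for one location."""
    if backend_tps <= 0:
        # No backend traffic at all: either the location is idle (treat it
        # as the nominal ratio of 1) or every request is stopping at the
        # frontend, which is the worst case.
        return float("inf") if frontend_tps > 0 else 1.0
    return frontend_tps / backend_tps


def locations_to_alert(rates, threshold=2.0):
    """Return the sorted locations whose ratio exceeds the threshold.

    rates maps a location name to (frontend_tps, backend_tps).
    """
    return sorted(
        location
        for location, (frontend_tps, backend_tps) in rates.items()
        if frontend_backend_ratio(frontend_tps, backend_tps) > threshold
    )


# Hypothetical sample data.
sample = {
    "lax": (50_000, 49_500),        # healthy, ratio close to 1
    "test-pop": (10, 40),           # dormant test location, ratio 0.25
    "ams": (900_000, 60_000),       # frontends far busier than backends
}
print(locations_to_alert(sample))  # ['ams']
```

Because the check compares two rates rather than looking for RST_STREAM frames specifically, it would also flag other attacks that load the frontends without producing matching backend traffic.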
This was an interesting new attack type, leveraging a relatively recently disclosed vulnerability in a widely utilized library. Fortunately, the team at Edgio worked quickly to improve our operational awareness, mitigate the specific root cause of the attack, and put in place tunable, general-purpose mitigations for this class of attack.
Of course, we are always working on improvements like this, such as new ways of identifying bad actors via fingerprinting, as well as integrating this work into our security product suite to allow more persistent blocking and rate limiting.
Never a dull moment at Edgio.