
Critical Linux CUBIC Bug Cripples QUIC Connections: Cloudflare Engineers Reveal One-Line Fix

2026-05-19 13:04:56

Breaking: Critical Bug in Linux CUBIC Congestion Control Affects QUIC Performance

Cloudflare engineers have uncovered a severe bug rooted in the Linux CUBIC congestion control algorithm. Once the affected logic was ported to quiche, their QUIC implementation, it pinned the congestion window (cwnd) at its minimum after a congestion collapse and prevented recovery. Tests that apply heavy loss early in a connection showed a 61% failure rate.

Source: blog.cloudflare.com

"This bug essentially rendered the congestion controller useless in post-collapse recovery, which is exactly the scenario it's designed to handle," said a Cloudflare engineer involved in the investigation. The fix was a near-one-line change that broke the cycle keeping the cwnd at its minimum.

Background

CUBIC, standardized in RFC 9438, is the default congestion controller in the Linux kernel and is also used by QUIC stacks, so it governs how most TCP and QUIC connections on the public internet probe for available bandwidth. Cloudflare's open-source QUIC implementation, quiche, uses CUBIC as its default, meaning this code is in the critical path for a significant share of the traffic they serve.
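For readers unfamiliar with the algorithm, the following is a minimal sketch of the cubic window curve from RFC 9438, using the constants the RFC recommends (C = 0.4, beta = 0.7). The function names are illustrative and are not taken from the kernel or from quiche, and the sketch leaves out the Reno-friendly region and every other part of the full algorithm.

```python
# Constants recommended by RFC 9438.
C = 0.4     # scaling constant for the cubic curve
BETA = 0.7  # multiplicative decrease factor applied on congestion


def cubic_k(w_max: float, cwnd_epoch: float) -> float:
    """Seconds the curve needs to climb from cwnd_epoch back to w_max."""
    # Guard against a negative base when cwnd_epoch is already above w_max.
    return (max(0.0, w_max - cwnd_epoch) / C) ** (1.0 / 3.0)


def w_cubic(t: float, w_max: float, cwnd_epoch: float) -> float:
    """Target window (in segments) t seconds into the current epoch."""
    k = cubic_k(w_max, cwnd_epoch)
    return C * (t - k) ** 3 + w_max


# Example: the window was 100 segments when loss hit, so the epoch
# starts at BETA * 100 = 70 segments and the curve heads back to 100.
w_max = 100.0
cwnd_epoch = BETA * w_max
for t in (0.0, 2.0, cubic_k(w_max, cwnd_epoch), 6.0):
    print(f"t={t:5.2f}s  target={w_cubic(t, w_max, cwnd_epoch):7.2f}")
```

The key point for this story is that growth depends on t, the time elapsed in the current epoch: if t does not advance, neither does the target window.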

The bug originated from a Linux kernel change aimed at aligning CUBIC with an "app-limited exclusion" described in RFC 9438 §4.2-12. This fix for a real TCP issue, when ported to quiche, surfaced unexpected behaviors that caused the cwnd to become permanently stuck.
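The article does not reproduce the kernel change itself. As a rough illustration of what excluding app-limited time can look like, the sketch below shifts the start of the epoch forward by the length of an idle period, so that idle time is not counted in the elapsed time fed to the cubic curve. The `Epoch` class and its methods are hypothetical names chosen for this example; this is neither the kernel code nor the quiche code.

```python
from dataclasses import dataclass


@dataclass
class Epoch:
    """Hypothetical bookkeeping for one congestion-avoidance epoch."""

    start: float  # time the epoch began, in seconds

    def exclude_idle(self, idle_start: float, now: float) -> None:
        """Shift the epoch so the span [idle_start, now] is not counted."""
        idle = max(0.0, now - idle_start)
        # Never move the epoch start into the future.
        self.start = min(self.start + idle, now)

    def elapsed(self, now: float) -> float:
        """Epoch time with excluded idle periods removed."""
        return now - self.start


epoch = Epoch(start=10.0)
epoch.exclude_idle(idle_start=12.0, now=15.0)  # sender was idle for 3 s
print(epoch.elapsed(now=16.0))                 # wall clock says 6 s elapsed
```

Under this sketch, an idle check that fires too eagerly would keep the elapsed time close to zero. The article does not say whether that is the mechanism behind the stuck cwnd.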

The Symptom: Test Failures at 61%

The investigation began after reports of erratic failures in Cloudflare's ingress proxy integration test pipeline. The failures occurred in scenarios where CUBIC was evaluated under heavy loss in the early part of the connection, a regime known as congestion collapse.

"Recovery after congestion collapse is an uncommon regime, but it is exactly the regime a congestion controller exists to handle," the engineers noted. Most tests focus on steady-state growth; this corner case revealed a critical flaw that had been invisible in throughput dashboards.


What This Means

For Cloudflare and any users of quiche or similar QUIC implementations using CUBIC, this bug could lead to persistently poor performance after network congestion events. With the cwnd pinned at its minimum, the sender stays throttled and cannot use available bandwidth, which effectively breaks the algorithm's ability to recover.
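To put a rough number on that throttling: a window-limited sender can deliver at most about one cwnd of data per round trip. The figures below are assumptions for illustration only, not measurements from Cloudflare: a minimum window of two datagrams (the value RFC 9002 gives for kMinimumWindow), 1200-byte datagrams, and a 50 ms round-trip time.

```python
def max_throughput_bps(cwnd_bytes: int, rtt_s: float) -> float:
    """Upper bound on throughput for a window-limited sender, in bits/s."""
    return cwnd_bytes * 8 / rtt_s


MAX_DATAGRAM_SIZE = 1200            # bytes; assumed for this example
MIN_CWND = 2 * MAX_DATAGRAM_SIZE    # two datagrams, per RFC 9002
RTT = 0.050                         # seconds; assumed for this example

ceiling = max_throughput_bps(MIN_CWND, RTT)
print(f"ceiling at minimum cwnd: {ceiling / 1000:.0f} kbit/s")
```

Under these assumptions the ceiling works out to roughly 384 kbit/s, regardless of how much capacity the path actually has.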

The fix, though simple, highlights the complexity of porting kernel code to user-space implementations. The one-line change ensures that after a congestion collapse, the window can grow again, restoring normal operation. Cloudflare has since rolled out the fix to production, and the bug has been reported to the Linux kernel community for upstream consideration.

The Fix: Elegant and Minimal

The solution was an elegant, near-one-line patch that broke the cycle causing the cwnd to remain stuck. "It's one of those fixes that makes you wonder why it wasn't obvious before," said a Cloudflare engineer. The patch has been shared with the open-source community.

This incident underscores the importance of testing congestion control algorithms under extreme edge cases, especially as QUIC adoption grows. Cloudflare encourages other implementers to review their CUBIC ports for similar issues.
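A recovery check of the kind described here can be sketched in a few lines. The example below uses a toy AIMD-style controller, not CUBIC and not quiche's API, and every class and function name in it is hypothetical. It drives the controller down to its floor with repeated losses, feeds it clean ACKs, and checks that the window grows again.

```python
MIN_CWND = 2.0  # packets; hypothetical floor for this toy model


class ToyController:
    """Hypothetical AIMD-style controller used only for illustration."""

    def __init__(self) -> None:
        self.cwnd = 10.0

    def on_ack(self) -> None:
        # Additive increase: roughly one packet per window of ACKs.
        self.cwnd += 1.0 / self.cwnd

    def on_loss(self) -> None:
        # Multiplicative decrease, clamped to the floor.
        self.cwnd = max(MIN_CWND, self.cwnd * 0.7)


class StuckController(ToyController):
    """Same controller with growth disabled, to show what the check catches."""

    def on_ack(self) -> None:
        pass


def recovers_after_collapse(cc: ToyController,
                            losses: int = 50, acks: int = 200) -> bool:
    """Collapse the window with heavy early loss, then look for regrowth."""
    for _ in range(losses):
        cc.on_loss()
    collapsed = cc.cwnd
    for _ in range(acks):
        cc.on_ack()
    # The window must have hit the floor and then at least doubled from it.
    return collapsed == MIN_CWND and cc.cwnd > 2 * MIN_CWND


print("toy controller recovers:  ", recovers_after_collapse(ToyController()))
print("stuck controller recovers:", recovers_after_collapse(StuckController()))
```

A steady-state throughput test would pass for both controllers as long as no collapse occurs, which is why the check has to force the collapse first.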
