Cloudflare unveils new dashboard for tracking TCP connection anomalies

Cloudflare launches a real-time dashboard on Radar to monitor TCP resets and timeouts, offering insights into network behaviors.

Cloudflare this week announced the launch of a new dashboard and API endpoint on its Radar platform, designed to provide near real-time visibility into anomalous TCP connections across its global network. This development comes as part of Cloudflare's ongoing efforts to enhance transparency and accountability in internet infrastructure.

The new feature focuses on TCP connections that terminate within the first 10 ingress packets due to resets or timeouts. According to Cloudflare, these anomalous connections account for approximately 20% of new TCP connections to its servers globally. The company says its ability to generate and share this data grew out of an extensive global investigation into connection tampering.

Cloudflare's network handles over 60 million HTTP requests per second worldwide, with about 70% of these received over TCP connections. The remaining 30% utilize QUIC/UDP protocols. This massive scale provides Cloudflare with a unique vantage point to observe and analyze network behaviors across the internet.

The new dashboard categorizes anomalous TCP connections into four stages:

  1. Post-SYN (mid-handshake): Connections that reset or time out after the server receives a client's SYN packet but before the client acknowledges the server's response.
  2. Post-ACK (immediately post-handshake): Connections that terminate after the handshake completes but before any data is transmitted.
  3. Post-PSH (after first data packet): Connections that close after the server receives a packet with the PSH flag set, indicating the first data transmission.
  4. Later (after multiple data packets): Connections that reset within the first 10 packets from the client, but after multiple data exchanges.
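
The four stages above can be illustrated with a small classifier. This is a minimal sketch and not Cloudflare's implementation: the function name `classify_stage`, the single-letter flag encoding (S, A, P, R, F), and the rule that every PSH-flagged packet counts as one data packet are all assumptions made for illustration.

```python
from typing import List, Optional

MAX_INGRESS_PACKETS = 10  # the article's "first 10 ingress packets" window


def classify_stage(flags: List[str], timed_out: bool = False) -> Optional[str]:
    """Classify a connection from its client-to-server packets only.

    flags: TCP flags of each ingress packet in arrival order, written as
           letters (hypothetical encoding), e.g. ["S", "A", "PA", "R"].
    timed_out: True if the connection went silent instead of closing.
    Returns a stage label, or None if this sketch does not treat the
    connection as anomalous.
    """
    window = flags[:MAX_INGRESS_PACKETS]
    reset_index = next((i for i, f in enumerate(window) if "R" in f), None)

    if reset_index is not None:
        seen = window[:reset_index]          # packets before the reset
    elif timed_out and len(flags) <= MAX_INGRESS_PACKETS:
        seen = window                        # everything before the silence
    else:
        return None                          # no early reset or timeout

    if not seen or "S" not in seen[0]:
        return None                          # no SYN observed; out of scope here

    # Assumption: a packet with PSH set carries data.
    data_packets = sum(1 for f in seen[1:] if "P" in f)
    if data_packets >= 2:
        return "Later"
    if data_packets == 1:
        return "Post-PSH"
    if any("A" in f for f in seen[1:]):
        return "Post-ACK"
    return "Post-SYN"


if __name__ == "__main__":
    print(classify_stage(["S", "A", "PA", "R"]))
```

A real classifier would also need to handle retransmissions, out-of-order packets, and connections whose SYN was not captured, none of which this sketch attempts.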

Cloudflare's methodology for detecting and analyzing these anomalous connections involves a three-step process. First, the company samples a portion of connections arriving at its client-facing servers. The sampling is entirely passive and requires no decryption of traffic. Second, it reconstructs connections from the captured packets, using only the client-to-server direction. Finally, it matches the reconstructed connections against a set of signatures for anomalous behaviors.
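
The sample, reconstruct, and match steps can be sketched in a few lines of code. Everything here is hypothetical and only mirrors the shape of the process described above: the `Packet` record, the per-connection sampling rule, and the two example signatures are assumptions, not Cloudflare's actual data model or signature set.

```python
import random
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, NamedTuple, Tuple

ConnId = Tuple[str, int, str, int]  # (client ip, client port, server ip, server port)


class Packet(NamedTuple):
    conn_id: ConnId     # hypothetical normalized connection identifier
    flags: str          # TCP flags as letters, e.g. "S", "A", "PA", "R"
    from_client: bool   # True for the client-to-server direction


def sample_connections(packets: Iterable[Packet], rate: float, seed: int = 0) -> List[Packet]:
    """Step 1: keep every packet of a random subset of connections."""
    rng = random.Random(seed)
    keep: Dict[ConnId, bool] = {}
    sampled = []
    for p in packets:
        if p.conn_id not in keep:
            keep[p.conn_id] = rng.random() < rate
        if keep[p.conn_id]:
            sampled.append(p)
    return sampled


def reconstruct(packets: Iterable[Packet]) -> Dict[ConnId, List[str]]:
    """Step 2: group client-to-server packets by connection, in arrival order."""
    conns: Dict[ConnId, List[str]] = defaultdict(list)
    for p in packets:
        if p.from_client:
            conns[p.conn_id].append(p.flags)
    return dict(conns)


def syn_then_rst(flags: List[str]) -> bool:
    return flags == ["S", "R"]


def handshake_then_rst(flags: List[str]) -> bool:
    return len(flags) == 3 and flags[:2] == ["S", "A"] and "R" in flags[2]


# Illustrative signatures only; the article does not list the real ones.
SIGNATURES: Dict[str, Callable[[List[str]], bool]] = {
    "syn_then_rst": syn_then_rst,
    "handshake_then_rst": handshake_then_rst,
}


def match_signatures(conns, signatures=SIGNATURES) -> Dict[ConnId, List[str]]:
    """Step 3: list the names of all signatures each connection matches."""
    return {cid: [name for name, pred in signatures.items() if pred(flags)]
            for cid, flags in conns.items()}
```

Sampling whole connections rather than individual packets is a design choice in this sketch: it keeps each sampled connection complete, which the reconstruction step needs.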

The company emphasizes that this approach can be replicated by other network operators or researchers, as it doesn't require access to the destination server or decryption of encrypted traffic. This transparency allows for independent verification and further study of the observed phenomena.

Cloudflare's analysis reveals several potential causes for these anomalous connections, including:

  • Internet scanners probing server responses
  • Sudden application shutdowns
  • Network errors and unstable connections
  • Malicious attacks, such as SYN flood attempts
  • Connection tampering by middleboxes or firewalls

The dashboard also provides insights into specific network behaviors across different countries and autonomous systems (ASes). For example, Cloudflare observed higher rates of Post-ACK and Post-PSH anomalies in connections originating from Mexico and Peru, which could be indicative of zero-rating practices by mobile network operators.

In addition to the connection stage classification, Cloudflare has developed a tagging system to describe more specific connection behaviors. These tags consider factors such as packet inter-arrival timing, TCP flag combinations, and other packet fields. While not currently visible in the public dashboard, Cloudflare plans to expose these tags in future updates to provide even more granular insights.

The company sees multiple use cases for this dataset, including:

  1. Confirming previously known network behaviors
  2. Exploring new targets for follow-up studies
  3. Conducting longitudinal studies to capture changes in network behavior over time

Cloudflare emphasizes that this passive measurement approach works at its global scale but does not identify root causes on its own. The company encourages researchers and network operators to corroborate its observations with other data sources, such as active measurements or on-the-ground reports.

Looking ahead, Cloudflare plans several improvements to the TCP resets and timeouts dataset:

  • Expanding the set of tags for capturing specific network behaviors
  • Extending insights to connections from Cloudflare to customer origin servers
  • Adding support for QUIC, which currently accounts for roughly 30% of HTTP requests to Cloudflare worldwide

This initiative aligns with Cloudflare's stated mission to help build a better internet through increased transparency and accountability. By sharing these insights publicly, the company aims to contribute to the broader understanding of internet infrastructure and anomalous network behaviors worldwide.

Key facts

Cloudflare launched a new dashboard on September 5, 2024, to track TCP connection anomalies.

The dashboard provides near real-time data on connections that terminate within the first 10 ingress packets.

Approximately 20% of new TCP connections to Cloudflare's servers globally are classified as anomalous.

Cloudflare handles over 60 million HTTP requests per second, with about 70% arriving over TCP and the remaining 30% over QUIC/UDP.

The analysis categorizes anomalous connections into four stages: Post-SYN, Post-ACK, Post-PSH, and Later.

Potential causes for anomalies include scanners, application shutdowns, network errors, attacks, and connection tampering.

Cloudflare's methodology involves sampling connections, reconstructing them from packets, and matching against signatures.

The company plans to expand the dataset with more detailed tags and support for QUIC connections in the future.