• 20 November 2024

Last week, Saber Mesgari, product manager of ArvanCloud’s CDN team, presented a paper titled “Advancements in CDN Load Balancing: The Journey from DNS to Modern Solutions” at the CAPIF 3 conference in Kyrgyzstan. The presentation centered on the challenges of DNS-based load balancing in content delivery networks and on solutions for handling CDN traffic loads, drawn from ArvanCloud’s experience. Below is a summary of its main points:


Load Balancing: Since the inception of the internet, load balancing has been a fundamental concern for web services. The need to efficiently manage increasing traffic and ensure high service availability has been a constant challenge. These ongoing demands have driven CDN developers to continuously seek improvements in their services.

[Figure: the load balancing management system]

Load Balancing and Content Delivery Networks

Content Delivery Networks (CDNs) are well suited to providing load balancing services because of their distributed, extensive structure. Since CDNs have Points of Presence (PoPs) in many geographical regions around the world, traffic can be received at the PoP nearest to each user. This prevents traffic from being concentrated in one or a few geographical locations.

A CDN can manage the incoming traffic at each PoP through a load balancing service. This management takes two forms. First, requests for content that the CDN can serve itself (stored content) are answered from the same PoP. Second, requests that the CDN cannot serve are forwarded to the main (origin) servers of the website or service.


DNS-Based Load Balancing: Limitations and Challenges

Initially, Content Delivery Networks (CDNs) used DNS as a solution to distribute incoming traffic to their PoPs. This was done by returning the IP address of the closest PoP to the user’s geographical location when responding to DNS requests. This way, the user’s HTTP requests were directed to that specific PoP, effectively distributing traffic among different PoPs worldwide.
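The selection step described above can be sketched in a few lines. This is only an illustration under stated assumptions: the PoP names, coordinates, and IP addresses (taken from documentation ranges) are hypothetical, and the client's location is assumed to be already known, which is exactly the part that is hard in practice.

```python
import math

# Hypothetical PoP catalogue: name -> (latitude, longitude, unicast IP).
POPS = {
    "tehran": (35.69, 51.39, "192.0.2.10"),
    "frankfurt": (50.11, 8.68, "192.0.2.20"),
    "bishkek": (42.87, 74.59, "192.0.2.30"),
}


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometres."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


def resolve(client_lat, client_lon):
    """Return (pop_name, ip) for the PoP closest to the estimated client location."""
    name = min(
        POPS,
        key=lambda n: haversine_km(client_lat, client_lon, POPS[n][0], POPS[n][1]),
    )
    return name, POPS[name][2]
```

A DNS server built this way answers each query with the IP address returned by `resolve`, so the quality of the answer depends entirely on how well the coordinates reflect the real user.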

However, this method had several limitations:

Lack of precise control: Because resolvers cache DNS responses for the duration of their Time To Live (TTL), there was no precise control over the amount of traffic reaching each PoP. Likewise, taking a PoP offline meant updating DNS records and then waiting for cached responses to expire, which made the process slow.

Inaccurate geolocation: Because users often rely on public DNS services like 9.9.9.9 or 1.1.1.1, the authoritative DNS server is queried by the public resolver rather than by the user, so it sees the resolver’s address instead of the user’s. This made it difficult to determine a user’s geographical location accurately. The EDNS Client Subnet extension was developed to pass part of the client’s address along with the query, but it hasn’t been universally adopted by network operators.

All of these factors led to the need for alternative mechanisms to distribute traffic within a CDN.
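To make the TTL limitation concrete, here is a toy model of a caching resolver. It is a sketch, not a real DNS implementation: the class, the record name `cdn.example.com`, the IP addresses, and the 300-second TTL are all assumptions chosen for illustration, and time is passed in explicitly instead of being read from a clock.

```python
class CachingResolver:
    """Toy recursive resolver that caches answers until their TTL expires."""

    def __init__(self, authoritative):
        self.authoritative = authoritative  # callable: name -> (ip, ttl_seconds)
        self.cache = {}                     # name -> (ip, expires_at)

    def lookup(self, name, now):
        cached = self.cache.get(name)
        if cached and now < cached[1]:
            return cached[0]  # served from cache; the authoritative server is not asked
        ip, ttl = self.authoritative(name)
        self.cache[name] = (ip, now + ttl)
        return ip


# Hypothetical authoritative data: one record with a 300-second TTL.
records = {"cdn.example.com": ("192.0.2.10", 300)}
resolver = CachingResolver(lambda name: records[name])

first = resolver.lookup("cdn.example.com", now=0)
# The operator drains the PoP and repoints the record shortly afterwards.
records["cdn.example.com"] = ("192.0.2.20", 300)
stale = resolver.lookup("cdn.example.com", now=60)   # still inside the TTL
fresh = resolver.lookup("cdn.example.com", now=301)  # TTL expired, new answer
```

In this model `stale` still points at the drained PoP: until the TTL runs out, the operator has no way to move those users.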

Using BGP Anycast

With this method, every PoP in a content delivery network advertises the same fixed IP range through BGP. When a user sends a request to an address in this range, network routing automatically delivers it to the nearest PoP. This approach solves several problems simultaneously:

  • Quick and efficient service removal: If traffic to a specific PoP needs to be stopped, the BGP advertisement can simply be removed. Unlike DNS, this method doesn’t suffer from delays or TTL-related issues.
  • Distributed DDoS protection: Since all CDN PoPs share the same IP address, a DDoS attack aimed at that address does not converge on a single PoP. Attack traffic is spread across the available PoPs, each receiving the portion that originates nearest to it.
  • Geographic independence: Load balancing is handled at the network layer, eliminating the need to rely on the geographical location of the requesting IP (which can be inaccurate). Traffic is directed to the nearest PoP based on network routing.
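The behaviour described in the points above can be modelled roughly as follows. This is a deliberately simplified sketch: real BGP best-path selection weighs several attributes, while this model assumes the AS-path length seen from the client's network is the only criterion. The class, the prefix, and the PoP names are hypothetical.

```python
ANYCAST_PREFIX = "198.51.100.0/24"  # documentation range, stands in for the shared prefix


class AnycastNetwork:
    """Tracks which PoPs currently advertise the shared prefix."""

    def __init__(self):
        self.advertising = set()

    def advertise(self, pop):
        self.advertising.add(pop)

    def withdraw(self, pop):
        self.advertising.discard(pop)

    def route(self, path_lengths):
        """Pick the advertising PoP with the shortest path.

        path_lengths maps pop -> AS-path length as seen from the client's network.
        Ties are broken by PoP name so the result is deterministic.
        """
        candidates = {p: n for p, n in path_lengths.items() if p in self.advertising}
        if not candidates:
            raise LookupError("no PoP advertises " + ANYCAST_PREFIX)
        return min(candidates, key=lambda p: (candidates[p], p))


net = AnycastNetwork()
net.advertise("tehran")
net.advertise("frankfurt")

client_view = {"tehran": 2, "frankfurt": 4}
before = net.route(client_view)  # shortest path wins
net.withdraw("tehran")           # take the PoP out of service
after = net.route(client_view)   # traffic shifts with no DNS change
```

Note that the client's IP address never appears in `route`: the decision depends only on routing state, which is why geolocation accuracy stops mattering.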

Key Challenges in Load Balancing Within CDN PoPs

Once incoming traffic has been distributed across various CDN PoPs, it’s important to split it further between the servers in each PoP. This is because each PoP can consist of dozens, hundreds, or even thousands of individual servers. It’s crucial to distribute incoming traffic among these servers in a way that maximizes the utilization of this infrastructure.

Solutions like Equal-Cost Multi-Path (ECMP) are commonly used to distribute traffic within a data center. However, these methods have many limitations, including the inability to precisely adjust the weight and control the traffic of each path, scalability issues when dealing with a large number of servers, and the lack of real-time control over traffic on each path. Consequently, there is a need for more precise traffic control mechanisms within each PoP.
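A minimal sketch of ECMP-style path selection shows where the limitations come from. It assumes a hash of the flow's 5-tuple taken modulo the number of next hops; real routers use their own, simpler hash functions, and SHA-256 is used here only to keep the example deterministic. The function name and the addresses are hypothetical.

```python
import hashlib


def ecmp_next_hop(flow, next_hops):
    """Pick a next hop by hashing the flow's 5-tuple; every hop has equal weight.

    flow is (src_ip, src_port, dst_ip, dst_port, protocol).
    """
    key = "|".join(str(field) for field in flow).encode()
    digest = hashlib.sha256(key).digest()
    index = int.from_bytes(digest[:8], "big") % len(next_hops)
    return next_hops[index]


servers = ["s1", "s2", "s3", "s4"]
flow = ("203.0.113.7", 40001, "198.51.100.1", 443, "tcp")

# Packets of one flow always take the same path.
chosen = ecmp_next_hop(flow, servers)
```

The function has no weight parameter and no notion of server load. The only way to change a server's share of traffic is to change `next_hops`, which also remaps many flows that were not on the affected server.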

DSR for Load Balancing

Direct Server Return (DSR) is a method in which the load balancer handles only incoming requests; responses travel from the server directly to the user without passing through the load balancer. This reduces the load balancer’s load and resource consumption.
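The asymmetry between the request path and the response path can be illustrated with a small model. Packets are plain dictionaries, the backend is chosen by a trivial modulo on the source port, and the virtual IP, sizes, and class names are all assumptions made for the example rather than details of any real load balancer.

```python
from dataclasses import dataclass

VIP = "198.51.100.1"  # hypothetical virtual IP that clients connect to


@dataclass
class Backend:
    name: str

    def handle(self, request):
        # The reply is sourced from the VIP and addressed to the client,
        # so it never travels back through the load balancer.
        return {
            "src": VIP,
            "dst": request["src"],
            "size": 50_000,
            "path": [self.name, "client"],
        }


@dataclass
class LoadBalancer:
    backends: list
    bytes_seen: int = 0

    def forward(self, request):
        # Only the (small) request is counted against the load balancer.
        self.bytes_seen += request["size"]
        backend = self.backends[request["src_port"] % len(self.backends)]
        return backend.handle(request)


lb = LoadBalancer([Backend("s1"), Backend("s2")])
request = {"src": "203.0.113.7", "src_port": 40001, "dst": VIP, "size": 400}
response = lb.forward(request)
```

After this exchange `lb.bytes_seen` reflects only the request, even though the response is far larger. For traffic dominated by responses, as CDN traffic typically is, this is where the savings come from.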

One tool that uses DSR for load balancing is Katran, developed by Meta. By using XDP for fast packet processing in the Linux kernel and DSR to avoid processing the responses sent to users, Katran can handle millions of requests per second, which makes it highly scalable for heavy traffic at the network edge. However, enabling both intra- and inter-datacenter usage required several modifications, including:

  • Improved health check and monitoring services
  • eBPF firewall and IPv6 support
  • Centralized monitoring and the addition of maintenance mode
  • Inter-datacenter load balancing
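As a rough illustration of how health checks and a maintenance mode can feed into backend selection, consider the sketch below. It does not reflect Katran's API or ArvanCloud's implementation: the class and method names are hypothetical, and rendezvous (highest-random-weight) hashing is used as a generic stand-in for whatever flow-to-server mapping a production load balancer applies.

```python
import hashlib


class BackendPool:
    """Selects a backend per flow, skipping unhealthy or drained servers."""

    def __init__(self, names):
        self.state = {n: {"healthy": True, "maintenance": False} for n in names}

    def report_health(self, name, ok):
        """Record the latest health check result for a backend."""
        self.state[name]["healthy"] = ok

    def set_maintenance(self, name, enabled):
        """Drain a backend on purpose, independent of its health."""
        self.state[name]["maintenance"] = enabled

    def eligible(self):
        return sorted(
            n for n, s in self.state.items()
            if s["healthy"] and not s["maintenance"]
        )

    def pick(self, flow_key):
        """Rendezvous hashing: score every eligible backend, take the highest."""
        candidates = self.eligible()
        if not candidates:
            raise RuntimeError("no eligible backend")

        def score(name):
            digest = hashlib.sha256(f"{flow_key}|{name}".encode()).digest()
            return int.from_bytes(digest[:8], "big")

        return max(candidates, key=score)


pool = BackendPool(["s1", "s2", "s3"])
pool.set_maintenance("s2", True)  # operator drains s2 for maintenance
target = pool.pick("203.0.113.7:40001")
```

With this hashing choice, putting a server into maintenance moves only the flows that were mapped to it; flows on the remaining servers keep their assignment.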

At ArvanCloud, we have successfully utilized BGP Anycast in conjunction with a modified version of Katran, which we refer to as a DSR-based load balancer, to manage incoming traffic on edge servers and deliver high availability for our services.
