Load Balancing in the context of websites and webservices is a way to distribute incoming traffic across multiple servers. There are multiple techniques to do this, which I will explain in this post. But first, let me explain the relationship between load balancing, load sharing and high availability.
Load sharing statically distributes traffic across servers according to a fixed ratio. Neither server availability, performance, nor any other metric is taken into account when deciding where to send traffic. This is the simplest way to handle more traffic.
Load balancing distributes traffic across multiple servers, taking into account availability, performance (usually response time) and any other metric your load balancing software supports.
High availability can be a consequence of applying load balancing, because it increases fault tolerance. When you have two servers and distribute traffic evenly across them, you increase availability. Even when one server fails, your service will still be available, though with degraded performance.
Now let’s start with the techniques.
DNS Round Robin exploits the fact that clients do not directly connect to IP addresses, but first do a DNS lookup to retrieve the IP address. By answering each client’s DNS request with a different IP address we can distribute load over several servers.
Technically hostnames are mapped to IP addresses in the DNS server using A (IPv4) and AAAA (IPv6) records. This works equivalently for both IPv4 and IPv6, so I’ll focus on IPv4 here.
To configure the DNS server to list multiple IP addresses in its response, simply add more A records with different IP addresses. The order in which these records are returned is randomly permuted for each request, and since most clients always pick the first entry in the response, this distributes traffic across all servers.
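The effect of this per-request permutation can be sketched in a few lines of Python. The hostname and addresses below are hypothetical placeholders; the point is only that shuffling the record set on every query spreads the clients' "first pick" roughly evenly across all servers.

```python
import random

# Hypothetical pool of A records configured for one hostname.
a_records = ["192.0.2.10", "192.0.2.11", "192.0.2.12"]

def dns_response(records):
    """Return the A records in a fresh random order, as a
    round-robin DNS server would for each incoming query."""
    shuffled = records[:]
    random.shuffle(shuffled)
    return shuffled

# Most clients connect to the first address in the response, so over
# many queries traffic spreads roughly evenly across all three servers.
first_picks = [dns_response(a_records)[0] for _ in range(1000)]
```

Note that this is load sharing, not load balancing: the shuffle is oblivious to whether any of the three servers is overloaded or even reachable.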
All kinds of DNS servers and most commercial DNS services should allow you to configure DNS Round Robin.
The biggest advantage of DNS Round Robin is that it’s very simple to set up: Just add more A or AAAA records to your DNS server. You don’t need any infrastructure for this other than a DNS server.
DNS Round Robin is a load sharing technique: it does not take into account the load, response times or any other metric of your servers. It will also direct traffic to servers that are down. Whether connection attempts to other servers will be made depends on the client implementation - in fact, not every client will even re-resolve the hostname to an IP address and use that for new connections. The most common HTTP clients - web browsers and some standard Linux tools - are more resilient, as described in Web Resilience with Round Robin DNS.
There’s a delay of a few to tens of minutes before any change to the DNS records takes effect on all clients due to DNS caching. You have some control over this via the TTL (time to live) field of the A and AAAA records.
DNS responses are stateless and do not support sticky sessions. When the cached IP address of a server is invalidated, the client will be directed to another random server.
Also, when clients are not distributed evenly behind caching intermediate DNS servers, you will see uneven load.
An HTTP load balancer is an intermediate piece of software or hardware that accepts all traffic. Requests are forwarded to backend servers and responses are forwarded to the original clients. The algorithm used to pick the actual server for one request determines the effectiveness and properties of this method. Often this algorithm also tracks the health status of each backend server. Whenever a backend server is (or seems to be) unhealthy, no further traffic will be forwarded to this specific server.
The simplest algorithm is round robin: direct each new connection to the next server in a fixed rotation, i.e. the least recently used server.
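A minimal sketch of round robin selection, using hypothetical backend addresses:

```python
import itertools

# Hypothetical backend server addresses.
backends = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]

# Cycle through the backends in a fixed order; each new connection
# goes to the next server in the rotation.
rotation = itertools.cycle(backends)

def pick_backend():
    return next(rotation)

assigned = [pick_backend() for _ in range(6)]
# → ['10.0.0.1', '10.0.0.2', '10.0.0.3', '10.0.0.1', '10.0.0.2', '10.0.0.3']
```

A real load balancer would additionally skip backends that failed their health checks; this sketch only shows the rotation itself.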
Another option is to track the number of connections to each backend server. Then each new connection is directed to the server with the fewest open connections. Instead of only tracking the number of connections to each server, some other metric like response times can also be used for this decision. Then each new connection will be forwarded to the server that the load balancer expects to be the best performing server in the near future.
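The least-connections variant can be sketched like this (again with hypothetical addresses and made-up connection counts):

```python
# Hypothetical open-connection counts per backend server, which the
# load balancer would update as connections open and close.
open_connections = {"10.0.0.1": 12, "10.0.0.2": 3, "10.0.0.3": 7}

def pick_backend(conns):
    """Choose the backend with the fewest open connections."""
    return min(conns, key=conns.get)

target = pick_backend(open_connections)  # "10.0.0.2"
open_connections[target] += 1  # account for the new connection
```

Swapping the tracked metric - e.g. a moving average of response times instead of connection counts - yields the performance-based variant described above without changing the selection logic.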
If sessions should be sticky, these two algorithms do not work well: each connection - even from the same client - would be directed to a possibly different backend server. Instead, it’s possible to calculate a hash value from the client’s IP address to determine the target backend server. This way the same client (identified by its IP address) always connects to the same backend server.
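A sketch of this IP-hash approach, assuming a fixed list of hypothetical backends (note that in this simple modulo scheme, changing the server list remaps most clients):

```python
import hashlib

# Hypothetical backend server addresses.
backends = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]

def sticky_backend(client_ip, servers):
    """Hash the client's IP address so the same client always lands
    on the same backend, as long as the server list is unchanged."""
    digest = hashlib.sha256(client_ip.encode()).digest()
    index = int.from_bytes(digest[:8], "big") % len(servers)
    return servers[index]

# Repeated connections from one client hit the same server.
first = sticky_backend("198.51.100.7", backends)
second = sticky_backend("198.51.100.7", backends)
```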
HTTP load balancing can be implemented using tools that support generic TCP load balancing, HTTP load balancing, HTTP reverse proxies and HTTP caching proxies:
- your favorite cloud vendor also provides load balancing as a service (AWS Elastic Load Balancing, Google Load Balancing)
HTTP load balancing can evenly and fairly distribute load across several backend servers. It also handles backend server failures well and can react very fast to changes of their availability.
HTTP load balancers often support additional features like compressing responses from your servers, terminating SSL / TLS sessions (HTTPS) and only forwarding plain HTTP to your servers. They can also be used to cache responses - so traffic won’t even hit your backend servers.
Introducing another layer into your network increases not only complexity, but also network latency and management overhead. It might even become the bottleneck between your clients and servers.
If you do not set up your load balancer in a highly available fashion, the load balancer itself will be a single point of failure - negating the apparent increased availability of the backend servers by load balancing over them.
Instead of passing all traffic through HTTP load balancers, requests can also be redirected to other servers by using the 3xx family of HTTP status codes. The 3xx status codes tell the client to perform another request with a new URL. This way the second request and reply are transmitted directly between client and server. Status code 307 is best suited for this task.
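A minimal redirector can be sketched with Python’s built-in HTTP server. The backend URLs are hypothetical, and this sketch does plain round-robin load sharing rather than metric-driven load balancing:

```python
import itertools
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical download servers the redirector spreads traffic across.
backends = itertools.cycle(["http://dl1.example.com", "http://dl2.example.com"])

class RedirectHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # 307 preserves the request method on the follow-up request,
        # so the same handler pattern would also work for POST.
        target = next(backends) + self.path
        self.send_response(307)
        self.send_header("Location", target)
        self.end_headers()

def main():
    # Calling main() serves redirects on port 8080 until interrupted.
    HTTPServer(("", 8080), RedirectHandler).serve_forever()
```

After following the redirect, the client talks to the chosen backend directly; the redirector only carries the small initial request.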
Unrelated to load balancing, this technique is also used to implement URL shorteners or can be used to hand out URLs with temporary credentials to clients, e.g. for authenticated downloads from services like AWS S3.
Traffic flows directly between clients and servers after the initial redirect, which is especially useful for large downloads or streams.
Redirecting requests can noticeably increase latency. The redirect incurs the round trip necessary to set up a completely new connection to the actual backend server. This might be too slow for small website content like CSS and HTML files.
You have to implement and maintain your own tracking and redirection server. Whether your redirection servers implement load sharing or load balancing is up to you to decide and implement. Also, like with HTTP load balancing, the redirection servers have to be highly available so as not to create a single point of failure.