Load Balancing

Load balancers distribute traffic across multiple servers, enabling horizontal scaling, high availability, and graceful degradation. Understanding Layer 4 vs Layer 7 load balancing, health check mechanisms, and session affinity is critical for building resilient distributed systems.

# Layer 4 vs Layer 7 Load Balancing

Load balancers operate at different layers of the network stack, trading simplicity for functionality.

Layer 4 (Transport Layer) Load Balancing

L4 load balancers make routing decisions based on IP address and TCP/UDP port. They don't inspect application-layer data (HTTP headers, cookies, etc.).

Client Request:
    TCP SYN to 203.0.113.10:443

L4 Load Balancer:
    - Sees: source IP, dest IP, dest port (443)
    - Picks backend: 10.0.1.5
    - Forwards TCP connection to 10.0.1.5:443
    - All packets in this TCP connection go to same backend

Simple forwarding, no protocol knowledge needed
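
To make "no protocol knowledge needed" concrete, here is a minimal L4 forwarding sketch in Go: it accepts a TCP connection, picks a backend round-robin, and copies raw bytes in both directions without ever parsing them. The backend addresses are illustrative assumptions, not part of any real deployment.

    package main

    import (
        "io"
        "log"
        "net"
    )

    // Assumed backend pool; a real load balancer would learn this from
    // configuration or service discovery.
    var backends = []string{"10.0.1.5:443", "10.0.1.6:443"}
    var next int

    func main() {
        ln, err := net.Listen("tcp", ":443")
        if err != nil {
            log.Fatal(err)
        }
        for {
            client, err := ln.Accept()
            if err != nil {
                continue
            }
            // Pick a backend; the decision uses only connection-level facts.
            backend := backends[next%len(backends)]
            next++
            go forward(client, backend)
        }
    }

    // forward shuttles raw bytes both ways; TLS and HTTP pass through unparsed.
    func forward(client net.Conn, addr string) {
        defer client.Close()
        server, err := net.Dial("tcp", addr)
        if err != nil {
            return // a real LB would mark this backend unhealthy
        }
        defer server.Close()
        go io.Copy(server, client)
        io.Copy(client, server)
    }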

Characteristics:

  • Fast: minimal processing, low latency (~1ms added)
  • Protocol-agnostic: works with any TCP/UDP application (HTTP, gRPC, databases, SSH)
  • High throughput: can handle millions of connections per second
  • Simple health checks: TCP handshake or UDP ping

Limitations:

  • Can't route based on URL path, HTTP headers, or cookies
  • Can't terminate SSL (passes encrypted traffic through)
  • Limited visibility into application-layer failures

Layer 7 (Application Layer) Load Balancing

L7 load balancers understand application protocols (HTTP, gRPC) and make routing decisions based on request content.

Client Request:
    GET /api/users HTTP/1.1
    Host: api.example.com
    Cookie: session_id=abc123

L7 Load Balancer:
    - Terminates TCP/TLS connection
    - Parses HTTP request
    - Sees: path=/api/users, cookie=abc123
    - Routes to backend based on path or cookie
    - Opens new connection to backend
    - Forwards HTTP request

Full protocol awareness, can modify requests/responses
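
A sketch of the same idea in Go, using the standard library's reverse proxy: the balancer parses each HTTP request and routes by path, which an L4 balancer cannot do. The backend URLs and the /api prefix are assumptions for illustration; TLS termination is omitted to keep the sketch short.

    package main

    import (
        "log"
        "net/http"
        "net/http/httputil"
        "net/url"
        "strings"
    )

    func proxyTo(raw string) *httputil.ReverseProxy {
        target, err := url.Parse(raw)
        if err != nil {
            log.Fatal(err)
        }
        return httputil.NewSingleHostReverseProxy(target)
    }

    func main() {
        api := proxyTo("http://10.0.1.5:8080")    // assumed API backend
        static := proxyTo("http://10.0.2.5:8080") // assumed static-content backend

        http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
            // The request is fully parsed here, so routing can use its content.
            // httputil.ReverseProxy also appends the client IP to X-Forwarded-For.
            if strings.HasPrefix(r.URL.Path, "/api/") {
                api.ServeHTTP(w, r)
                return
            }
            static.ServeHTTP(w, r)
        })
        // ListenAndServeTLS would terminate TLS at this hop.
        log.Fatal(http.ListenAndServe(":8080", nil))
    }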

Characteristics:

  • Content-based routing: send /api/* to API servers, /static/* to CDN
  • SSL termination: decrypt once at LB, send plain HTTP to backends
  • Advanced health checks: HTTP 200 response, JSON validation
  • Request modification: add headers (X-Forwarded-For), rewrite URLs
  • Session affinity via cookies

Limitations:

  • Slower: ~5-10ms added latency (protocol parsing, SSL termination)
  • Lower throughput: tens to hundreds of thousands of requests per second
  • More CPU intensive (TLS encryption/decryption)

Comparison Table

+--------------------+------------------+--------------------+
| Characteristic     | Layer 4          | Layer 7            |
+--------------------+------------------+--------------------+
| Routing Decision   | IP + Port        | URL, headers,      |
|                    |                  | cookies            |
+--------------------+------------------+--------------------+
| Latency            | ~1ms             | ~5-10ms            |
+--------------------+------------------+--------------------+
| Throughput         | Millions conn/s  | 100Ks of req/s     |
+--------------------+------------------+--------------------+
| SSL Termination    | No (passthrough) | Yes                |
+--------------------+------------------+--------------------+
| Health Checks      | TCP/UDP ping     | HTTP status,       |
|                    |                  | content validation |
+--------------------+------------------+--------------------+
| Use Cases          | TCP services,    | HTTP APIs, web     |
|                    | databases, gRPC  | apps, microservices|
+--------------------+------------------+--------------------+
| Examples           | AWS NLB, HAProxy | AWS ALB, Nginx,    |
|                    | (L4 mode)        | Envoy, HAProxy (L7)|
+--------------------+------------------+--------------------+

# Health Checks and Failover

Health checks determine which backends are available. Failed backends are removed from the pool automatically.

Health Check Types

TCP Health Check (L4):

Load balancer attempts TCP handshake with backend
    SYN --> [Backend]
    <-- SYN-ACK
    ACK -->
Success: backend is "healthy"
Timeout: backend is "unhealthy", removed from pool
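
A sketch of this check in Go (the backend address and the 2-second timeout are assumptions): the backend counts as healthy only if the TCP handshake completes in time.

    package main

    import (
        "fmt"
        "net"
        "time"
    )

    // tcpHealthy reports whether a TCP handshake to addr completes within the
    // timeout. Success means "healthy"; timeout or refusal means "unhealthy".
    func tcpHealthy(addr string) bool {
        conn, err := net.DialTimeout("tcp", addr, 2*time.Second)
        if err != nil {
            return false
        }
        conn.Close()
        return true
    }

    func main() {
        fmt.Println(tcpHealthy("10.0.1.5:443")) // assumed backend address
    }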

HTTP Health Check (L7):

Load balancer sends HTTP request to backend
    GET /health HTTP/1.1
    Host: backend.internal

Backend responds:
    HTTP/1.1 200 OK
    {"status": "healthy", "db": "connected"}

Success: HTTP 200 response received
Failure: HTTP 500, timeout, or connection refused

Advanced health checks: Validate response content (e.g., check for specific JSON field), test database connectivity, verify disk space. More accurate than simple TCP checks.
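
For a sense of the backend side of such a check, here is a hypothetical /health handler in Go that returns 200 only when the database answers a ping; the JSON shape mirrors the example above, and the database handle setup is an assumption left out for brevity.

    package main

    import (
        "database/sql"
        "encoding/json"
        "net/http"
    )

    var db *sql.DB // assumed to be opened at startup; setup elided

    func healthHandler(w http.ResponseWriter, r *http.Request) {
        body := map[string]string{"status": "healthy", "db": "connected"}
        code := http.StatusOK
        if db == nil || db.Ping() != nil {
            body["status"] = "unhealthy"
            body["db"] = "unreachable"
            code = http.StatusInternalServerError // the LB treats non-200 as a failure
        }
        w.Header().Set("Content-Type", "application/json")
        w.WriteHeader(code)
        json.NewEncoder(w).Encode(body)
    }

    func main() {
        http.HandleFunc("/health", healthHandler)
        http.ListenAndServe(":8080", nil)
    }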

Failover Behavior

When a backend fails health checks, the load balancer stops sending new traffic to it. Existing connections may be handled differently:

  • Immediate removal: Only new connections are rerouted; existing connections remain open until they close on their own.
  • Connection draining: Wait for existing connections to finish (timeout), then remove backend. Graceful.
  • Forced termination: Close all connections immediately. Fast but disruptive.

Timing parameters: Health check interval (10s typical), failure threshold (3 consecutive failures = unhealthy), recovery threshold (2 consecutive successes = healthy). Tuning these controls failover speed vs false positive rate.
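
A sketch of that threshold logic in Go, using the typical values above (10s interval, 3 failures to mark unhealthy, 2 successes to recover); the /health URL and the probe timeout are assumptions.

    package main

    import (
        "log"
        "net/http"
        "time"
    )

    // checker tracks consecutive probe results for one backend and flips its
    // health state only when a threshold is crossed.
    type checker struct {
        url       string
        healthy   bool
        failures  int
        successes int
    }

    func (c *checker) probe(client *http.Client) {
        resp, err := client.Get(c.url)
        ok := err == nil && resp.StatusCode == http.StatusOK
        if resp != nil {
            resp.Body.Close()
        }
        if ok {
            c.successes++
            c.failures = 0
            if !c.healthy && c.successes >= 2 { // recovery threshold
                c.healthy = true
                log.Printf("%s back in pool", c.url)
            }
        } else {
            c.failures++
            c.successes = 0
            if c.healthy && c.failures >= 3 { // failure threshold
                c.healthy = false
                log.Printf("%s removed from pool", c.url)
            }
        }
    }

    func main() {
        c := &checker{url: "http://10.0.1.5:8080/health", healthy: true}
        client := &http.Client{Timeout: 2 * time.Second}
        for range time.Tick(10 * time.Second) { // health check interval
            c.probe(client)
        }
    }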

# Session Affinity (Sticky Sessions)

Session affinity ensures requests from the same client go to the same backend server. Necessary for stateful applications that store session data in local memory.

Cookie-Based Affinity (L7)

First Request:
    Client --> LB --> Backend A
    LB adds cookie: Set-Cookie: LB_SESSION=server_a

Subsequent Requests:
    Client --> LB (sees cookie: LB_SESSION=server_a) --> Backend A
    All requests with this cookie go to Backend A

Advantage: Works across LB restarts (cookie persists in client browser).

Disadvantage: If Backend A fails, session is lost (unless session data is replicated).
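
A minimal sketch of cookie-based stickiness in Go, reusing the cookie name and backend labels from the example above; the backend table and the "always pin new clients to server_a" choice are simplifying assumptions (a real balancer would pick via one of the algorithms below and then forward the request).

    package main

    import "net/http"

    // Assumed mapping from cookie value to backend address.
    var backends = map[string]string{
        "server_a": "http://10.0.1.5:8080",
        "server_b": "http://10.0.1.6:8080",
    }

    // pickBackend returns the pinned backend if the affinity cookie is present,
    // otherwise chooses one and sets the cookie so later requests stick to it.
    func pickBackend(w http.ResponseWriter, r *http.Request) string {
        if c, err := r.Cookie("LB_SESSION"); err == nil {
            if addr, ok := backends[c.Value]; ok {
                return addr // sticky: same backend as before
            }
        }
        http.SetCookie(w, &http.Cookie{Name: "LB_SESSION", Value: "server_a"})
        return backends["server_a"] // first request: pin to a chosen backend
    }

    func main() {
        http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
            addr := pickBackend(w, r)
            w.Write([]byte("would forward to " + addr + "\n")) // forwarding elided
        })
        http.ListenAndServe(":8080", nil)
    }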

Source IP Affinity (L4)

Client IP: 198.51.100.5
    LB hashes IP: hash(198.51.100.5) % num_backends = 2
    Always routes to Backend 2

Advantage: Simple, works at L4, no cookies needed.

Disadvantage: Clients behind the same NAT (sharing one source IP) all land on the same backend, which can skew load distribution.
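
The hash-and-modulo step above, sketched in Go; the FNV hash is an arbitrary choice of hash function for illustration.

    package main

    import (
        "fmt"
        "hash/fnv"
    )

    // backendFor maps a client IP to a backend index: hash(IP) % num_backends.
    // The same IP always yields the same index while the pool size is unchanged.
    func backendFor(clientIP string, numBackends int) int {
        h := fnv.New32a()
        h.Write([]byte(clientIP))
        return int(h.Sum32()) % numBackends
    }

    func main() {
        fmt.Println(backendFor("198.51.100.5", 3)) // stable index for this client
    }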

# Load Balancing Algorithms

How does the load balancer choose which backend to use? Several algorithms exist:

  • Round-robin: Cycle through backends in order. Simple, even distribution if all requests are similar (see the sketch after this list).
  • Least connections: Send to backend with fewest active connections. Good for long-lived connections.
  • Weighted round-robin: Assign weights (server1=2, server2=1). Server1 gets twice the traffic. Useful when backends have different capacity.
  • Random: Pick random backend. Simple, surprisingly effective for large pools.
  • Latency-based: Route to the backend with the lowest observed response time. Requires measuring latency, e.g. via health check probes or passive monitoring of real requests.
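
A minimal Go sketch of two of these strategies, round-robin and least connections; the backend names and connection counts are assumptions, and the counters would be maintained by the balancer as connections open and close.

    package main

    import (
        "fmt"
        "sync"
    )

    // pool holds the backends plus the state each strategy needs.
    type pool struct {
        mu       sync.Mutex
        backends []string
        rrIndex  int
        active   map[string]int // active connections per backend
    }

    // RoundRobin cycles through backends in order.
    func (p *pool) RoundRobin() string {
        p.mu.Lock()
        defer p.mu.Unlock()
        b := p.backends[p.rrIndex%len(p.backends)]
        p.rrIndex++
        return b
    }

    // LeastConnections picks the backend with the fewest active connections.
    func (p *pool) LeastConnections() string {
        p.mu.Lock()
        defer p.mu.Unlock()
        best := p.backends[0]
        for _, b := range p.backends[1:] {
            if p.active[b] < p.active[best] {
                best = b
            }
        }
        return best
    }

    func main() {
        p := &pool{
            backends: []string{"10.0.1.5", "10.0.1.6", "10.0.1.7"},
            active:   map[string]int{"10.0.1.5": 4, "10.0.1.6": 1, "10.0.1.7": 2},
        }
        fmt.Println(p.RoundRobin(), p.LeastConnections()) // 10.0.1.5 10.0.1.6
    }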

# Connection Draining

When removing a backend (maintenance, scale-down, health check failure), connection draining allows in-flight requests to complete gracefully.

Time  Event                           Action
----  -----------------------------   ------------------------------
T+0   Admin marks backend for removal LB stops sending NEW requests
T+1   Existing connections continue   In-flight requests complete
T+30  Drain timeout reached           LB forcibly closes remaining
                                      connections
T+30  Backend removed from pool       Safe to shut down backend

Timeout tuning: Set drain timeout longer than longest expected request duration. For web APIs: 30-60s typical. For long-polling/streaming: 300s or more.
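
Connection draining has a counterpart on the backend itself: graceful shutdown. Below is a sketch in Go using http.Server.Shutdown with a 30-second drain timeout to match the table above; the signal handling and the timeout value are assumptions.

    package main

    import (
        "context"
        "log"
        "net/http"
        "os"
        "os/signal"
        "time"
    )

    func main() {
        srv := &http.Server{Addr: ":8080", Handler: http.DefaultServeMux}

        go func() {
            if err := srv.ListenAndServe(); err != http.ErrServerClosed {
                log.Fatal(err)
            }
        }()

        // Wait for the signal that this backend is being taken out of the pool.
        stop := make(chan os.Signal, 1)
        signal.Notify(stop, os.Interrupt)
        <-stop

        // Stop accepting new requests and let in-flight requests finish, up to
        // the drain timeout (T+0 .. T+30 in the table above).
        ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
        defer cancel()
        if err := srv.Shutdown(ctx); err != nil {
            log.Printf("drain timeout reached: %v; forcing close", err)
            srv.Close() // forcibly close whatever connections remain
        }
    }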

# Key Takeaways

  • L4 load balancers are fast and protocol-agnostic but offer only limited routing capabilities
  • L7 load balancers offer content-based routing and SSL termination at the cost of higher latency
  • Health checks determine backend availability; tune interval/thresholds for failover speed vs stability
  • Session affinity (sticky sessions) routes same client to same backend, needed for stateful apps
  • Connection draining enables graceful backend removal without dropping in-flight requests
  • Choose the algorithm based on workload: round-robin for stateless services, least connections for long-lived connections