Load Balancing
Load balancers distribute traffic across multiple servers, enabling horizontal scaling, high availability, and graceful degradation. Understanding Layer 4 vs Layer 7 load balancing, health check mechanisms, and session affinity is critical for building resilient distributed systems.
# Layer 4 vs Layer 7 Load Balancing
Load balancers operate at different layers of the network stack, trading simplicity for functionality.
Layer 4 (Transport Layer) Load Balancing
L4 load balancers make routing decisions based on IP address and TCP/UDP port. They don't inspect application-layer data (HTTP headers, cookies, etc.).
Client Request:
TCP SYN to 203.0.113.10:443
L4 Load Balancer:
- Sees: source IP, dest IP, dest port (443)
- Picks backend: 10.0.1.5
- Forwards TCP connection to 10.0.1.5:443
- All packets in this TCP connection go to same backend
Simple forwarding, no protocol knowledge needed
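The per-connection behavior above can be sketched as a hash over the connection tuple. This is a minimal illustration, not any particular LB's implementation; the backend addresses are hypothetical:

```python
import hashlib

# Hypothetical backend pool; an L4 balancer tracks no application state,
# only the connection tuple.
BACKENDS = ["10.0.1.5", "10.0.1.6", "10.0.1.7"]

def pick_backend(src_ip: str, src_port: int, dst_ip: str, dst_port: int) -> str:
    # Hash the connection tuple so every packet of one TCP connection
    # maps to the same backend.
    key = f"{src_ip}:{src_port}->{dst_ip}:{dst_port}".encode()
    digest = int(hashlib.sha256(key).hexdigest(), 16)
    return BACKENDS[digest % len(BACKENDS)]
```

Because the hash is deterministic, repeated lookups for the same connection always return the same backend.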
Characteristics:
- Fast: minimal processing, low latency (~1ms added)
- Protocol-agnostic: works with any TCP/UDP application (HTTP, gRPC, databases, SSH)
- High throughput: can handle millions of connections per second
- Simple health checks: TCP handshake or UDP ping
Limitations:
- Can't route based on URL path, HTTP headers, or cookies
- Can't terminate SSL (passes encrypted traffic through)
- Limited visibility into application-layer failures
Layer 7 (Application Layer) Load Balancing
L7 load balancers understand application protocols (HTTP, gRPC) and make routing decisions based on request content.
Client Request:
GET /api/users HTTP/1.1
Host: api.example.com
Cookie: session_id=abc123
L7 Load Balancer:
- Terminates TCP/TLS connection
- Parses HTTP request
- Sees: path=/api/users, cookie=abc123
- Routes to backend based on path or cookie
- Opens new connection to backend
- Forwards HTTP request
Full protocol awareness, can modify requests/responses
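The content-based routing step can be sketched as a longest-prefix match on the parsed path. The route table and pool addresses below are hypothetical:

```python
# Hypothetical route table, as an L7 balancer might consult after
# parsing the HTTP request line.
ROUTES = {
    "/api/": ["10.0.2.1", "10.0.2.2"],  # API server pool
    "/static/": ["10.0.3.1"],           # static content pool
    "/": ["10.0.4.1"],                  # default pool
}

def route(path: str) -> list:
    # Longest matching prefix wins, so /api/users beats the "/" default.
    best = max((prefix for prefix in ROUTES if path.startswith(prefix)), key=len)
    return ROUTES[best]
```

A real L7 balancer would also consult the Host header and cookies, but the prefix match captures the core idea.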
Characteristics:
- Content-based routing: send /api/* to API servers, /static/* to CDN
- SSL termination: decrypt once at LB, send plain HTTP to backends
- Advanced health checks: HTTP 200 response, JSON validation
- Request modification: add headers (X-Forwarded-For), rewrite URLs
- Session affinity via cookies
Limitations:
- Slower: ~5-10ms added latency (protocol parsing, SSL termination)
- Lower throughput: tens to hundreds of thousands of requests per second
- More CPU intensive (TLS encryption/decryption)
Comparison Table
| Characteristic   | Layer 4                       | Layer 7                             |
|------------------|-------------------------------|-------------------------------------|
| Routing decision | IP + port                     | URL, headers, cookies               |
| Latency          | ~1ms                          | ~5-10ms                             |
| Throughput       | Millions of conn/s            | 100Ks of req/s                      |
| SSL termination  | No (passthrough)              | Yes                                 |
| Health checks    | TCP/UDP ping                  | HTTP status, content validation     |
| Use cases        | TCP services, databases, gRPC | HTTP APIs, web apps, microservices  |
| Examples         | AWS NLB, HAProxy (L4 mode)    | AWS ALB, Nginx, Envoy, HAProxy (L7) |
# Health Checks and Failover
Health checks determine which backends are available. Failed backends are removed from the pool automatically.
Health Check Types
TCP Health Check (L4):
Load balancer attempts TCP handshake with backend
SYN --> [Backend]
<-- SYN-ACK
ACK -->
Success: backend is "healthy"
Timeout: backend is "unhealthy", removed from pool
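A TCP health check is a plain connection attempt. A minimal sketch:

```python
import socket

def tcp_health_check(host: str, port: int, timeout: float = 2.0) -> bool:
    # A completed TCP handshake marks the backend healthy; a timeout or
    # refused connection marks it unhealthy.
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Note this only proves the backend accepts connections; the process behind the port may still be unable to serve requests, which is why L7 checks exist.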
HTTP Health Check (L7):
Load balancer sends HTTP request to backend
GET /health HTTP/1.1
Host: backend.internal
Backend responds:
HTTP/1.1 200 OK
{"status": "healthy", "db": "connected"}
Success: HTTP 200 response received
Failure: HTTP 500, timeout, or connection refused
Advanced health checks: Validate response content (e.g., check for specific JSON field), test database connectivity, verify disk space. More accurate than simple TCP checks.
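An HTTP check with content validation might look like the following sketch: it requires both an HTTP 200 and a `"status": "healthy"` field in the JSON body (the `/health` endpoint shape is taken from the example above):

```python
import json
import urllib.request

def http_health_check(url: str, timeout: float = 2.0) -> bool:
    # Healthy only if the backend returns HTTP 200 AND the JSON body
    # reports "healthy" -- content validation, not just reachability.
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            if resp.status != 200:
                return False
            return json.loads(resp.read()).get("status") == "healthy"
    except (OSError, ValueError):  # connection errors, timeouts, bad JSON
        return False
```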
Failover Behavior
When a backend fails health checks, the load balancer stops sending new traffic to it. Existing connections may be handled differently:
- Immediate removal: New connections only. Existing connections remain until closed.
- Connection draining: Wait for existing connections to finish (timeout), then remove backend. Graceful.
- Forced termination: Close all connections immediately. Fast but disruptive.
Timing parameters: Health check interval (10s typical), failure threshold (3 consecutive failures = unhealthy), recovery threshold (2 consecutive successes = healthy). Tuning these controls failover speed vs false positive rate.
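These thresholds can be sketched as a small state machine. The defaults mirror the figures above (3 consecutive failures to mark unhealthy, 2 consecutive successes to recover); the class name is illustrative:

```python
class HealthTracker:
    # Flip state only after a streak of contradicting results, which
    # filters out one-off blips (false positives).
    def __init__(self, fail_threshold: int = 3, recover_threshold: int = 2):
        self.fail_threshold = fail_threshold
        self.recover_threshold = recover_threshold
        self.healthy = True
        self._streak = 0  # consecutive results contradicting current state

    def record(self, check_passed: bool) -> bool:
        if check_passed == self.healthy:
            self._streak = 0  # result agrees with current state; reset
        else:
            self._streak += 1
            limit = self.fail_threshold if self.healthy else self.recover_threshold
            if self._streak >= limit:
                self.healthy = check_passed
                self._streak = 0
        return self.healthy
```

Raising the thresholds slows failover but reduces flapping; lowering them does the opposite.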
# Session Affinity (Sticky Sessions)
Session affinity ensures requests from the same client go to the same backend server. Necessary for stateful applications that store session data in local memory.
Cookie-Based Affinity (L7)
First Request:
Client --> LB --> Backend A
LB adds cookie: Set-Cookie: LB_SESSION=server_a
Subsequent Requests:
Client --> LB (sees cookie: LB_SESSION=server_a) --> Backend A
All requests with this cookie go to Backend A
Advantage: Works across LB restarts (cookie persists in client browser).
Disadvantage: If Backend A fails, session is lost (unless session data is replicated).
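The cookie flow above can be sketched as follows; the backend names and the naive first-available choice are purely illustrative:

```python
# Hypothetical sketch of cookie-based affinity: honour an existing
# LB_SESSION cookie, otherwise pick a backend and pin the client to it.
BACKENDS = {"server_a": "10.0.1.5", "server_b": "10.0.1.6"}

def handle(cookies: dict) -> tuple:
    # Returns (backend address, cookies to set on the response).
    name = cookies.get("LB_SESSION")
    if name in BACKENDS:
        return BACKENDS[name], {}  # already pinned; no Set-Cookie needed
    name = next(iter(BACKENDS))    # naive first-available choice (sketch)
    return BACKENDS[name], {"LB_SESSION": name}
```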
Source IP Affinity (L4)
Client IP: 198.51.100.5
LB hashes IP: hash(198.51.100.5) % num_backends = 2
Always routes to Backend 2
Advantage: Simple, works at L4, no cookies needed.
Disadvantage: Clients behind NAT (same source IP) all go to same backend. Uneven load distribution.
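The `hash(IP) % num_backends` scheme can be sketched as below. A stable digest (MD5 here) is used instead of Python's built-in `hash()`, which is randomized per process and would break affinity across restarts:

```python
import hashlib

BACKENDS = ["10.0.1.5", "10.0.1.6", "10.0.1.7"]

def backend_for(client_ip: str) -> str:
    # Deterministic hash of the source IP: the same client always maps
    # to the same backend. Clients behind one NAT share one mapping.
    h = int(hashlib.md5(client_ip.encode()).hexdigest(), 16)
    return BACKENDS[h % len(BACKENDS)]
```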
# Load Balancing Algorithms
How does the load balancer choose which backend to use? Several algorithms exist:
- Round-robin: Cycle through backends in order. Simple, even distribution (if all requests are similar).
- Least connections: Send to backend with fewest active connections. Good for long-lived connections.
- Weighted round-robin: Assign weights (server1=2, server2=1). Server1 gets twice the traffic. Useful when backends have different capacity.
- Random: Pick random backend. Simple, surprisingly effective for large pools.
- Latency-based: Route to the backend with the lowest response time. Requires measuring backend latency, e.g. via active probes or observed response times.
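Two of these algorithms can be sketched in a few lines. The expanded-list trick implements the server1=2, server2=1 weighting from above; the least-connections function just picks the minimum of a connection-count map:

```python
import itertools

# Weighted round-robin via an expanded list: server1 (weight 2) appears
# twice, server2 (weight 1) once, so server1 gets twice the traffic.
rr = itertools.cycle(["server1", "server1", "server2"])

def least_connections(active: dict) -> str:
    # Pick the backend with the fewest active connections.
    return min(active, key=active.get)
```

Production implementations use smoother interleaving for weights (so server1's turns are spread out), but the traffic ratio is the same.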
# Connection Draining
When removing a backend (maintenance, scale-down, health check failure), connection draining allows in-flight requests to complete gracefully.
Time   Event                            Action
----   -----                            ------
T+0    Admin marks backend for removal  LB stops sending NEW requests
T+1    Existing connections continue    In-flight requests complete
T+30   Drain timeout reached            LB forcibly closes remaining connections
T+30   Backend removed from pool        Safe to shut down backend
Timeout tuning: Set drain timeout longer than longest expected request duration. For web APIs: 30-60s typical. For long-polling/streaming: 300s or more.
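The drain lifecycle in the timeline above reduces to a small amount of state. A hypothetical sketch (class and method names are illustrative):

```python
class Backend:
    # Drain sketch: refuse new requests, and report safe-to-remove once
    # in-flight requests finish or the drain timeout forces removal.
    def __init__(self):
        self.draining = False
        self.active = 0  # in-flight request count

    def accepts_new(self) -> bool:
        return not self.draining

    def start_drain(self) -> None:
        self.draining = True

    def safe_to_remove(self, timeout_reached: bool) -> bool:
        return self.draining and (self.active == 0 or timeout_reached)
```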
# Key Takeaways
- L4 load balancers are fast and protocol-agnostic but offer limited routing capabilities
- L7 load balancers offer content-based routing and SSL termination at the cost of higher latency
- Health checks determine backend availability; tune interval/thresholds for failover speed vs stability
- Session affinity (sticky sessions) routes same client to same backend, needed for stateful apps
- Connection draining enables graceful backend removal without dropping in-flight requests
- Choose algorithm based on workload: round-robin for stateless, least-connections for long-lived