Chapter 5

Datacenter & Network Fundamentals

Modern infrastructure runs in datacenters, whether on-premises or in the cloud. Understanding datacenter architecture isn't just for hardware engineers; it's essential for anyone building production systems. Physical topology, network design, and protocol choices shape how your systems behave under load, during failures, and at scale.

This chapter bridges the gap between abstract networking concepts and real-world datacenter operations. We'll explore how racks become pods, how traffic flows north-south and east-west, why leaf-spine architecture dominates modern datacenters, and how cloud providers like AWS map these physical realities into their service models.

We'll cover the practical knowledge you need to understand datacenter operations:

  • Physical Infrastructure: Racks, power distribution, cooling, and how they create fault domains
  • Network Architecture: Leaf-spine vs 3-tier, ToR switches, and traffic patterns
  • Protocols in Practice: BGP, OSPF, VLANs, and when each matters for troubleshooting
  • HPC Networking: RDMA, InfiniBand, and low-latency network fabrics
  • Cloud Mapping: How AWS regions, AZs, and services map to physical infrastructure

By the end of this chapter, you'll understand how datacenters actually work and why certain outages unfold the way they do.

# Chapter Sections

Physical Infrastructure

Racks, pods, and rows. Power distribution units (PDUs) and A/B power feeds. Cooling systems (hot aisle/cold aisle). Top of Rack (ToR) vs End of Row (EoR) switches. How physical layout creates fault domains.

Network Topology

Traditional 3-tier (access/aggregation/core) vs modern leaf-spine architecture. North-South (client ↔ datacenter) vs East-West (server ↔ server) traffic patterns. Oversubscription ratios and bandwidth planning. Layer 2 vs Layer 3 switching boundaries.
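
As a rough illustration of oversubscription, the sketch below computes the ratio of server-facing bandwidth to spine-facing bandwidth for a single leaf switch. The port counts and speeds (48 × 25 GbE down, 6 × 100 GbE up) are assumed values chosen for the example, and `oversubscription_ratio` is a hypothetical helper, not something defined elsewhere in this chapter.

```python
def oversubscription_ratio(downlink_ports: int, downlink_gbps: float,
                           uplink_ports: int, uplink_gbps: float) -> float:
    """Return downlink bandwidth divided by uplink bandwidth for one leaf switch."""
    downlink = downlink_ports * downlink_gbps   # total capacity toward servers
    uplink = uplink_ports * uplink_gbps         # total capacity toward spines
    if uplink <= 0:
        raise ValueError("uplink capacity must be positive")
    return downlink / uplink


# Assumed example: 48 x 25 GbE server ports, 6 x 100 GbE spine uplinks.
ratio = oversubscription_ratio(48, 25, 6, 100)
print(f"{ratio:.1f}:1")  # 1200 Gbps down / 600 Gbps up -> 2.0:1
```

A ratio of 1:1 means the leaf can forward all server traffic to the spines at line rate; higher ratios save uplink ports at the cost of possible congestion when many servers send east-west traffic at once.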

Network Protocols

BGP for inter-datacenter and internet routing. OSPF for internal datacenter routing. VLANs and network segmentation. MTU and jumbo frames for storage/HPC workloads. LAN vs WAN characteristics. Real-world outage example: the October 2021 Facebook BGP incident.
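
To see why jumbo frames matter for storage and HPC traffic, the sketch below estimates what fraction of on-wire bytes carries TCP payload at a given MTU. It assumes untagged Ethernet frames and IPv4 and TCP headers without options; `tcp_payload_efficiency` is a hypothetical helper written for this illustration.

```python
# Per-frame Ethernet overhead on the wire, in bytes (untagged frame assumed).
PREAMBLE_SFD = 8      # preamble + start-of-frame delimiter
ETH_HEADER = 14       # destination MAC, source MAC, EtherType
FCS = 4               # frame check sequence
INTERFRAME_GAP = 12   # minimum gap between frames, expressed in byte times
WIRE_OVERHEAD = PREAMBLE_SFD + ETH_HEADER + FCS + INTERFRAME_GAP  # 38 bytes

# Header bytes carried inside the MTU (assumption: no IP or TCP options).
IPV4_HEADER = 20
TCP_HEADER = 20


def tcp_payload_efficiency(mtu: int) -> float:
    """Fraction of on-wire bytes that is TCP payload for full-sized frames."""
    payload = mtu - IPV4_HEADER - TCP_HEADER
    if payload <= 0:
        raise ValueError("MTU too small for IPv4 + TCP headers")
    return payload / (mtu + WIRE_OVERHEAD)


for mtu in (1500, 9000):
    print(f"MTU {mtu}: {tcp_payload_efficiency(mtu):.1%} payload")
```

Under these assumptions the standard 1500-byte MTU comes out at about 94.9% payload and a 9000-byte jumbo MTU at about 99.1%. The larger frame also means fewer packets per byte transferred, which reduces per-packet processing work on hosts and switches.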

Tunneling & Security

IPsec and VPN fundamentals. Overlay networks (VXLAN, Geneve). When encryption impacts performance. Tunneling protocols in cloud environments.

DNS in Datacenters

Internal vs external DNS. Split-horizon DNS configurations. Service discovery patterns (DNS-based vs dedicated systems). TTL considerations and caching behavior. DNS during outages and failover.

HPC Networking

Why RDMA (Remote Direct Memory Access) matters: lower latency and less CPU overhead. Comparing InfiniBand, RoCE (RDMA over Converged Ethernet), and iWARP. Low-latency requirements for HPC and storage workloads. Network fabric design for performance-critical applications.

Load Balancing

Layer 4 (transport) vs Layer 7 (application) load balancing. Health checks and failover mechanisms. Session affinity and connection draining. Load balancing algorithms and when each works best.
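
As a minimal sketch of two common load balancing algorithms, the code below contrasts round robin with least connections. The class names and backend labels are hypothetical, and a real load balancer would also handle health checks, weights, and concurrency, all of which are left out here.

```python
import itertools


class RoundRobin:
    """Hand out backends in a fixed rotation, ignoring current load."""

    def __init__(self, backends):
        self._cycle = itertools.cycle(backends)

    def pick(self):
        return next(self._cycle)


class LeastConnections:
    """Pick the backend with the fewest active connections."""

    def __init__(self, backends):
        self.active = {b: 0 for b in backends}

    def pick(self):
        # Ties resolve to the backend listed first.
        backend = min(self.active, key=self.active.get)
        self.active[backend] += 1
        return backend

    def release(self, backend):
        # Call when a connection to this backend closes.
        self.active[backend] -= 1


rr = RoundRobin(["a", "b", "c"])
print([rr.pick() for _ in range(4)])  # ['a', 'b', 'c', 'a']

lc = LeastConnections(["a", "b"])
lc.pick()          # 'a' now has 1 active connection
lc.pick()          # 'b' now has 1 active connection
lc.release("b")    # the connection to 'b' finishes
print(lc.pick())   # 'b', since it has fewer active connections
```

Round robin tends to work well when requests are short and similar in cost. Least connections adapts better when request durations vary widely, because a backend stuck on slow requests stops receiving new ones.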

Cloud Provider Architecture

AWS regions, availability zones (AZs), and VPCs mapped to physical infrastructure. How services work: S3 (object storage), EBS (block storage), EFS (file storage). Fault domains and update domains in cloud environments. Cross-cloud comparison: GCP and Azure equivalents.

Hybrid Cloud Connectivity

Connecting on-premises datacenters to cloud providers. VPN over internet vs dedicated connections (AWS Direct Connect, Azure ExpressRoute). Bandwidth, latency, resiliency, and cost trade-offs. Redundancy patterns with BGP failover. Data migration strategies: bulk transfers, continuous sync, and hybrid storage gateways.
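
The bandwidth trade-off for bulk transfers is easy to underestimate, so here is a back-of-the-envelope sketch. The dataset size, link speeds, and the 80% sustained utilization figure are all assumptions chosen for illustration, and `transfer_days` is a hypothetical helper.

```python
def transfer_days(data_tb: float, link_gbps: float, utilization: float = 0.8) -> float:
    """Estimate days needed to move data_tb terabytes over a link.

    Uses decimal units (1 TB = 10**12 bytes, 1 Gbps = 10**9 bits/s).
    utilization is the assumed fraction of link capacity actually sustained.
    """
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    bits = data_tb * 1e12 * 8
    effective_bps = link_gbps * 1e9 * utilization
    return bits / effective_bps / 86_400  # 86,400 seconds per day


# Assumed example: 100 TB over a 1 Gbps link and over a 10 Gbps link.
for gbps in (1, 10):
    print(f"{gbps} Gbps: {transfer_days(100, gbps):.1f} days")
```

With these assumed numbers, 100 TB takes roughly 11.6 days at 1 Gbps and about 1.2 days at 10 Gbps. Estimates like this help decide whether a network transfer is practical or whether a dedicated connection or an offline bulk-transfer option is worth its cost.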