Chapter 5
Datacenter & Network Fundamentals
Modern infrastructure runs in datacenters—whether on-premises or in the cloud. Understanding datacenter architecture isn't just for hardware engineers; it's essential for anyone building production systems. The physical topology, network design, and protocol choices fundamentally shape how your systems behave under load, during failures, and at scale.
This chapter bridges the gap between abstract networking concepts and real-world datacenter operations. We'll explore how racks become pods, how traffic flows north-south and east-west, why leaf-spine architecture dominates modern datacenters, and how cloud providers like AWS map these physical realities into their service models.
We'll cover the practical knowledge you need to understand datacenter operations:
- Physical Infrastructure: Racks, power distribution, cooling, and how they create fault domains
- Network Architecture: Leaf-spine vs 3-tier, ToR switches, and traffic patterns
- Protocols in Practice: BGP, OSPF, VLANs, and when each matters for troubleshooting
- HPC Networking: RDMA, InfiniBand, and low-latency network fabrics
- Cloud Mapping: How AWS regions, AZs, and services map to physical infrastructure
By the end of this chapter, you'll understand how datacenters actually work—and why certain outages happen the way they do.
# Chapter Sections
Physical Infrastructure
Racks, pods, and rows. Power distribution units (PDUs) and A/B power feeds. Cooling systems (hot aisle/cold aisle). Top-of-Rack (ToR) vs End-of-Row (EoR) switches. How physical layout creates fault domains.
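Physical layout turns into fault domains when you decide where replicas live. Servers in one rack share a ToR switch and rack PDUs, so a common rule is "no two replicas in the same rack, and preferably not in the same row." Below is a minimal sketch of that rule. It assumes a hypothetical inventory in which each server records its row and rack, and it treats the rack as the fault domain. `Server` and `place_replicas` are illustrative names, not part of any real scheduler.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Server:
    name: str
    row: str
    rack: str  # servers in one rack share a ToR switch and rack-level PDUs

def place_replicas(servers: list[Server], count: int) -> list[Server]:
    """Choose `count` servers so that no two replicas share a rack.

    Greedy two-pass scan: the first pass also requires an unused row,
    the second pass only requires an unused rack.
    """
    chosen: list[Server] = []
    used_racks: set[str] = set()
    used_rows: set[str] = set()
    for require_new_row in (True, False):
        for s in servers:
            if len(chosen) == count:
                return chosen
            if s.rack in used_racks:
                continue  # same rack = same fault domain, never allowed
            if require_new_row and s.row in used_rows:
                continue  # first pass: spread across rows too
            chosen.append(s)
            used_racks.add(s.rack)
            used_rows.add(s.row)
    if len(chosen) < count:
        raise ValueError(f"only {len(chosen)} distinct racks, need {count}")
    return chosen

# Hypothetical inventory: two rows, three racks.
inventory = [
    Server("s1", "row1", "r1"),
    Server("s2", "row1", "r1"),
    Server("s3", "row1", "r2"),
    Server("s4", "row2", "r3"),
]
print([s.name for s in place_replicas(inventory, 3)])  # ['s1', 's4', 's3']
```

Real placement systems also weigh capacity and power headroom. The greedy scan here only enforces the spreading constraint.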
Network Topology
Traditional 3-tier (access/aggregation/core) vs modern leaf-spine architecture. North-South (client ↔ datacenter) vs East-West (server ↔ server) traffic patterns. Oversubscription ratios and bandwidth planning. Layer 2 vs Layer 3 switching boundaries.
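On a leaf switch, the oversubscription ratio is the bandwidth facing the servers divided by the bandwidth facing the spine. Here is a minimal sketch of that arithmetic. The port counts and speeds are hypothetical and were chosen only so the numbers are easy to follow.

```python
def oversubscription_ratio(downlinks: int, downlink_gbps: float,
                           uplinks: int, uplink_gbps: float) -> float:
    """Server-facing bandwidth divided by spine-facing bandwidth on one leaf."""
    downstream = downlinks * downlink_gbps
    upstream = uplinks * uplink_gbps
    if upstream <= 0:
        raise ValueError("leaf must have uplink capacity")
    return downstream / upstream

# Hypothetical leaf: 48 x 25G server ports, 6 x 100G uplinks to the spine.
ratio = oversubscription_ratio(48, 25, 6, 100)
print(f"{ratio:.1f}:1")  # 1200 Gbps down / 600 Gbps up -> 2.0:1
```

At 1:1, the leaf can forward all server traffic to the spine at line rate. Higher ratios lower cost, but they raise the risk of congestion when many servers send heavy east-west traffic at the same time.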
Network Protocols
BGP for inter-datacenter and internet routing. OSPF for internal datacenter routing. VLANs and network segmentation. MTU and jumbo frames for storage/HPC workloads. LAN vs WAN characteristics. Real-world outage example: Facebook BGP incident.
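To see why jumbo frames help storage and HPC traffic, compare how many wire bytes carry payload at each MTU. This minimal sketch assumes IPv4 and TCP headers with no options (TCP timestamps, for example, would add 12 bytes) and counts preamble, FCS, and inter-frame gap as per-frame wire overhead.

```python
# Per-frame Ethernet overhead on the wire, in bytes:
# 8 preamble/SFD + 14 MAC header + 4 FCS + 12 inter-frame gap.
ETHERNET_WIRE_OVERHEAD = 8 + 14 + 4 + 12  # 38
IPV4_TCP_HEADERS = 20 + 20                # assumes no IP or TCP options

def tcp_payload_efficiency(mtu: int) -> float:
    """Fraction of wire bytes carrying TCP payload for full-size frames."""
    payload = mtu - IPV4_TCP_HEADERS
    on_wire = mtu + ETHERNET_WIRE_OVERHEAD
    return payload / on_wire

for mtu in (1500, 9000):
    print(mtu, f"{tcp_payload_efficiency(mtu):.1%}")  # 94.9% and 99.1%
```

The gain in raw efficiency is modest. The bigger win is that each 9000-byte frame carries roughly six times as much payload as a 1500-byte frame, so hosts and switches process far fewer packets per second. This only holds if every hop on the path is configured for the larger MTU.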
Tunneling & Security
IPsec and VPN fundamentals. Overlay networks (VXLAN, Geneve). When encryption impacts performance. Tunneling protocols in cloud environments.
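Overlay networks wrap every tenant frame in extra headers. That is why MTU mismatches often show up as puzzling failures right after VXLAN is introduced. This minimal sketch computes the largest inner IP packet that fits inside a given underlay MTU. It assumes an untagged inner frame and no outer IP options. Geneve has a similar 8-byte base header but adds variable-length options, so its overhead is not a fixed number.

```python
# Headers VXLAN places inside the underlay's IP MTU (bytes).
OUTER_IPV4 = 20
OUTER_IPV6 = 40
OUTER_UDP = 8
VXLAN_HEADER = 8
INNER_ETHERNET = 14  # the encapsulated frame keeps its MAC header (no FCS)

def inner_mtu(underlay_mtu: int, ipv6_underlay: bool = False) -> int:
    """Largest inner IP packet that fits without fragmenting the underlay."""
    outer_ip = OUTER_IPV6 if ipv6_underlay else OUTER_IPV4
    overhead = outer_ip + OUTER_UDP + VXLAN_HEADER + INNER_ETHERNET
    return underlay_mtu - overhead

print(inner_mtu(1500))                      # 1450 (50 bytes of overhead)
print(inner_mtu(1500, ipv6_underlay=True))  # 1430 (70 bytes of overhead)
print(inner_mtu(9000))                      # 8950
```

Raising the underlay MTU, for example to 9000, is a common way to let tenants keep a standard 1500-byte MTU inside the overlay.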
DNS in Datacenters
Internal vs external DNS. Split-horizon DNS configurations. Service discovery patterns (DNS-based vs dedicated systems). TTL considerations and caching behavior. DNS during outages and failover.
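TTLs matter during failover because a cache keeps serving the old answer until the TTL expires, even after the authoritative record has changed. Below is a toy resolver cache that shows this. It injects a fake clock so time can be advanced deterministically. The `TTLCache` class, the `db.internal` name, and the addresses are hypothetical.

```python
import time
from typing import Callable

class TTLCache:
    """Toy resolver cache: answers are reused until their TTL expires."""

    def __init__(self, upstream: Callable[[str], tuple[str, int]],
                 clock: Callable[[], float] = time.monotonic):
        self._upstream = upstream  # returns (address, ttl_seconds)
        self._clock = clock
        self._entries: dict[str, tuple[str, float]] = {}

    def resolve(self, name: str) -> str:
        now = self._clock()
        cached = self._entries.get(name)
        if cached is not None and cached[1] > now:
            return cached[0]  # still fresh: upstream changes are invisible
        address, ttl = self._upstream(name)
        self._entries[name] = (address, now + ttl)
        return address

# Hypothetical record with a 300-second TTL, plus a controllable clock.
records = {"db.internal": ("10.0.1.10", 300)}
fake_now = [0.0]
cache = TTLCache(lambda n: records[n], clock=lambda: fake_now[0])

cache.resolve("db.internal")                  # cached until t=300
records["db.internal"] = ("10.0.2.10", 300)   # failover updates the record
fake_now[0] = 120.0
print(cache.resolve("db.internal"))           # 10.0.1.10 (stale)
fake_now[0] = 301.0
print(cache.resolve("db.internal"))           # 10.0.2.10
```

Shorter TTLs shrink this stale window, at the cost of more queries to the authoritative servers.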
HPC Networking
Why RDMA (Remote Direct Memory Access) matters: latency and CPU overhead. InfiniBand vs RoCE (RDMA over Converged Ethernet) vs iWARP comparison. Low-latency requirements for HPC and storage workloads. Network fabric design for performance-critical applications.
Load Balancing
Layer 4 (transport) vs Layer 7 (application) load balancing. Health checks and failover mechanisms. Session affinity and connection draining. Load balancing algorithms and when each works best.
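Two of the simplest load balancing algorithms differ in whether they look at backend state at all. This minimal in-process sketch leaves out backend weights, per-worker connection tracking, and skipping hosts that fail health checks, all of which real load balancers handle. The backend names are hypothetical.

```python
import itertools

class RoundRobin:
    """Cycle through backends in order, ignoring their current load."""

    def __init__(self, backends: list[str]):
        self._cycle = itertools.cycle(backends)

    def pick(self) -> str:
        return next(self._cycle)

class LeastConnections:
    """Send each new connection to the backend with the fewest active ones."""

    def __init__(self, backends: list[str]):
        self.active = {b: 0 for b in backends}

    def pick(self) -> str:
        # min() returns the first backend in insertion order on ties.
        backend = min(self.active, key=self.active.get)
        self.active[backend] += 1
        return backend

    def release(self, backend: str) -> None:
        self.active[backend] -= 1  # called when a connection closes

rr = RoundRobin(["a", "b", "c"])
print([rr.pick() for _ in range(4)])  # ['a', 'b', 'c', 'a']

lc = LeastConnections(["a", "b", "c"])
first = [lc.pick() for _ in range(3)]  # one connection each: a, b, c
lc.release("b")                        # b's connection closes early
print(lc.pick())                       # 'b' now has the fewest active
```

Round robin works well when requests cost roughly the same. Least connections adapts better when some connections are long-lived or expensive, because it steers new work away from busy backends.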
Cloud Provider Architecture
AWS regions, availability zones (AZs), and VPCs mapped to physical infrastructure. How services work: S3 (object storage), EBS (block storage), EFS (file storage). Fault domains and update domains in cloud environments. Cross-cloud comparison: GCP and Azure equivalents.
Hybrid Cloud Connectivity
Connecting on-premises datacenters to cloud providers. VPN over internet vs dedicated connections (Direct Connect, ExpressRoute). Bandwidth, latency, resiliency, and cost trade-offs. Redundancy patterns with BGP failover. Data migration strategies: bulk transfers, continuous sync, and hybrid storage gateways.
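BGP failover in a hybrid setup usually means learning the same prefix over both the dedicated link and the VPN, with one path made less attractive. Traffic then shifts only when the preferred session goes down. This is a heavily simplified sketch of that selection. It assumes only two attributes matter: higher LOCAL_PREF wins, then the shorter AS_PATH wins. Real BGP best-path selection has many more tie-breakers. The route names, prefix, and private AS numbers are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Route:
    prefix: str
    next_hop: str
    local_pref: int = 100          # higher wins; 100 is a common default
    as_path: tuple[int, ...] = ()  # shorter wins
    up: bool = True                # False once the BGP session carrying it drops

def best_path(routes: list[Route]) -> Optional[Route]:
    """Toy best-path choice: highest LOCAL_PREF, then shortest AS_PATH.

    Real BGP applies further tie-breakers (origin, MED, eBGP over iBGP,
    IGP cost, router ID); only these two are modeled here.
    """
    candidates = [r for r in routes if r.up]
    if not candidates:
        return None
    return min(candidates, key=lambda r: (-r.local_pref, len(r.as_path)))

# Hypothetical on-prem router with two paths to the same cloud prefix.
routes = [
    Route("10.0.0.0/16", "dedicated-link", local_pref=200, as_path=(64512,)),
    # Backup: default LOCAL_PREF and a prepended (longer) AS_PATH.
    Route("10.0.0.0/16", "vpn-tunnel", as_path=(64512, 64512, 64512)),
]
print(best_path(routes).next_hop)  # dedicated-link
routes[0].up = False               # dedicated link fails, its session drops
print(best_path(routes).next_hop)  # vpn-tunnel
```

LOCAL_PREF steers the local router's own outbound choice. AS_PATH prepending is typically applied to advertisements to influence which path the other side prefers for return traffic. How quickly failover happens in practice depends on how fast the session failure is detected, for example through hold timers or BFD.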