Hybrid Cloud Connectivity

Most organizations don't move everything to the cloud overnight. Hybrid architectures, where workloads run both on-premises and in the cloud, require reliable connectivity between on-premises datacenters and cloud networks. The choice of connectivity method directly affects bandwidth, latency, resiliency, and cost. This section covers common patterns for connecting on-prem infrastructure to cloud providers.

# Connectivity Options

There are three primary approaches to connecting on-premises datacenters to cloud providers, each with different trade-offs.

VPN over Internet

How it works: IPsec VPN tunnels over the public internet create encrypted connections between on-prem routers and cloud VPN gateways.

    On-Premises                    Internet                    Cloud Provider

    +------------+              +----------+              +---------------+
    | On-Prem    |  IPsec VPN   |          |  IPsec VPN   | VPN Gateway   |
    | Router     |<------------>| Internet |<------------>| (AWS/Azure/   |
    | (Firewall) |   Encrypted  |          |   Encrypted  |  GCP)         |
    +------------+              +----------+              +-------+-------+
         |                                                        |
    Internal Network                                         VPC/VNet
    10.0.0.0/16                                             172.16.0.0/16

Providers: AWS VPN, Azure VPN Gateway, GCP Cloud VPN

Pros: Quick setup (hours), low upfront cost, works from anywhere with internet

Cons: Variable latency and throughput (depends on internet path), typically limited to 1-2 Gbps per tunnel, shared with other internet traffic

Direct/Dedicated Connections

How it works: Private fiber connections run from your datacenter (or colocation facility) directly to the cloud provider's network. No internet traversal.

    On-Premises              Dedicated Connection           Cloud Provider

    +------------+           +----------------+           +---------------+
    | On-Prem    |  Private  | Carrier/       | Private   | Direct Connect|
    | Router     |<--------->| Colocation     |<--------->| (AWS)         |
    | (BGP)      |   Fiber   | Cross-Connect  |  Fiber    | ExpressRoute  |
    +------------+           +----------------+           | (Azure)       |
         |                                                | Interconnect  |
    Internal Network                                      | (GCP)         |
    10.0.0.0/16                                           +-------+-------+
                                                                  |
                                                             VPC/VNet
                                                           172.16.0.0/16

Providers: AWS Direct Connect, Azure ExpressRoute, GCP Cloud Interconnect

Pros: Predictable low latency, high bandwidth (1-100 Gbps), dedicated capacity (not shared with internet), better for compliance

Cons: Longer setup time (weeks to months), higher upfront cost, requires physical presence or colocation

SD-WAN (Software-Defined WAN)

How it works: Software overlay that intelligently routes traffic over multiple underlay connections (internet VPN, MPLS, dedicated links). Provides application-aware routing and automatic failover.

    On-Premises               SD-WAN Overlay                  Cloud

    +------------+         Path 1: Direct Connect       +-------------+
    | SD-WAN     |=============================>        |             |
    | Appliance  |         Path 2: Internet VPN         | Cloud VPC   |
    | (Multi-    |<----------------------------->        | (SD-WAN     |
    |  Path)     |         Path 3: MPLS                 |  Gateway)   |
    +------------+<----------------------------->        +-------------+

    Intelligent routing: Database traffic --> Direct Connect (low latency)
                        Web traffic ------> Internet VPN (cost-effective)
                        Automatic failover on path degradation

Providers: Cisco Viptela, VMware SD-WAN, Silver Peak, Fortinet

Pros: Best of both worlds—redundancy and cost optimization, application-aware routing, automatic failover

Cons: Additional complexity, requires SD-WAN appliances/licensing, management overhead
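
The application-aware routing described above can be sketched in a few lines. This is a minimal illustration, not any vendor's algorithm: the `LinkPath` fields, the `POLICY` thresholds, and the measured values are all hypothetical. The idea is to pick the cheapest healthy path that satisfies an application's latency and loss limits, and to fall back to the lowest-latency healthy path when none qualifies.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LinkPath:
    name: str
    latency_ms: float  # measured round-trip latency
    loss_pct: float    # measured packet loss
    cost_rank: int     # 1 = cheapest underlay
    up: bool = True


# Hypothetical per-application limits; real SD-WAN policies are vendor-specific.
POLICY = {
    "database": {"max_latency_ms": 20.0, "max_loss_pct": 0.1},
    "web": {"max_latency_ms": 150.0, "max_loss_pct": 1.0},
}


def select_path(app: str, paths: list[LinkPath]) -> Optional[LinkPath]:
    """Return the cheapest healthy path that meets the app's policy.

    If no healthy path meets the policy, fall back to the healthy path
    with the lowest latency. Returns None when every path is down.
    """
    limits = POLICY[app]
    healthy = [p for p in paths if p.up]
    if not healthy:
        return None
    eligible = [
        p for p in healthy
        if p.latency_ms <= limits["max_latency_ms"]
        and p.loss_pct <= limits["max_loss_pct"]
    ]
    if eligible:
        return min(eligible, key=lambda p: p.cost_rank)
    return min(healthy, key=lambda p: p.latency_ms)


# Illustrative measurements only.
paths = [
    LinkPath("direct-connect", latency_ms=5.0, loss_pct=0.0, cost_rank=3),
    LinkPath("internet-vpn", latency_ms=40.0, loss_pct=0.5, cost_rank=1),
    LinkPath("mpls", latency_ms=25.0, loss_pct=0.05, cost_rank=2),
]
print(select_path("database", paths).name)
print(select_path("web", paths).name)
```

With these sample numbers, database traffic lands on the dedicated link and web traffic on the internet VPN, matching the diagram. Marking the dedicated link as down moves database traffic to the next-best path.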

# Bandwidth Considerations

Bandwidth requirements depend on your workload mix and how much data flows between on-prem and cloud.

Typical Connection Speeds

+------------------+--------------------+------------------------+
| Connection Type  | Typical Bandwidth  | Use Case               |
+------------------+--------------------+------------------------+
| Internet VPN     | 100 Mbps - 2 Gbps  | Small offices,         |
| (single tunnel)  |                    | dev/test,              |
|                  |                    | light workloads        |
+------------------+--------------------+------------------------+
| Direct Connect   | 1 Gbps - 10 Gbps   | Production hybrid,     |
| (Standard)       |                    | database replication,  |
|                  |                    | file sync              |
+------------------+--------------------+------------------------+
| Direct Connect   | 10 Gbps - 100 Gbps | Large-scale migration, |
| (High-capacity)  |                    | HPC, video rendering,  |
|                  |                    | big data pipelines     |
+------------------+--------------------+------------------------+

Burst vs Sustained: VPN connections share internet bandwidth—peak usage from other applications can degrade cloud connectivity. Dedicated connections provide consistent throughput.

Calculating requirements: Common workloads and their bandwidth needs:

  • Database replication: Depends on write rate. 1000 transactions/sec * 10 KB avg = ~80 Mbps sustained
  • File server sync: Varies wildly. Initial sync of 10 TB over 1 Gbps = ~22 hours at line rate, closer to 24 hours with protocol overhead
  • Backup to cloud: Nightly 500 GB backup over 8-hour window = ~140 Mbps average
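  • VDI (Virtual Desktop): Per-desktop rate * number of concurrent users. Office workloads often need only a few Mbps per desktop; 20-50 Mbps is more typical of graphics-heavy or multi-monitor sessions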
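
The arithmetic behind these estimates is easy to script. The sketch below uses hypothetical helper names and assumes decimal units (1 KB = 1,000 bytes, 1 TB = 10^12 bytes); the optional `efficiency` parameter is an assumed knob for protocol overhead.

```python
def sustained_mbps(events_per_sec: float, kb_per_event: float) -> float:
    """Sustained rate in Mbps for a stream of fixed-size events."""
    return events_per_sec * kb_per_event * 1000 * 8 / 1e6


def window_mbps(gigabytes: float, window_hours: float) -> float:
    """Average rate in Mbps needed to move a dataset within a time window."""
    return gigabytes * 1e9 * 8 / (window_hours * 3600) / 1e6


def transfer_hours(terabytes: float, link_gbps: float, efficiency: float = 1.0) -> float:
    """Hours to move a dataset over a link; efficiency < 1 models overhead."""
    bits = terabytes * 1e12 * 8
    return bits / (link_gbps * 1e9 * efficiency) / 3600


print(f"Replication: {sustained_mbps(1000, 10):.0f} Mbps")
print(f"Nightly backup: {window_mbps(500, 8):.0f} Mbps")
print(f"10 TB initial sync: {transfer_hours(10, 1):.1f} hours at line rate")
```

At full line rate the 10 TB sync takes a little over 22 hours. Assuming roughly 90% link efficiency brings it close to a full day.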

# Latency Characteristics

Latency matters enormously for chatty protocols (databases, file systems, RPC) and less for bulk transfers.

Typical Latencies

Connection Method           Same City    Cross-Country   Cross-Continent
----------------------      ----------   -------------   ----------------
Direct Connect (Dedicated)  1-5 ms       30-50 ms        80-150 ms
Internet VPN (Best case)    5-15 ms      40-80 ms        100-200 ms
Internet VPN (Congested)    20-100+ ms   100-200+ ms     200-500+ ms

Note: Latencies depend heavily on geographic distance and routing

Impact on workloads:

  • Database queries: Latency-sensitive. A client issuing one query at a time over a 50 ms RTT completes at most 20 queries/sec (1 / 0.05 s). Use read replicas or caching to mitigate
  • File operations (NFS/SMB): Very latency-sensitive. Opening a file takes multiple round-trips, so a 100 ms RTT can make file access unusable
  • Bulk transfers: Latency-tolerant if using modern protocols (TCP with window scaling). Per-connection throughput is bounded by Window Size / RTT, so long paths need large windows or parallel streams
  • API calls (REST/gRPC): Moderate sensitivity. Design for async patterns or cache responses when possible
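
Two of the relationships above can be checked numerically: the ceiling on serialized operations (one round-trip each) and the single-connection TCP bound of window size divided by RTT. The function names below are illustrative, and the model ignores server processing time and packet loss.

```python
def max_serial_ops_per_sec(rtt_ms: float) -> float:
    """Upper bound on back-to-back operations that each cost one round-trip."""
    return 1000.0 / rtt_ms


def tcp_throughput_mbps(window_bytes: int, rtt_ms: float) -> float:
    """Single-connection throughput bound: window size / RTT."""
    return window_bytes * 8 / (rtt_ms / 1000.0) / 1e6


def window_needed_bytes(target_mbps: float, rtt_ms: float) -> float:
    """Bandwidth-delay product: the window required to sustain target_mbps."""
    return target_mbps * 1e6 / 8 * (rtt_ms / 1000.0)


print(max_serial_ops_per_sec(50))                 # serialized queries/sec at 50 ms
print(round(tcp_throughput_mbps(65535, 50), 2))   # 64 KB window without scaling
print(window_needed_bytes(1000, 50))              # bytes needed to fill 1 Gbps at 50 ms
```

A 64 KB window over a 50 ms path caps a single connection at roughly 10 Mbps, which is why window scaling (or parallel streams) matters for bulk transfers across long distances.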

Geographic Proximity

Choose cloud regions close to on-prem datacenters when latency matters.

Example: San Francisco datacenter to cloud regions

Region                           Distance        Typical Latency
------                           --------        ---------------
AWS us-west-1 (N. California)    ~100 miles      2-5 ms
AWS us-west-2 (Oregon)           ~600 miles      10-15 ms
AWS us-east-1 (Virginia)         ~2,500 miles    60-70 ms
AWS eu-west-1 (Ireland)          ~5,000 miles    140-160 ms

Physics matters: Speed of light in fiber = ~200,000 km/s
                Theoretical minimum (SF to Virginia) = ~40ms RTT
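
The lower bound above follows directly from distance and the speed of light in fiber. A minimal sketch, assuming a straight-line fiber path (real routes are longer) and no equipment delay:

```python
FIBER_KM_PER_S = 200_000  # approximate speed of light in fiber
KM_PER_MILE = 1.609344


def min_rtt_ms(distance_km: float) -> float:
    """Theoretical minimum round-trip time over fiber for a given distance."""
    return 2 * distance_km / FIBER_KM_PER_S * 1000


# Distances taken from the table above (miles converted to km).
for region, miles in [("us-west-2", 600), ("us-east-1", 2500), ("eu-west-1", 5000)]:
    print(f"{region}: {min_rtt_ms(miles * KM_PER_MILE):.1f} ms minimum RTT")
```

The gap between these bounds and the typical latencies in the table comes from indirect fiber routes, routing hops, and queuing.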

# Resiliency and Redundancy Patterns

Single connections create single points of failure. Production workloads require redundancy.

Single Connection (Not Resilient)

    On-Prem                        Cloud

    [Router]-----(Direct Connect)-----[VPC]

    Problem: Link failure = total outage
             Router failure = total outage
             Planned maintenance = downtime

Dual Connections, Same Provider

    On-Prem                           Cloud

    [Router 1]-----(Direct Connect 1)----+
                                         |----[VPC]
    [Router 2]-----(Direct Connect 2)----+

    Resilient to: Single link failure
                  Single router failure
    Not resilient to: Provider outage (rare but happens)
                      Regional datacenter issues

Configuration: Use BGP with equal-cost multipath (ECMP) for active-active, or BGP AS-path prepending for active-passive. AWS/Azure/GCP all support BGP for automatic failover [1].
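
To see why prepending turns an active-active pair into active-passive, the sketch below models only one step of BGP path selection: prefer the shortest AS path, and keep every route that ties. This is a deliberate simplification. Real BGP compares other attributes (such as local preference) first, and equal-cost multipath must be enabled explicitly. The `Route` type, next-hop names, and ASN 65000 are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    next_hop: str
    as_path: tuple  # ASNs on the path, nearest first
    up: bool = True


def prepend(route: Route, asn: int, times: int) -> Route:
    """Return a copy of the route with `asn` prepended `times` times."""
    return Route(route.next_hop, (asn,) * times + route.as_path, route.up)


def best_paths(routes: list[Route]) -> list[Route]:
    """Keep live routes with the shortest AS path; ties model ECMP."""
    live = [r for r in routes if r.up]
    if not live:
        return []
    shortest = min(len(r.as_path) for r in live)
    return [r for r in live if len(r.as_path) == shortest]


link1 = Route("dx-1", (65000,))
link2 = Route("dx-2", (65000,))

# Active-active: equal AS-path lengths, both links carry traffic.
print([r.next_hop for r in best_paths([link1, link2])])

# Active-passive: advertise link 2 with a longer path so it is a backup.
backup = prepend(link2, 65000, 2)
print([r.next_hop for r in best_paths([link1, backup])])

# Failure of link 1: the prepended route becomes the only candidate.
failed = Route("dx-1", (65000,), up=False)
print([r.next_hop for r in best_paths([failed, backup])])
```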

Multi-Provider (Maximum Resilience)

    On-Prem                               Cloud

    [Router 1]-----(Direct Connect)-------+
                                          |----[VPC]
    [Router 2]-----(Internet VPN)---------+

    OR:

    [Router 1]-----(Provider A Fiber)-----+
                                          |----[VPC]
    [Router 2]-----(Provider B Fiber)-----+

    Resilient to: Provider outage
                  Link failure
                  Router failure
    Trade-off: Higher cost, more complex routing

Failover automation: BGP automatically reroutes traffic when a path fails. Detection time is bounded by the BGP hold timer unless a faster liveness check such as BFD is enabled on both sides (typical failover: 30-90 seconds).

# Cost Model

Connectivity costs vary dramatically between VPN and dedicated connections. Understanding the cost structure helps you choose the right option.

VPN over Internet Costs

  • Setup: Minimal—VPN gateway costs ~$0.05/hour (AWS) or ~$0.19/hour (Azure) [2]
  • Data transfer: Standard internet egress rates. AWS: $0.09/GB (first 10 TB/month)
  • Total for 1 TB/month: Gateway ~$36/month + 1000 GB * $0.09 = ~$126/month
  • Limitations: Shared bandwidth, variable performance, typically <1 Gbps sustained

Direct Connect / ExpressRoute Costs

  • Port hours: Dedicated port rental. AWS 1 Gbps = $0.30/hour = ~$216/month [3]
  • Data transfer OUT: Reduced rates vs internet. AWS: $0.02/GB (vs $0.09/GB internet)
  • Connection fee: One-time setup via carrier/colocation (varies: $500-$5000+)
  • Total for 1 TB/month: Port $216 + 1000 GB * $0.02 + setup = ~$236/month + setup
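  • Break-even: With the rates above, monthly costs cross at roughly 2.6 TB/month ($180 extra port cost / $0.07 per-GB savings), before setup and colocation fees. At high volume (10+ TB/month), Direct Connect is clearly cheaper due to lower transfer costs [4]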

Cost Comparison

Monthly Data Transfer    VPN Cost        Direct Connect Cost    Winner
----------------------   --------        --------------------   ------
100 GB                   ~$45            ~$218                  VPN
1 TB                     ~$126           ~$236                  VPN
10 TB                    ~$936           ~$416                  Direct
50 TB                    ~$4,536         ~$1,216                Direct
100 TB                   ~$9,036         ~$2,216                Direct

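Assumes: AWS us-east-1 pricing, 1 Gbps Direct Connect, 720 hours/month, flat $0.09/GB internet egress (volume discounts above 10 TB/month ignored), one-time setup fees excluded
Note: Prices vary by region and provider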
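
The comparison table can be reproduced with a small cost model. The sketch below hard-codes the example rates quoted in this section as defaults and assumes a 720-hour month with flat per-GB pricing. It ignores one-time setup, colocation, and hardware costs, so treat the result as a lower bound on the real break-even point.

```python
HOURS_PER_MONTH = 720


def vpn_monthly(gb: float, gateway_per_hour: float = 0.05,
                egress_per_gb: float = 0.09) -> float:
    """Monthly VPN cost: gateway hours plus internet egress."""
    return gateway_per_hour * HOURS_PER_MONTH + gb * egress_per_gb


def direct_monthly(gb: float, port_per_hour: float = 0.30,
                   egress_per_gb: float = 0.02) -> float:
    """Monthly dedicated-connection cost: port hours plus reduced egress."""
    return port_per_hour * HOURS_PER_MONTH + gb * egress_per_gb


def break_even_gb(gateway_per_hour: float = 0.05, port_per_hour: float = 0.30,
                  vpn_egress: float = 0.09, direct_egress: float = 0.02) -> float:
    """Monthly volume at which both options cost the same."""
    fixed_gap = (port_per_hour - gateway_per_hour) * HOURS_PER_MONTH
    return fixed_gap / (vpn_egress - direct_egress)


for gb in (100, 1_000, 10_000, 50_000, 100_000):
    print(f"{gb:>7} GB  VPN ${vpn_monthly(gb):>8,.0f}  Direct ${direct_monthly(gb):>8,.0f}")
print(f"Break-even: {break_even_gb():,.0f} GB/month")
```

On these inputs the recurring costs cross at about 2.6 TB/month. Setup fees and the hidden costs discussed below push the practical crossover higher.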

Hidden costs: Don't forget router hardware, BGP configuration expertise, colocation fees (if applicable), and ongoing management. For small deployments, VPN's simplicity often wins despite higher per-GB costs.

# Data Migration Patterns

Moving data from on-prem to cloud is a common hybrid scenario. The migration strategy depends on data volume, time constraints, and ongoing sync requirements.

Initial Bulk Migration

Problem: Migrating 100 TB over 1 Gbps = 9+ days of continuous transfer. Not always practical.

Solutions:

  • Physical disk shipping: AWS Snowball (80 TB device), Azure Data Box (100 TB), GCP Transfer Appliance. Ship disks to provider, they ingest data. 1-2 weeks total time [5]
  • Snowmobile (AWS): For petabyte-scale. Literal shipping container with 100 PB capacity. Extreme but real, although AWS has since retired the service
  • Parallel transfers: Use multiple VPN tunnels or Direct Connect links with parallel rsync/rclone jobs to maximize throughput
  • Incremental seed: Ship initial bulk via Snowball, then sync deltas over network
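
A quick way to choose between network transfer and shipping is to compare estimated transfer time against the appliance turnaround. In this sketch the 10-day shipping figure is an assumed midpoint of the 1-2 week range above, and `utilization` is an assumed fraction of the link that migration traffic can actually use.

```python
def network_days(terabytes: float, link_gbps: float, utilization: float = 1.0) -> float:
    """Days of continuous transfer for a dataset over a link."""
    bits = terabytes * 1e12 * 8
    return bits / (link_gbps * 1e9 * utilization) / 86_400


def prefer_shipping(terabytes: float, link_gbps: float,
                    shipping_days: float = 10.0, utilization: float = 0.8) -> bool:
    """True when the network transfer would take longer than shipping."""
    return network_days(terabytes, link_gbps, utilization) > shipping_days


print(f"{network_days(100, 1):.2f} days for 100 TB over 1 Gbps at line rate")
print(prefer_shipping(100, 1))    # 1 Gbps link
print(prefer_shipping(100, 10))   # 10 Gbps link
```

The same 100 TB that favors shipping on a 1 Gbps link moves in under two days on a 10 Gbps link, so the answer depends as much on available bandwidth as on data volume.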

Continuous Sync

After initial migration, keep on-prem and cloud in sync for hybrid workloads or gradual migration.

  • File sync: Tools like rsync, rclone, AWS DataSync, Azure File Sync. Schedule periodic syncs or near-real-time
  • Database replication: Native replication (MySQL binlog, PostgreSQL streaming replication) over VPN/Direct Connect
  • Object storage sync: aws s3 sync, gsutil rsync, or AzCopy (Azure) for continuous object uploads
  • Change Data Capture (CDC): Tools like Debezium, AWS DMS capture database changes and stream to cloud in real-time
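
The core idea behind incremental file sync, transferring only what changed, can be illustrated with a content-hash manifest. This is a simplified sketch and not how rsync, rclone, or DataSync work internally. It hashes whole files and reads each file fully into memory, which is fine for illustration but not for large datasets. The function names are hypothetical.

```python
import hashlib
from pathlib import Path


def manifest(root: Path) -> dict[str, str]:
    """Map each file's relative path to the SHA-256 of its contents."""
    return {
        p.relative_to(root).as_posix(): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def plan_sync(source: dict[str, str], target: dict[str, str]) -> tuple[list[str], list[str]]:
    """Return (paths to upload, paths to delete) to make target match source."""
    upload = sorted(k for k, digest in source.items() if target.get(k) != digest)
    delete = sorted(k for k in target if k not in source)
    return upload, delete


# Manifests would normally come from manifest(local_root) and a stored
# copy of the previous sync; literal values are used here for brevity.
on_prem = {"a.txt": "h1", "b.txt": "h2-new", "c.txt": "h3"}
cloud = {"a.txt": "h1", "b.txt": "h2-old", "old.txt": "h9"}
print(plan_sync(on_prem, cloud))
```

Only the changed and new files appear in the upload list, which is what keeps steady-state sync traffic far below the initial migration volume.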

Hybrid Storage Gateways

Cloud providers offer gateways that present cloud storage as local volumes/shares, caching frequently accessed data on-prem.

    On-Prem                          Cloud

    [Application]
         |
    [Storage Gateway]-----(Cached data)
         |                          |
    (Frequently accessed)     [S3/Blob Storage]
    (cached locally)          (Full dataset)

    Reads: Served from local cache (fast)
    Writes: Written locally, async uploaded to cloud
    Cold data: Fetched from cloud on-demand

Providers: AWS Storage Gateway, Azure StorSimple (deprecated, replaced by File Sync), Google Cloud Storage FUSE

Use case: Gradual migration where applications still need local performance but storage tier is shifting to cloud. Good for file servers, backup archives.
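
The read and write behavior in the diagram can be modeled as a write-back cache with least-recently-used eviction. The sketch below is a toy model, not the implementation of any provider's gateway: a plain dictionary stands in for the object store, `flush()` stands in for the asynchronous uploader, and the class and method names are hypothetical.

```python
from collections import OrderedDict


class GatewayCache:
    """Toy write-back cache in front of a cloud object store."""

    def __init__(self, capacity: int, cloud: dict):
        self.capacity = capacity
        self.cloud = cloud          # stand-in for S3/Blob storage
        self.cache = OrderedDict()  # least recently used entry first
        self.dirty = set()          # keys written locally, not yet uploaded

    def read(self, key):
        if key in self.cache:       # hot data: served locally
            self.cache.move_to_end(key)
            return self.cache[key]
        value = self.cloud[key]     # cold data: fetched on demand
        self._put(key, value)
        return value

    def write(self, key, value):
        self._put(key, value)       # acknowledged once written locally
        self.dirty.add(key)

    def flush(self):
        """Upload all pending writes (models the async uploader)."""
        for key in list(self.dirty):
            self.cloud[key] = self.cache[key]
        self.dirty.clear()

    def _put(self, key, value):
        self.cache[key] = value
        self.cache.move_to_end(key)
        while len(self.cache) > self.capacity:
            old_key, old_value = self.cache.popitem(last=False)
            if old_key in self.dirty:   # never drop data that was not uploaded
                self.cloud[old_key] = old_value
                self.dirty.discard(old_key)


cloud = {"archive.dat": b"cold"}
gw = GatewayCache(capacity=2, cloud=cloud)
gw.write("report.doc", b"v1")
print("report.doc" in cloud)   # not uploaded until flush or eviction
gw.flush()
print(gw.read("archive.dat"))  # cold read pulls from the cloud
```

The eviction path uploads dirty entries before discarding them. Without that step, a small cache could silently lose writes that had not yet reached the cloud.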

# References

[1] AWS. "AWS Direct Connect - BGP Configuration." Documentation on using BGP for redundant Direct Connect connections with automatic failover. AWS Direct Connect Guide

[2] AWS. "AWS VPN Pricing." Current pricing for VPN connections and data transfer rates. aws.amazon.com/vpn/pricing

[3] AWS. "AWS Direct Connect Pricing." Port hour rates and data transfer pricing for dedicated connections. aws.amazon.com/directconnect/pricing

[4] Multiple Cloud Providers. "Hybrid Cloud Connectivity Cost Analysis." Industry analysis showing Direct Connect becomes cost-effective at ~10TB/month transfer volume. Pricing as of 2024-2025 for major cloud providers.

[5] AWS. "AWS Snowball - Petabyte-Scale Data Transport." Documentation on physical data transfer appliances for bulk migration. aws.amazon.com/snowball

# Key Takeaways

  • VPN over internet: Quick, cheap setup but variable performance (good for <1 TB/month, dev/test)
  • Direct Connect/ExpressRoute: Predictable latency, high bandwidth, cost-effective at scale (10+ TB/month)
  • SD-WAN combines multiple paths for redundancy and intelligent routing
  • Latency matters for databases and file systems—choose nearby cloud regions
  • Redundancy requires multiple connections; BGP provides automatic failover
  • Cost break-even between VPN and dedicated is ~2.6 TB/month on the example AWS rates (before setup and colocation fees); by ~10 TB/month dedicated is clearly cheaper
  • Bulk migration: Use physical shipping (Snowball) for large datasets, network for incremental sync
  • Hybrid storage gateways provide local performance with cloud backend for gradual migration