
IO Modes: Buffered vs Direct IO

Understanding how data flows from application to disk—and the role of buffering—is critical for performance optimization and correctness.

# Buffered IO (Default)

By default, file IO goes through the page cache. Reads pull data from the cache if available; writes update cache pages and return immediately (unless O_SYNC is specified).

Read path:

  1. Application calls read()
  2. VFS layer checks page cache for requested offset
  3. Cache hit: Copy from cache to user buffer, return immediately
  4. Cache miss: Trigger read-ahead, wait for IO, populate cache, copy to user buffer

Code: mm/filemap.c:generic_file_read_iter()
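
From user space, the buffered path needs nothing special. Below is a minimal sketch (the file name "datafile" is illustrative) mirroring the cache-miss and cache-hit cases from the steps above: the first pread() of a cold range typically waits on disk IO, while a repeat of the same range is served from the page cache.

  #include <fcntl.h>
  #include <unistd.h>

  int main(void) {
      char buf[4096];
      int fd = open("datafile", O_RDONLY);    /* "datafile" is illustrative */
      if (fd < 0) return 1;

      /* First read of a cold range: likely a cache miss; the kernel
         issues disk IO, fills page cache pages, then copies into buf. */
      if (pread(fd, buf, sizeof buf, 0) < 0) return 1;

      /* Same range again: a cache hit; no disk IO, just a copy from
         the page cache into buf. */
      if (pread(fd, buf, sizeof buf, 0) < 0) return 1;

      close(fd);
      return 0;
  }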

Write path:

  1. Application calls write()
  2. VFS layer finds or allocates page cache pages for the write range
  3. Copy data from user buffer to cache pages
  4. Mark pages dirty
  5. Return to application (write appears complete)
  6. Later: Writeback thread flushes dirty pages to disk

Code: mm/filemap.c:generic_perform_write()
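
A minimal sketch of what this means for the application (file name illustrative): write() returns as soon as the data sits in dirty page cache pages, and durability is deferred to writeback or an explicit fsync(), covered below.

  #include <fcntl.h>
  #include <string.h>
  #include <unistd.h>

  int main(void) {
      char buf[4096];
      memset(buf, 'x', sizeof buf);

      int fd = open("datafile", O_WRONLY | O_CREAT, 0644);  /* illustrative */
      if (fd < 0) return 1;

      /* Returns as soon as the data has been copied into page cache
         pages and those pages are marked dirty; nothing is on disk yet. */
      if (write(fd, buf, sizeof buf) != (ssize_t)sizeof buf) return 1;

      /* On local filesystems, close() does not flush dirty pages either;
         only writeback (or an explicit fsync) puts the data on disk. */
      close(fd);
      return 0;
  }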

Performance characteristics:

  • Excellent for sequential workloads (read-ahead hides latency)
  • Repeated access to same data is very fast (cache hits)
  • Writes are batched and coalesced, improving throughput
  • No special alignment requirements

# Direct IO (O_DIRECT)

Opening a file with O_DIRECT bypasses the page cache. Reads and writes move data directly between the application's buffer and the storage device.

Requirements:

  • User buffer must be aligned to filesystem block size (usually 4KB)
  • IO offset must be aligned to filesystem block size
  • IO length must be a multiple of filesystem block size
  • Violating alignment causes EINVAL errors

For example (error handling omitted):

  #include <fcntl.h>     /* O_DIRECT; compile with _GNU_SOURCE defined */
  #include <stdlib.h>    /* posix_memalign */
  #include <unistd.h>    /* pread */

  void *buf;
  posix_memalign(&buf, 4096, 4096);               /* buffer aligned to 4KB */
  int fd = open("datafile", O_DIRECT | O_RDONLY);
  pread(fd, buf, 4096, 0);                        /* read 4KB at offset 0 */
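
The required alignment is not always 4KB. One common heuristic, sketched below under the assumption that st_blksize (the filesystem's preferred IO size reported by fstat()) is an acceptable direct-IO alignment on the target system, is to use it for the buffer alignment, offset, and length; recent kernels can also report the exact requirements via statx(2) with STATX_DIOALIGN.

  #define _GNU_SOURCE            /* for O_DIRECT */
  #include <fcntl.h>
  #include <stdlib.h>
  #include <sys/stat.h>
  #include <unistd.h>

  int main(void) {
      int fd = open("datafile", O_DIRECT | O_RDONLY);   /* illustrative */
      if (fd < 0) return 1;

      /* Heuristic: treat st_blksize (the preferred IO size, typically a
         power of two such as 4096) as the alignment and IO granularity. */
      struct stat st;
      if (fstat(fd, &st) != 0) return 1;

      void *buf;
      if (posix_memalign(&buf, st.st_blksize, st.st_blksize) != 0) return 1;
      if (pread(fd, buf, st.st_blksize, 0) < 0) return 1;

      free(buf);
      close(fd);
      return 0;
  }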

When to use Direct IO:

  • Databases: Manage their own caching (e.g., InnoDB buffer pool, PostgreSQL shared buffers). Page cache would be redundant "double buffering."
  • Large sequential reads: Streaming large files where data won't be reused. Avoids polluting page cache.
  • Low-latency requirements: Eliminate cache management overhead for predictable latency.

Pitfalls:

  • Small random reads/writes perform poorly (no buffering or merging)
  • Application must handle alignment complexities
  • Can't leverage kernel read-ahead
  • Mixing direct and buffered IO on the same file causes coherency issues

Code path: fs/direct-io.c (the legacy path) and filesystem-specific DIO implementations; modern filesystems such as XFS and ext4 use the iomap-based path in fs/iomap/direct-io.c.

# Synchronous IO Flags

Several flags control write durability:

  • O_SYNC: Writes block until data and metadata are on stable storage. Expensive but ensures durability.
  • O_DSYNC: Like O_SYNC but doesn't wait for metadata updates (e.g., file size, modification time) unless necessary for reading the data back.
  • fsync(fd): System call to flush all dirty data and metadata for a file to disk. Blocks until complete.
  • fdatasync(fd): Like fsync() but skips metadata updates when possible (similar to O_DSYNC).
  • sync_file_range(): Partial sync—flush specific byte range. Doesn't wait for metadata or guarantee ordering.

Performance implications: Synchronous writes serialize IO and force disk flushes, destroying write batching. Use sparingly—only when durability is critical (e.g., database transaction commits).
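
A minimal sketch of that commit pattern (file name illustrative): let ordinary buffered writes batch in the page cache, then pay for a single fdatasync() at the point where durability actually matters.

  #include <fcntl.h>
  #include <string.h>
  #include <unistd.h>

  int main(void) {
      char rec[512];
      memset(rec, 'r', sizeof rec);

      int fd = open("journal", O_WRONLY | O_CREAT | O_APPEND, 0644);  /* illustrative */
      if (fd < 0) return 1;

      /* Buffered appends: cheap; the kernel batches and coalesces them. */
      for (int i = 0; i < 8; i++)
          if (write(fd, rec, sizeof rec) != (ssize_t)sizeof rec) return 1;

      /* Commit point: block until the data (and any metadata needed to
         read it back, such as the new file size) is on stable storage.
         Use fsync() if all metadata, e.g. timestamps, must be durable. */
      if (fdatasync(fd) != 0) return 1;

      close(fd);
      return 0;
  }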

# Interaction with Filesystems

Different filesystems handle buffered vs direct IO differently:

  • ext4: Supports both modes. Direct IO requires extent-based files (the default file format in ext4).
  • XFS: Excellent direct IO support. Direct IO writes bypass page cache but may still update metadata.
  • ZFS: Direct IO support varies by implementation. OpenZFS on Linux supports O_DIRECT, but writes may still pass through ZFS's internal caching (the ARC) before reaching the vdevs.
  • NFS: Direct IO still involves network round-trips. Can reduce client-side caching but doesn't eliminate all buffering.