The Linux Page Cache (Draft) :: mateusz.systems


The Linux Page Cache

The page cache is fundamental to understanding Linux filesystem performance. It is a unified, in-RAM cache of recently accessed file data: repeated reads are served from memory instead of triggering expensive disk IO, and writes are buffered so they can be batched efficiently.

# Architecture and Role

The page cache sits between the VFS (Virtual File System) layer and individual filesystem implementations. When you read or write a file with ordinary buffered IO, the kernel first checks whether the data is already in the page cache (files opened with O_DIRECT bypass it). Cache hits are served immediately from RAM; cache misses trigger IO to the underlying storage.
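The hit/miss behaviour of a buffered read can be sketched as a toy model. Everything here is hypothetical and illustrative: `ToyPageCache` is not a kernel structure, a Python dict stands in for the page cache, a `bytes` object stands in for the disk, and a 4 KiB page size is assumed.

```python
# Toy model of the buffered read path (illustration only, not kernel code).
PAGE_SIZE = 4096  # assumed page size; real systems report it via os.sysconf


class ToyPageCache:
    def __init__(self, disk: bytes):
        self.disk = disk   # backing store ("slow" storage)
        self.pages = {}    # page index -> cached page contents
        self.hits = 0
        self.misses = 0

    def _get_page(self, index: int) -> bytes:
        if index in self.pages:
            self.hits += 1    # cache hit: served from RAM
        else:
            self.misses += 1  # cache miss: "IO" to the backing store
            start = index * PAGE_SIZE
            self.pages[index] = self.disk[start:start + PAGE_SIZE]
        return self.pages[index]

    def read(self, offset: int, length: int) -> bytes:
        """Read [offset, offset + length), one page at a time."""
        out = bytearray()
        end = min(offset + length, len(self.disk))
        while offset < end:
            index, in_page = divmod(offset, PAGE_SIZE)
            page = self._get_page(index)
            chunk = page[in_page:in_page + (end - offset)]
            out += chunk
            offset += len(chunk)
        return bytes(out)
```

Reading the same region twice costs one miss and then one hit; a read that straddles a page boundary touches two pages.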

Key kernel structure: struct address_space (defined in include/linux/fs.h) represents the page cache of a single inode. It indexes cached pages by page index (the file offset divided by the page size), originally in a radix tree and, in newer kernels (4.20 and later), in an xarray.
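Because the index is a page number rather than a byte offset, it helps to see which indices a given byte range touches. This small sketch assumes 4 KiB pages (`PAGE_SHIFT = 12`), which is common but architecture-dependent; `page_indices` is a hypothetical helper, not a kernel function.

```python
# Which page-cache indices does a byte range touch?
# Index = offset >> PAGE_SHIFT; PAGE_SHIFT = 12 (4 KiB pages) is an assumption.
PAGE_SHIFT = 12


def page_indices(offset: int, length: int, page_shift: int = PAGE_SHIFT) -> range:
    """Return the page indices covering bytes [offset, offset + length)."""
    if length <= 0:
        return range(0)
    first = offset >> page_shift
    last = (offset + length - 1) >> page_shift
    return range(first, last + 1)
```

For example, a 2-byte read at offset 4095 spans indices 0 and 1, so it needs two lookups.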

# Read-Ahead Mechanisms

When the kernel detects sequential reads, it speculatively reads additional pages beyond what was requested. This read-ahead reduces latency for streaming workloads by keeping the IO pipeline full.

The read-ahead window starts small and grows as sequential access continues. If the access pattern becomes random, the kernel stops issuing read-ahead for that file, avoiding wasted memory and IO bandwidth.
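The grow-then-reset behaviour can be illustrated with a deliberately simplified model. Assumptions: the window simply doubles per sequential access up to a cap and resets to zero on a non-sequential access. The kernel's real policy in mm/readahead.c has more cases, and `readahead_windows` is a hypothetical name.

```python
# Simplified read-ahead window model (illustration; not the kernel's exact policy).
def readahead_windows(accesses, initial_pages=4, max_pages=32):
    """Return the read-ahead window (in pages) chosen after each page access."""
    window = 0
    prev = None
    result = []
    for index in accesses:
        sequential = prev is not None and index == prev + 1
        if not sequential:
            window = 0                              # random access: stop speculating
        elif window == 0:
            window = initial_pages                  # sequential run detected: start small
        else:
            window = min(window * 2, max_pages)     # keep growing, up to the cap
        result.append(window)
        prev = index
    return result
```

With `accesses = [0, 1, 2, 3, 4, 5, 100, 101]` the window grows 4, 8, 16, 32, stays at the cap, drops to 0 at the jump to page 100, and restarts small.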

Tunable: /sys/block/<device>/queue/read_ahead_kb sets the maximum read-ahead size, in KiB, for a block device.
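Inspecting or changing the tunable is a plain file read or write. This sketch is hedged: the helper names are hypothetical, the `sysfs_root` parameter exists only so the code can be pointed at a test directory, and writing the real file requires root.

```python
from pathlib import Path


def read_ahead_kb(device: str, sysfs_root: str = "/sys/block") -> int:
    """Return the maximum read-ahead size (KiB) configured for a block device."""
    path = Path(sysfs_root) / device / "queue" / "read_ahead_kb"
    return int(path.read_text().strip())


def set_read_ahead_kb(device: str, kb: int, sysfs_root: str = "/sys/block") -> None:
    """Set the maximum read-ahead size (KiB); needs root on a real system."""
    path = Path(sysfs_root) / device / "queue" / "read_ahead_kb"
    path.write_text(f"{kb}\n")
```

Usage on a live system might look like `read_ahead_kb("sda")`, where `sda` is just an example device name.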

# Dirty Page Writeback

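Writes to files are initially buffered in the page cache as "dirty" pages. The kernel flushes dirty pages to disk asynchronously, batching writes for efficiency. This improves write throughput but weakens durability: a successful write() only means the data reached the page cache, and it can be lost in a crash until it has been written back. Applications that need durability must explicitly flush, for example with fsync().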
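A minimal Python sketch of a durable write follows; the helper name `write_durably` is hypothetical. It loops because `os.write` may write fewer bytes than requested, then calls `os.fsync` so the function does not return until the dirty pages have been written back.

```python
import os


def write_durably(path: str, data: bytes) -> None:
    """Write data and wait until it has been flushed from the page cache to storage."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        view = memoryview(data)
        while view:  # os.write may perform a short write
            written = os.write(fd, view)
            view = view[written:]
        # Here the data is only in the page cache, as dirty pages.
        os.fsync(fd)  # block until data and file metadata reach storage
    finally:
        os.close(fd)
```

For a newly created file, making the directory entry itself durable also requires fsyncing the parent directory.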

Key sysctls controlling writeback behavior:

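vm.dirty_ratio: Percentage of available (free plus reclaimable) memory that may be dirty before processes generating writes are throttled (kernel default 20)
vm.dirty_background_ratio: Percentage of available memory at which background flusher threads start writeback without throttling writers (kernel default 10)
vm.dirty_writeback_centisecs: Interval, in hundredths of a second, between flusher-thread wakeups (kernel default 500, i.e. 5 s)
vm.dirty_expire_centisecs: Age, in hundredths of a second, after which dirty data becomes eligible for writeback (kernel default 3000, i.e. 30 s)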
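The sysctls above can be read from /proc/sys/vm, and the two ratios turned into rough byte thresholds. This is an approximation under stated assumptions: per the kernel's sysctl documentation the ratios apply to available (free plus reclaimable) memory, but the sketch ignores vm.dirty_bytes / vm.dirty_background_bytes (which override the ratios when non-zero) and other kernel adjustments. Helper names are hypothetical.

```python
from pathlib import Path

SYSCTLS = ("dirty_ratio", "dirty_background_ratio",
           "dirty_writeback_centisecs", "dirty_expire_centisecs")


def read_writeback_sysctls(root: str = "/proc/sys/vm") -> dict:
    """Read the writeback sysctls as integers."""
    return {name: int((Path(root) / name).read_text()) for name in SYSCTLS}


def dirty_thresholds_kb(available_kb: int, dirty_ratio: int, background_ratio: int):
    """Approximate (background, throttle) dirty thresholds in KiB.

    Ignores the *_bytes overrides and per-device limits.
    """
    background = available_kb * background_ratio // 100
    throttle = available_kb * dirty_ratio // 100
    return background, throttle
```

As a worked example, with 16 GiB (16777216 KiB) available and the default ratios, background writeback starts at about 1.6 GiB of dirty data (1677721 KiB) and writers are throttled at about 3.2 GiB (3355443 KiB).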

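Code path: writeback is split across two files. mm/page-writeback.c implements dirty-page accounting and writer throttling (balance_dirty_pages()), while fs/fs-writeback.c contains the flusher-thread logic, including wb_writeback(), the core writeback worker.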

# Page Cache Pressure and Reclaim

Linux follows a "free memory is wasted memory" philosophy: the page cache grows to consume available RAM. When applications need memory, the kernel reclaims pages from the cache. Clean pages (identical to their on-disk copy) are simply discarded; dirty pages must be written back first.
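The clean/dirty asymmetry can be shown with a toy reclaim pass. This is a hypothetical illustration (`Page` and `reclaim` are not kernel names): pages arrive coldest-first, clean ones are freed immediately, and dirty ones can only be queued for writeback, so they do not help satisfy the request on this pass.

```python
from dataclasses import dataclass


@dataclass
class Page:
    index: int
    dirty: bool


def reclaim(cold_pages, needed):
    """Toy reclaim pass over pages ordered coldest-first."""
    freed, writeback_queue = [], []
    for page in cold_pages:
        if len(freed) >= needed:
            break
        if page.dirty:
            writeback_queue.append(page)  # must be written to disk before freeing
        else:
            freed.append(page)            # disk already has an identical copy
    return freed, writeback_queue
```

A workload with many dirty pages therefore makes reclaim slower, because freeing memory has to wait on disk writes.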

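The kswapd kernel threads (one per NUMA node) reclaim memory in the background. They use an LRU (Least Recently Used) approximation, built from active and inactive page lists, to evict cold pages. When kswapd cannot keep up, allocating processes must perform direct reclaim themselves and stall, which is where reclaim becomes a visible bottleneck.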
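The benefit of two lists over a single LRU can be sketched with a toy model in the spirit of the active/inactive design. The simplifications are assumptions, not the kernel's policy: a second access on the inactive list promotes a page, the active list is capped at half the capacity (overflow is demoted), and eviction always takes the oldest inactive page. `TwoListLRU` is a hypothetical name.

```python
from collections import OrderedDict


class TwoListLRU:
    """Toy active/inactive LRU (illustration only)."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.inactive = OrderedDict()  # oldest first
        self.active = OrderedDict()    # oldest first
        self.evicted = []

    def access(self, index: int) -> None:
        if index in self.active:
            self.active.move_to_end(index)         # refresh position
        elif index in self.inactive:
            del self.inactive[index]               # second access: promote
            self.active[index] = True
            if len(self.active) > self.capacity // 2:
                old, _ = self.active.popitem(last=False)
                self.inactive[old] = True          # demote oldest active page
        else:
            self.inactive[index] = True            # first access: inactive list
        while len(self.active) + len(self.inactive) > self.capacity:
            victim, _ = self.inactive.popitem(last=False)
            self.evicted.append(victim)
```

With capacity 4, pages 1 and 2 accessed twice each survive a one-time scan of pages 10 to 13: only scan pages are evicted, whereas a single LRU list would have evicted 1 and 2.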

# Filesystem-Specific Caches

Some filesystems maintain their own caches on top of (or instead of) the page cache:

ZFS ARC (Adaptive Replacement Cache): ZFS manages its own cache, separate from the page cache, with a more sophisticated replacement policy than simple LRU that balances recently used against frequently used data. This can confuse memory monitoring: the ARC appears as used memory rather than as cache, even though it is reclaimable.
XFS buffer cache: XFS uses the page cache for file data but maintains separate metadata buffers for structural information (inodes, directories, extent maps).
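To account for the ARC when monitoring, its size can be read from ZFS's own statistics. This sketch assumes OpenZFS on Linux, which to our understanding exposes /proc/spl/kstat/zfs/arcstats as two header lines followed by `name type data` rows, with `size` giving the current ARC size in bytes. `parse_arcstats` is a hypothetical helper.

```python
def parse_arcstats(text: str) -> dict:
    """Parse arcstats-style text into {name: int}."""
    stats = {}
    for line in text.splitlines()[2:]:  # skip kstat header and column names
        fields = line.split()
        if len(fields) == 3:
            name, _kind, value = fields
            try:
                stats[name] = int(value)
            except ValueError:
                pass                    # skip non-integer entries
    return stats
```

On a ZFS host, `parse_arcstats(open("/proc/spl/kstat/zfs/arcstats").read())["size"]` would give the number of bytes to treat as cache rather than application memory.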

# Observing Page Cache Behavior

The /proc/meminfo file exposes page cache statistics (values in kB):

Cached: Page cache size, including dirty pages and shmem/tmpfs pages (excluding SwapCached)
Dirty: Modified pages not yet written to disk
Writeback: Pages currently being written to disk
Mapped: File-backed pages mapped into process address spaces (via mmap), such as shared libraries
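These fields are easy to extract programmatically. A minimal sketch, with `parse_meminfo` as a hypothetical helper; it keeps the leading number of each line, which is in kB for the fields above (a few fields, such as the HugePages counters, are plain counts).

```python
def parse_meminfo(text: str) -> dict:
    """Parse /proc/meminfo-style text into {field: leading integer value}."""
    info = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        fields = rest.split()
        if fields:
            info[name.strip()] = int(fields[0])
    return info
```

On a live system: `with open("/proc/meminfo") as f: m = parse_meminfo(f.read())`, then inspect `m["Cached"]`, `m["Dirty"]`, `m["Writeback"]`, and `m["Mapped"]`.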

Tools like vmtouch can show which specific files are cached and even force files into or out of the cache for testing.
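The evict and warm operations can be approximated from Python. This is a sketch of the same idea, not vmtouch's code, and the helper names are hypothetical. Eviction uses `posix_fadvise(POSIX_FADV_DONTNEED)`, which is only a hint and drops only clean pages, so the file is flushed first. Checking residency, as vmtouch does, needs mincore(2), which Python's standard library does not wrap.

```python
import os


def evict_from_cache(path: str) -> None:
    """Ask the kernel to drop a file's clean pages from the page cache."""
    fd = os.open(path, os.O_RDONLY)
    try:
        os.fdatasync(fd)  # dirty pages are not dropped, so flush them first
        # offset=0, len=0 means "through the end of the file"
        os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_DONTNEED)
    finally:
        os.close(fd)


def warm_cache(path: str, chunk: int = 1 << 20) -> int:
    """Pull a file into the page cache by reading it; return bytes read."""
    total = 0
    with open(path, "rb", buffering=0) as f:
        while True:
            data = f.read(chunk)
            if not data:
                break
            total += len(data)
    return total
```

Evicting before a benchmark gives a cold-cache run; warming first gives a hot-cache run.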