Memory-Mapped Files
[DRAFT - Section in progress]
Memory-mapped files allow applications to access files as if they were regions of memory, eliminating explicit read/write system calls. The kernel handles paging transparently, bringing file content into memory on demand via page faults.
# Fundamentals
mmap() maps a file (or portion of a file) into the process's address space:
// Requires <fcntl.h>, <sys/mman.h>, <unistd.h>
int fd = open("datafile", O_RDWR);
char *addr = mmap(NULL, length, PROT_READ | PROT_WRITE, MAP_SHARED, fd, offset);
if (addr == MAP_FAILED) { /* handle error */ }
// Access file via memory operations
addr[0] = 42;                  // Modifies the file (MAP_SHARED)
munmap(addr, length);
close(fd);
Mapping modes:
- MAP_SHARED: Changes are visible to other processes mapping the same file, and writes eventually propagate to the underlying file.
- MAP_PRIVATE: Copy-on-write. Changes are private to this process and don't affect the file or other mappings (see the sketch below).
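To make the copy-on-write behavior concrete, here is a minimal sketch (the file name demo.dat is a placeholder, the file is assumed to exist and be non-empty, and error handling is omitted) that writes through a MAP_PRIVATE mapping and then reads the file directly to show the on-disk contents are unchanged:
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void) {
    // Hypothetical test file; assumed to exist and contain at least 1 byte.
    int fd = open("demo.dat", O_RDWR);
    char *priv = mmap(NULL, 4096, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);

    char before = priv[0];
    priv[0] = 'X';              // Copy-on-write: only this process's copy changes

    char on_disk;
    pread(fd, &on_disk, 1, 0);  // Read byte 0 from the file itself
    printf("mapping sees '%c', file still holds '%c' (was '%c')\n",
           priv[0], on_disk, before);

    munmap(priv, 4096);
    close(fd);
    return 0;
}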
# Relationship to Page Cache
Memory-mapped files use the page cache as backing store. When you access a mapped page:
- A page fault occurs (the page is not yet present in the process's page tables)
- The kernel checks whether the file page is already in the page cache
  - If yes: it maps the existing page into the process's address space
  - If no: it issues I/O to read the page from disk, adds it to the page cache, then maps it
This means multiple processes mapping the same file share the same physical pages via the page cache.
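As an illustrative sketch of that sharing (the file name shared.dat is a placeholder, the file is assumed to exist and be non-empty, and error handling is omitted), the program below maps a file MAP_SHARED in a parent and, independently, in a forked child; the child's store is visible to the parent because both mappings point at the same page-cache page:
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void) {
    int fd = open("shared.dat", O_RDWR);
    char *map = mmap(NULL, 4096, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);

    if (fork() == 0) {
        // The child maps the same file on its own; it receives the same
        // physical page from the page cache, so its write is shared.
        int cfd = open("shared.dat", O_RDWR);
        char *cmap = mmap(NULL, 4096, PROT_READ | PROT_WRITE, MAP_SHARED, cfd, 0);
        cmap[0] = '!';
        munmap(cmap, 4096);
        close(cfd);
        _exit(0);
    }
    wait(NULL);                             // Let the child finish first

    printf("parent reads '%c'\n", map[0]);  // Prints '!'

    munmap(map, 4096);
    close(fd);
    return 0;
}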
# Synchronization Semantics (DRAFT)
TODO: Expand this section
- Writes through a MAP_SHARED mapping become visible to other processes mapping the same file as soon as they reach the shared page-cache page; propagation to disk happens later via writeback
- msync() forces modified pages to be flushed to the underlying file (a sketch follows this list)
- Without an explicit msync()/fsync(), there are no guarantees about when, or in what order, modified pages reach disk
- Concurrent reads and writes to overlapping regions can race; the application must supply its own synchronization
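A minimal sketch of the explicit flush, assuming a placeholder file named datafile that exists and is non-empty (error handling omitted):
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void) {
    int fd = open("datafile", O_RDWR);
    char *addr = mmap(NULL, 4096, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);

    addr[0] = 'x';                // Dirties the page in the page cache

    // MS_SYNC blocks until the modified pages have been written back to the
    // file; MS_ASYNC only schedules the writeback and returns immediately.
    msync(addr, 4096, MS_SYNC);

    munmap(addr, 4096);
    close(fd);
    return 0;
}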
# Performance Characteristics (DRAFT)
TODO: Expand with benchmarks and use cases
- Avoids per-access read()/write() system call overhead once pages have been faulted in
- The kernel manages paging transparently; access-pattern hints can be given with madvise() (see the sketch after this list)
- Large mappings create TLB pressure; huge pages reduce the number of TLB entries needed
- When NOT to use: one-pass sequential streaming (plain read() with readahead is usually as fast and simpler) and small, scattered random accesses, where every touch costs a full page fault and pulls in an entire page
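As a sketch of how such hints look in practice (the file name datafile and the sequential access pattern are assumptions made for illustration; error handling is omitted), madvise() lets the application tell the kernel how it intends to touch a mapping:
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

int main(void) {
    int fd = open("datafile", O_RDONLY);   // Placeholder file; assumed non-empty
    struct stat st;
    fstat(fd, &st);
    size_t len = (size_t)st.st_size;

    char *addr = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, 0);

    // Hint that the mapping will be read front to back: the kernel can read
    // ahead aggressively and reclaim pages behind the access point sooner.
    madvise(addr, len, MADV_SEQUENTIAL);
    // For scattered lookups, MADV_RANDOM disables readahead instead:
    //   madvise(addr, len, MADV_RANDOM);

    // Touch one byte per page, front to back (placeholder for real work).
    volatile char sink = 0;
    for (size_t i = 0; i < len; i += 4096)
        sink ^= addr[i];

    munmap(addr, len);
    close(fd);
    return 0;
}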
# Use Cases and Pitfalls (DRAFT)
TODO: Add concrete examples