Chapter 1
File Systems Deep Dive
Filesystems are the primary interface for storing and accessing data, both within systems and across networks. For systems engineers working in production environments, understanding filesystem internals isn't academic—it's essential for making informed decisions about performance, reliability, and operational characteristics.
This chapter explores filesystems across three critical dimensions:
- Taxonomy: Understanding the landscape of filesystem types and their use cases
- Guarantees: Synchronization semantics and resiliency properties
- Performance: How design choices impact real-world workloads
We'll go deep into implementation details, examining kernel code, protocol specifications, and operational trade-offs. By the end, you'll understand not just what different filesystems do, but why they behave the way they do—and how to debug them when things go wrong.
# Chapter Sections
Framework for evaluating network filesystems across semantics, architecture, and failure modes. Covers NFS, Lustre, GPFS, and WekaFS with architecture diagrams and comparison tables. Includes AWS S3 as a counterexample to POSIX filesystems.
## ZFS: A Modern Local Filesystem
End-to-end checksumming, copy-on-write architecture, integrated RAID, and snapshots. Understand ZFS's unique features, RAM requirements, and when to choose it over traditional filesystems.
## Filesystem Hacks
Battle-tested techniques: squashfs on NFS, tmpfs for performance testing, loop devices, bind mounts, sparse files, and FUSE. Unconventional approaches that solve real problems.
## The Linux Page Cache
How the page cache works, read-ahead mechanisms, dirty page writeback, and memory reclaim. Essential for understanding Linux filesystem performance.
## Essential Linux Filesystem Semantics
Atomic rename, unlink with open file descriptors, O_TMPFILE, file locking (flock vs fcntl), and sparse files. The POSIX and Linux-specific operations that reliable software depends on.
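As a taste of why atomic rename matters, here is a minimal Python sketch of the common write-then-rename pattern. The helper name `atomic_write` and the `.tmp-` prefix are illustrative assumptions, not part of any library, and the directory fsync step assumes a POSIX system such as Linux.

```python
import os
import tempfile


def atomic_write(path: str, data: bytes) -> None:
    """Replace `path` so readers see the old or the new contents, never a mix.

    Hypothetical helper for illustration only.
    """
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temporary file in the destination directory: rename is only
    # atomic when source and destination are on the same filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())  # flush file contents to stable storage
        os.replace(tmp_path, path)  # atomically rename over the destination
    except BaseException:
        # Remove the temporary file if anything failed before the rename.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise
    # Persist the new directory entry as well (assumes a POSIX system).
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

Calling `atomic_write("settings.json", b"{}")` either leaves the previous file intact or installs the complete new one. Note that `mkstemp` creates the file with owner-only permissions, so a real implementation may want to copy the original file's mode.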
## IO Modes: Buffered vs Direct IO
Buffered IO, direct IO (O_DIRECT), synchronous IO (the O_SYNC flag and fsync()), and how different filesystems handle these modes.
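To make the distinction between a buffered write followed by fsync() and a file opened with O_SYNC concrete, the following Python sketch shows both paths. The function names are hypothetical, and the code assumes a Unix-like system where `os.O_SYNC` is available. The actual durability behavior depends on the filesystem and the storage device.

```python
import os


def _write_all(fd: int, data: bytes) -> None:
    """Loop until every byte is written, since os.write may write partially."""
    view = memoryview(data)
    while len(view) > 0:
        written = os.write(fd, view)
        view = view[written:]


def write_then_fsync(path: str, data: bytes) -> None:
    """Buffered write: data goes to the page cache, then fsync() forces it out."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        _write_all(fd, data)
        os.fsync(fd)  # one explicit flush after all writes
    finally:
        os.close(fd)


def write_with_o_sync(path: str, data: bytes) -> None:
    """O_SYNC: each write() is asked to reach storage before it returns."""
    flags = os.O_WRONLY | os.O_CREAT | os.O_TRUNC | os.O_SYNC
    fd = os.open(path, flags, 0o644)
    try:
        _write_all(fd, data)  # no separate fsync call needed
    finally:
        os.close(fd)
```

The first form batches many writes behind a single flush, while the second pays the synchronization cost on every write, which is usually the deciding factor between them.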
## Memory-Mapped Files
How mmap() works, relationship to page cache, synchronization semantics, and performance characteristics. (Draft section - content in progress)
## Synchronization Guarantees
Coordinating concurrent access across processes and systems. Local POSIX semantics, NFS close-to-open, and distributed locking. (Draft section - content in progress)
## Resiliency Guarantees
Surviving crashes, power loss, and corruption. Journaling, copy-on-write, fsync semantics, and checksumming. (Draft section - content in progress)
## Performance Characteristics
Block allocation, metadata overhead, caching strategies, IO patterns, and distributed filesystem performance. (Draft section - content in progress)
## Deep Dive: ZFS recordsize
What recordsize controls, compression interaction, workload matching, performance measurement, and code walkthrough. (Draft section - content in progress)
## Deep Dive: NFS Close-to-Open Consistency
Protocol sequence, edge cases, performance implications, tuning parameters, and code walkthrough. (Draft section - content in progress)
## Observability and Debugging
Tool landscape, filesystem-specific debugging tools, and practical scenarios with eBPF/bpftrace examples. (Draft section - content in progress)