Kernel Module
The following presents an overview of the mpool kernel module architecture and its key data structures. We start with a description of these data structures to provide context for the subsequent architecture discussion.
The reader is assumed to have reviewed the concepts section of this Wiki, which introduces the mpool operational and object models, and associated terminology.
The figure below depicts the key internal data structures that the mpool kernel module maintains for each active mpool. These are superblocks (SB), metadata containers (MDCs), mblocks, and mlogs. Each MDC is in fact a pair of mlogs accessed using the MDC APIs. In the figure, the MDCs are labeled MDC-0 through MDC-255, which is the maximum number of MDCs ever allocated.
[Figure: key internal data structures maintained for each active mpool: superblocks (SB), MDC-0 through MDC-255, mblocks, and mlogs]
The storage volume for each media class in an mpool has a superblock comprising the metadata required to uniquely identify the volume, the mpool to which the volume belongs, and the media class within that mpool to which the volume is assigned. The superblock on the volume assigned to the required capacity media class also includes metadata for accessing MDC-0. Linux libblkid recognizes the identity portions of the mpool superblock as of util-linux v2.32.
MDC-0 is a distinguished container that stores both the metadata for accessing MDC-1 through MDC-255 and all mpool properties. Two examples of such properties are the authoritative list of all mpool volumes, and the percent of spare capacity configured for each media class volume.
MDC-1 through MDC-255 (MDC-1/255) store the metadata for accessing all mblocks and mlogs allocated by mpool clients. MDC-1/255 are created dynamically as needed, with enough MDCs allocated when the mpool is created to provide the requisite concurrency. For performance, mblock and mlog OIDs incorporate the number of the MDC storing the metadata for that object.
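As a rough illustration of why embedding the MDC number in an OID helps, consider the minimal C sketch below. The bit layout (MDC number in the low 8 bits) and the names `oid_make`, `oid_mdc_slot`, and `oid_uniq` are hypothetical and are not taken from the mpool source. The point is only that the MDC holding an object's metadata can be derived from the OID itself, with no lookup table.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical OID layout (an assumption for illustration only):
 * low 8 bits hold the MDC number (0..255), the upper 56 bits hold
 * a value that is unique within that MDC.
 */
#define OID_SLOT_BITS 8u
#define OID_SLOT_MASK ((UINT64_C(1) << OID_SLOT_BITS) - 1)

/* Build an OID from a per-MDC unique value and the owning MDC number. */
static inline uint64_t oid_make(uint64_t uniq, unsigned mdc_slot)
{
    assert(mdc_slot <= OID_SLOT_MASK);
    assert(uniq <= (UINT64_MAX >> OID_SLOT_BITS));
    return (uniq << OID_SLOT_BITS) | mdc_slot;
}

/* Recover the MDC number: a mask, no table lookup. */
static inline unsigned oid_mdc_slot(uint64_t oid)
{
    return (unsigned)(oid & OID_SLOT_MASK);
}

/* Recover the per-MDC unique value. */
static inline uint64_t oid_uniq(uint64_t oid)
{
    return oid >> OID_SLOT_BITS;
}
```

With a scheme like this, any operation that receives an OID can go straight to the in-memory state for MDC-K, where K is `oid_mdc_slot(oid)`.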
The MDC API pattern uses compaction to handle the case where one mlog of the pair fills. However, what it means to compact is use-case dependent. In the context of MDC-1/255, compacting MDC-K simply means serializing the in-memory metadata for accessing the still-live client objects associated with MDC-K. In the context of MDC-0, compacting simply means serializing the in-memory mpool properties and the in-memory metadata for accessing MDC-1/255.
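The compaction idea described above can be sketched with a toy user-space model. Everything here is an assumption made for illustration: the `toy_mdc` and `toy_log` types, the tiny fixed capacities, and the record encoding (`+id` for a committed object, `-id` for a deleted one) do not come from the mpool code. The sketch only shows the shape of the technique: when the active log fills, the current in-memory state is serialized into the other log, which then becomes the active one.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define LOG_CAP  8    /* records per log; tiny on purpose */
#define MAX_OBJS 16   /* object ids are 1..MAX_OBJS-1 */

struct toy_log { int rec[LOG_CAP]; size_t len; };

struct toy_mdc {
    struct toy_log log[2];         /* the pair of logs */
    int active;                    /* index of the log receiving appends */
    unsigned char live[MAX_OBJS];  /* in-memory state: 1 if object is live */
};

static void mdc_init(struct toy_mdc *m)
{
    memset(m, 0, sizeof(*m));
}

static int log_append(struct toy_log *l, int rec)
{
    if (l->len == LOG_CAP)
        return -1;                 /* log is full */
    l->rec[l->len++] = rec;
    return 0;
}

/* Serialize only the still-live objects into the idle log, then switch. */
static int mdc_compact(struct toy_mdc *m)
{
    struct toy_log *dst = &m->log[!m->active];

    dst->len = 0;
    for (int id = 1; id < MAX_OBJS; id++)
        if (m->live[id] && log_append(dst, id) != 0)
            return -1;             /* live set alone does not fit */

    m->log[m->active].len = 0;     /* erase the old log */
    m->active = !m->active;
    return 0;
}

/* Record a commit (is_live = 1) or a delete (is_live = 0) for an object. */
static int mdc_record(struct toy_mdc *m, int id, int is_live)
{
    assert(id > 0 && id < MAX_OBJS);
    m->live[id] = (unsigned char)is_live;
    if (log_append(&m->log[m->active], is_live ? id : -id) == 0)
        return 0;
    /* Active log is full. The in-memory state already reflects this
     * update, so compacting captures it. */
    return mdc_compact(m);
}
```

In this model, committing and then deleting four objects fills the active log with eight records; the next commit triggers compaction, leaving a single record in the other log. A real implementation would also need to handle the case where the live set does not fit, which this sketch only reports as an error.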
Because MDC-0 does not store metadata for client objects, updates to it are rare. Hence, compacting MDC-0 is also rare. This is a desirable property because compacting MDC-0 requires updating the superblocks on the volume assigned to the capacity media class for the mpool.
Though not shown in the figure, the metadata for the well-known mlogs comprising the root MDC for the mpool are stored in MDC-1.
Note that mpool is implemented in terms of the same mlog and MDC abstractions that it exports to clients. Doing so is a natural fit and reduces the size and complexity of the code base.
The figure below presents a block diagram of the mpool kernel module architecture. It is structured as a character device driver implementing the following functions:
- struct file_operations: open, release, unlocked_ioctl, mmap
- struct vm_operations_struct: open, close, fault
- struct address_space_operations: readpages, releasepage, invalidatepage, migratepage
[Figure: block diagram of the mpool kernel module architecture]
Below is a summary of each component in the block diagram, including how it interacts with other components and which source files implement it. Source files for the mpool kernel module are in the mpool-kmod repo. Source files for the mpool UAPI are in the mpool repo, though they are not discussed further in this section.
- Implements struct file_operations
- Translates ioctls into calls to the mpool API, mblock API, mlog API, or mcache API components
Source files: mpctl.[hc]
- Implements struct vm_operations_struct and struct address_space_operations
- Implements the mcache map management functions for memory-mapping an arbitrary collection of mblocks and registers the associated VMA with the Reaper component
- Uses the mblock API component to read mblock data to satisfy page faults (relationship not shown in the block diagram)
Source files: mcache.[hc]
- Proactively evicts mblock data from the page cache, based on object-level metrics, to avoid excess memory pressure
Source files: reaper.[hc]
- Implements the mpool management functions: create, destroy, activate, deactivate, etc.
- Uses the SB component to initialize the superblocks on a media class volume when creating an mpool or adding a media class, and to read the superblocks when activating an mpool
- Uses the Metadata Mgr component to load mpool properties and MDC-1/255 metadata from MDC-0, and client mblock and mlog metadata from MDC-1/255, when activating an mpool
- Uses the Metadata Mgr component to update mpool properties in MDC-0, e.g., if a media class volume is added to the mpool
Source files: mp.[hc]
- Implements the mblock management functions: allocate, write, commit, read, delete, etc.
- Uses the Metadata Mgr component to reserve storage space (via the Space Map component) when allocating an mblock; store metadata for the mblock in its associated MDC-K when committing it; and record end-of-life for the mblock in its associated MDC-K when deleting it
- Uses the Device Operations component to write and read mblock data
Source files: mblock.[hc]
- Implements the mlog management functions: allocate, commit, append, flush, read, erase, etc.
- Uses the Metadata Mgr component to reserve storage space (via the Space Map component) when allocating an mlog; store metadata for the mlog in its associated MDC-K when committing it; record a new generation number for the mlog in its associated MDC-K when erasing it; etc.
- Uses the Device Operations component to write and read mlog data
Source files: mlog.[hc], mlog_utils.[hc]
- Implements the MDC (metadata container) utilities for storing and maintaining metadata
- Uses the mlog API component, which it wraps
Source files: mdc.[hc]
- Provides services for reading and updating mpool metadata, stored in MDC-0 and MDC-1/255, to the mpool API, mblock API, and mlog API components
- Uses the MDC API component to read, append, and compact the MDCs for an mpool
- Uses the SB component when compacting MDC-0 to update the superblocks storing the MDC-0 metadata (relationship not shown in the block diagram)
- Uses the Space Map component to reserve storage space when allocating an mlog or mblock
Source files: pmd.[hc], pmd_obj.[hc]
- Implements a free-space map for each media class volume in an active mpool
Free-space maps are maintained in memory only. When an mpool is activated, the free-space map is reconstructed from the metadata for live mlog and mblock objects read from MDC-0 and MDC-1/255. The space occupied by MDC-0 is populated from its superblock metadata.
Source files: smap.[hc]
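To make the rebuild-on-activation idea concrete, here is a minimal user-space sketch in C. It assumes a hypothetical volume of 64 fixed-size zones tracked by a single bitmap, and the names `toy_smap`, `extent`, `smap_mark`, `smap_rebuild`, and `smap_free_zones` are invented for illustration. The actual Space Map data structures in smap.[hc] may look quite different.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ZONES 64u   /* hypothetical number of allocation units on a volume */

/* Hypothetical: the run of zones occupied by one live object. */
struct extent { uint32_t start; uint32_t len; };

/* In-memory only: bit i set means zone i is in use. */
struct toy_smap { uint64_t used; };

static void smap_mark(struct toy_smap *s, struct extent e)
{
    assert(e.len > 0 && e.start < ZONES && e.len <= ZONES - e.start);

    uint64_t mask = (e.len == 64) ? UINT64_MAX
                                  : ((UINT64_C(1) << e.len) - 1);
    mask <<= e.start;

    assert((s->used & mask) == 0);   /* live objects must not overlap */
    s->used |= mask;
}

/*
 * Rebuild the map from scratch: first the space used by MDC-0 (known
 * from the superblock), then every live object found in the MDCs.
 */
static void smap_rebuild(struct toy_smap *s, struct extent mdc0,
                         const struct extent *live, size_t n)
{
    s->used = 0;
    smap_mark(s, mdc0);
    for (size_t i = 0; i < n; i++)
        smap_mark(s, live[i]);
}

static unsigned smap_free_zones(const struct toy_smap *s)
{
    unsigned n = 0;

    for (unsigned i = 0; i < ZONES; i++)
        if (!((s->used >> i) & 1))
            n++;
    return n;
}
```

Because the map is derived entirely from object metadata, nothing about free space has to be persisted; anything not claimed by a live object or by MDC-0 is free by construction.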
- Provides utilities to initialize, read, update, and erase superblocks
- Maintains multiple copies of superblocks for recovery in the event of corruption
Source files: sb.[hc]
- Provides facilities to write, read, discard, and flush mblock and mlog data
- Generates BIOs to the block stack
Source files: pd.[hc]
Utilities of interest not shown in the block diagram include:
- omf*.[hc] for translating structures to and from their little-endian on-media format
- upgrade.[hc] for managing on-media format changes when compacting MDCs
- evc.[hc], which provides event counters
- merr.[hc], which provides error logging
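As a sketch of what translating to and from a little-endian on-media format involves, the user-space C example below packs a small record byte by byte, so the result is the same on any host. The `toy_rec` structure, its 12-byte layout, and the helper names are hypothetical and do not correspond to any structure in omf*.[hc].

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical in-memory record; not an actual mpool structure. */
struct toy_rec {
    uint64_t objid;
    uint32_t gen;
};

/* On-media size: 8 bytes objid + 4 bytes gen, no padding. */
#define TOY_REC_OMF_LEN 12u

static void put_le32(uint8_t *p, uint32_t v)
{
    for (int i = 0; i < 4; i++)
        p[i] = (uint8_t)(v >> (8 * i));
}

static void put_le64(uint8_t *p, uint64_t v)
{
    for (int i = 0; i < 8; i++)
        p[i] = (uint8_t)(v >> (8 * i));
}

static uint32_t get_le32(const uint8_t *p)
{
    uint32_t v = 0;

    for (int i = 0; i < 4; i++)
        v |= (uint32_t)p[i] << (8 * i);
    return v;
}

static uint64_t get_le64(const uint8_t *p)
{
    uint64_t v = 0;

    for (int i = 0; i < 8; i++)
        v |= (uint64_t)p[i] << (8 * i);
    return v;
}

/* In-memory -> on-media: least significant byte first, fixed offsets. */
static void toy_rec_pack(const struct toy_rec *r, uint8_t *out)
{
    assert(r && out);
    put_le64(out, r->objid);
    put_le32(out + 8, r->gen);
}

/* On-media -> in-memory: independent of host byte order and padding. */
static void toy_rec_unpack(struct toy_rec *r, const uint8_t *in)
{
    assert(r && in);
    r->objid = get_le64(in);
    r->gen = get_le32(in + 8);
}
```

Serializing field by field, rather than copying the struct's memory, keeps the on-media layout independent of compiler padding and host endianness. Kernel code would more typically use helpers such as `cpu_to_le64()` and `le64_to_cpu()` for the same purpose.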