This repository has been archived by the owner on Nov 17, 2022. It is now read-only.

Kernel Module

hse-project edited this page Aug 29, 2020 · 3 revisions

The following presents an overview of the mpool kernel module architecture and its key data structures. We start with a description of these data structures to provide context for the subsequent architecture discussion.

The reader is assumed to have reviewed the concepts section of this Wiki, which introduces the mpool operational and object models, and associated terminology.

Key Data Structures

The figure below depicts the key internal data structures that the mpool kernel module maintains for each active mpool. These are superblocks (SB), metadata containers (MDCs), mblocks, and mlogs. Each MDC is in fact a pair of mlogs accessed using the MDC APIs. In the figure, the MDCs are labeled MDC-0 through MDC-255, the full set of 256 being the maximum number of MDCs ever allocated.

The storage volume for each media class in an mpool has a superblock comprising the metadata required to uniquely identify the volume, the mpool to which the volume belongs, and the media class within that mpool to which the volume is assigned. The superblock on the volume assigned to the required capacity media class also includes metadata for accessing MDC-0. Linux libblkid recognizes the identity portions of the mpool superblock as of util-linux v2.32.

MDC-0 is a distinguished container that stores both the metadata for accessing MDC-1 through MDC-255 and all mpool properties. Two examples of such properties are the authoritative list of all mpool volumes, and the percent of spare capacity configured for each media class volume.

MDC-1 through MDC-255 (MDC-1/255) store the metadata for accessing all mblocks and mlogs allocated by mpool clients. MDC-1/255 are created dynamically as needed, with enough MDCs allocated when the mpool is created to provide the requisite concurrency. For performance, mblock and mlog OIDs incorporate the number of the MDC storing the metadata for that object.
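
To illustrate why embedding the MDC number in an OID helps, the following is a minimal sketch in which the owning MDC is recovered with a mask rather than a lookup. The bit layout (low 8 bits for the MDC number, remaining bits for a per-MDC identifier) and the names `oid_make` and `oid_to_mdc` are assumptions made for illustration; the actual mpool OID format is not described on this page and may differ.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout: the low 8 bits hold the MDC number (0..255) and the
 * remaining 56 bits hold an identifier unique within that MDC.
 * The real mpool OID format may differ. */
#define OID_MDC_BITS 8u
#define OID_MDC_MASK ((UINT64_C(1) << OID_MDC_BITS) - 1)

/* Build an OID from a per-MDC unique id and the owning MDC number. */
uint64_t oid_make(uint64_t uniq, unsigned mdc_num)
{
    assert(mdc_num <= OID_MDC_MASK);
    assert(uniq <= (UINT64_MAX >> OID_MDC_BITS));
    return (uniq << OID_MDC_BITS) | mdc_num;
}

/* Recover the MDC number directly from the OID, with no table lookup. */
unsigned oid_to_mdc(uint64_t oid)
{
    return (unsigned)(oid & OID_MDC_MASK);
}
```

With a layout like this, any operation on an object can locate the MDC holding its metadata in constant time, for example `oid_to_mdc(oid_make(42, 7))` yields MDC number 7.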

The MDC API pattern embodies the concept of compaction to handle one mlog of the pair filling. However, what it means to compact is use-case dependent. In the context of MDC-1/255, compacting MDC-K simply means serializing the in-memory metadata for accessing the still-live client objects associated with MDC-K. In the context of MDC-0, compacting simply means serializing the in-memory mpool properties and the in-memory metadata for accessing MDC-1/255.
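
The compaction pattern described above can be sketched in user-space C as follows. This is an illustrative model only, not the mpool implementation: the types `struct mlog` and `struct mdc`, the capacity `MLOG_CAP`, and the functions `mdc_append`, `mdc_compact`, and `counter_serialize` are all hypothetical. The use-case dependent part is the serializer callback, which writes whatever in-memory state is still live.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MLOG_CAP 64  /* hypothetical mlog capacity in bytes */

struct mlog { unsigned char data[MLOG_CAP]; size_t len; };

/* An MDC is a pair of mlogs; exactly one is active at a time. */
struct mdc { struct mlog log[2]; int active; };

/* Caller-supplied serializer: writes the current in-memory state into buf
 * and returns the number of bytes written (at most cap). */
typedef size_t (*mdc_serialize_fn)(void *state, unsigned char *buf, size_t cap);

/* Append a record to one mlog; returns -1 if it does not fit. */
int mlog_append(struct mlog *l, const void *rec, size_t n)
{
    if (n > MLOG_CAP - l->len)
        return -1;
    memcpy(l->data + l->len, rec, n);
    l->len += n;
    return 0;
}

/* Compact: serialize live state into the inactive mlog, switch to it,
 * then erase the old one. */
void mdc_compact(struct mdc *m, mdc_serialize_fn ser, void *state)
{
    assert(m && ser);
    int old = m->active;
    struct mlog *dst = &m->log[!old];

    dst->len = ser(state, dst->data, MLOG_CAP);
    m->active = !old;
    m->log[old].len = 0;
}

/* Append a record, compacting once if the active mlog is full. */
int mdc_append(struct mdc *m, const void *rec, size_t n,
               mdc_serialize_fn ser, void *state)
{
    if (mlog_append(&m->log[m->active], rec, n) == 0)
        return 0;
    mdc_compact(m, ser, state);
    return mlog_append(&m->log[m->active], rec, n);
}

/* Example serializer: the "live state" is a single unsigned counter. */
size_t counter_serialize(void *state, unsigned char *buf, size_t cap)
{
    if (cap < sizeof(unsigned))
        return 0;
    memcpy(buf, state, sizeof(unsigned));
    return sizeof(unsigned);
}
```

In this model, appending a record that no longer fits triggers one compaction: the live state is rewritten into the other mlog, which becomes active, and the new record is appended after it.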

Because MDC-0 does not store metadata for client objects, updates to it are rare. Hence, compacting MDC-0 is also rare. This is a desirable property because compacting MDC-0 requires updating the superblocks on the volume assigned to the capacity media class for the mpool.

Though not shown in the figure, the metadata for the well-known mlogs comprising the root MDC for the mpool are stored in MDC-1.

Note that mpool is implemented in terms of the same mlog and MDC abstractions that it exports to clients. Doing so is a natural fit and reduces the size and complexity of the code base.

Block Diagram

The figure below presents a block diagram of the mpool kernel module architecture. The module is structured as a character device driver that implements the following functions:

  • struct file_operations: open, release, unlocked_ioctl, mmap
  • struct vm_operations_struct: open, close, fault
  • struct address_space_operations: readpages, releasepage, invalidatepage, migratepage

Below is a summary of each component in the block diagram, including how it interacts with other components, and which source files implement it. Source files for the mpool kernel module are in the mpool-kmod repo. Source files for the mpool UAPI are in the mpool repo, though they are not discussed further in this section.

mpool Control

  • Implements struct file_operations
  • Translates ioctls into calls to the mpool API, mblock API, mlog API, or mcache API components

Source files: mpctl.[hc]
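
As a rough user-space illustration of how a control layer can translate ioctl commands into calls on other components, the sketch below uses a dispatch table. Everything named here (`enum demo_cmd`, the `demo_*` handlers, `demo_ioctl`) is hypothetical; the real command numbers and handlers live in the mpool UAPI headers and mpctl.[hc] and are not reproduced on this page. Returning `-ENOTTY` for an unrecognized command follows the usual Linux ioctl convention.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical command codes, not the real mpool ioctl numbers. */
enum demo_cmd { DEMO_MP_ACTIVATE, DEMO_MB_ALLOC, DEMO_ML_APPEND, DEMO_CMD_MAX };

typedef int (*demo_handler)(void *arg);

/* Stand-ins for entry points in the mpool, mblock, and mlog API components.
 * Each records which component ran so the dispatch can be observed. */
int demo_mp_activate(void *arg) { assert(arg); *(int *)arg = 1; return 0; }
int demo_mb_alloc(void *arg)    { assert(arg); *(int *)arg = 2; return 0; }
int demo_ml_append(void *arg)   { assert(arg); *(int *)arg = 3; return 0; }

/* One handler per command code. */
static const demo_handler demo_table[DEMO_CMD_MAX] = {
    [DEMO_MP_ACTIVATE] = demo_mp_activate,
    [DEMO_MB_ALLOC]    = demo_mb_alloc,
    [DEMO_ML_APPEND]   = demo_ml_append,
};

/* Route a command to its component; unknown commands get -ENOTTY. */
int demo_ioctl(unsigned cmd, void *arg)
{
    if (cmd >= DEMO_CMD_MAX || !demo_table[cmd])
        return -ENOTTY;
    return demo_table[cmd](arg);
}
```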

mcache API

  • Implements struct vm_operations_struct and struct address_space_operations
  • Implements the mcache map management functions for memory-mapping an arbitrary collection of mblocks and registers the associated VMA with the Reaper component
  • Uses the mblock API component to read mblock data to satisfy page faults (relationship not shown in the block diagram)

Source files: mcache.[hc]

Reaper

  • Proactively evicts mblock data from the page cache, based on object-level metrics, to avoid excess memory pressure

Source files: reaper.[hc]

mpool API

  • Implements the mpool management functions: create, destroy, activate, deactivate, etc.
  • Uses the SB component to initialize the superblocks on a media class volume when creating an mpool or adding a media class, and to read the superblocks when activating an mpool
  • Uses the Metadata Mgr component to load mpool properties and MDC-1/255 metadata from MDC-0, and client mblock and mlog metadata from MDC-1/255, when activating an mpool
  • Uses the Metadata Mgr component to update mpool properties in MDC-0, e.g., if a media class volume is added to the mpool

Source files: mp.[hc]

mblock API

  • Implements the mblock management functions: allocate, write, commit, read, delete, etc.
  • Uses the Metadata Mgr component to reserve storage space (via the Space Map component) when allocating an mblock; store metadata for the mblock in its associated MDC-K when committing it; and record end-of-life for the mblock in its associated MDC-K when deleting it
  • Uses the Device Operations component to write and read mblock data

Source files: mblock.[hc]

mlog API

  • Implements the mlog management functions: allocate, commit, append, flush, read, erase, etc.
  • Uses the Metadata Mgr component to reserve storage space (via the Space Map component) when allocating an mlog; store metadata for the mlog in its associated MDC-K when committing it; record a new generation number for the mlog in its associated MDC-K when erasing it; etc.
  • Uses the Device Operations component to write and read mlog data

Source files: mlog.[hc], mlog_utils.[hc]

MDC API

  • Implements the MDC (metadata container) utilities for storing and maintaining metadata
  • Uses the mlog API component, which it wraps

Source files: mdc.[hc]

Metadata Mgr

  • Provides services for reading and updating mpool metadata, stored in MDC-0 and MDC-1/255, to the mpool API, mblock API, and mlog API components
  • Uses the MDC API component to read, append, and compact the MDCs for an mpool
  • Uses the SB component when compacting MDC-0 to update the superblocks storing the MDC-0 metadata (relationship not shown in the block diagram)
  • Uses the Space Map component to reserve storage space when allocating an mlog or mblock

Source files: pmd.[hc], pmd_obj.[hc]

Space Map

  • Implements a free-space map for each media class volume in an active mpool

Free-space maps are maintained in memory only. When an mpool is activated, each free-space map is reconstructed from the metadata for live mlog and mblock objects read from MDC-0 and MDC-1/255. The space occupied by MDC-0 itself is accounted for using its superblock metadata.

Source files: smap.[hc]
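
Rebuilding an in-memory free-space map at activation can be modeled as starting from an all-free map and replaying the extent of every live object. The sketch below assumes a simple per-zone boolean map; the zone count `NZONES`, the types `struct extent` and `struct smap`, and the functions are hypothetical and do not reflect the actual smap.[hc] data structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define NZONES 32  /* hypothetical number of allocation zones on the volume */

/* A run of zones occupied by one live object, as recorded in its metadata. */
struct extent { unsigned start; unsigned len; };

struct smap { bool used[NZONES]; };

/* Mark one extent as allocated; returns false if it falls outside the
 * volume or overlaps a zone that is already in use. */
bool smap_mark(struct smap *s, struct extent e)
{
    assert(s);
    if (e.start > NZONES || e.len > NZONES - e.start)
        return false;
    for (unsigned z = e.start; z < e.start + e.len; z++)
        if (s->used[z])
            return false;
    for (unsigned z = e.start; z < e.start + e.len; z++)
        s->used[z] = true;
    return true;
}

/* Rebuild at activation: start fully free, then replay every live extent. */
bool smap_rebuild(struct smap *s, const struct extent *live, size_t n)
{
    assert(s);
    memset(s, 0, sizeof(*s));
    for (size_t i = 0; i < n; i++)
        if (!smap_mark(s, live[i]))
            return false;
    return true;
}

/* Count the zones still available for allocation. */
unsigned smap_free_zones(const struct smap *s)
{
    unsigned n = 0;
    for (unsigned z = 0; z < NZONES; z++)
        n += !s->used[z];
    return n;
}
```

In such a model the first replayed extent would stand in for MDC-0, whose location comes from the superblock, and the rest for objects described in the MDCs. Because nothing is persisted, an overlap detected during replay signals inconsistent metadata rather than a stale map.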

SB

  • Provides utilities to initialize, read, update, and erase superblocks
  • Maintains multiple copies of superblocks for recovery in the event of corruption

Source files: sb.[hc]
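
The value of keeping multiple superblock copies can be shown with a small sketch: a reader validates each copy in turn and uses the first one that passes. The copy count, magic value, structure layout, and toy checksum below are all assumptions for illustration; the real superblock format and validation in sb.[hc] are not described on this page.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SB_COPIES 4            /* hypothetical number of copies */
#define SB_MAGIC  0x6d706f6fu  /* hypothetical magic value */

struct sb { uint32_t magic; uint32_t payload; uint32_t csum; };

/* Toy checksum over the fields that precede csum (not a real CRC). */
uint32_t sb_csum(const struct sb *sb)
{
    return (sb->magic * 31u) ^ sb->payload;
}

/* Write the same contents to every copy. */
void sb_write_all(struct sb copies[SB_COPIES], uint32_t payload)
{
    assert(copies);
    for (int i = 0; i < SB_COPIES; i++) {
        copies[i].magic = SB_MAGIC;
        copies[i].payload = payload;
        copies[i].csum = sb_csum(&copies[i]);
    }
}

/* Return the first copy that passes validation, or NULL if all are corrupt. */
const struct sb *sb_read(const struct sb copies[SB_COPIES])
{
    assert(copies);
    for (int i = 0; i < SB_COPIES; i++)
        if (copies[i].magic == SB_MAGIC && copies[i].csum == sb_csum(&copies[i]))
            return &copies[i];
    return NULL;
}
```

If the first two copies are damaged, `sb_read` falls through to the third, so the identity and MDC-0 metadata remain recoverable as long as one copy survives.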

Device Operations

  • Provides facilities to write, read, discard, and flush mblock and mlog data
  • Generates BIOs to the block stack

Source files: pd.[hc]

Miscellaneous

Utilities of interest not shown in the block diagram include:

  • omf*.[hc] for translating structures to and from their little-endian on-media format
  • upgrade.[hc] for managing on-media format changes when compacting MDCs
  • evc.[hc] which provides event counters
  • merr.[hc] which provides error logging
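
To make the role of the on-media format (OMF) helpers concrete, the sketch below packs a small in-memory structure into a little-endian byte layout and unpacks it again, independent of host byte order. The structure `struct demo_rec`, its fields, its 12-byte layout, and all function names are hypothetical; the actual omf*.[hc] structures and helpers are not shown on this page.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical in-memory record and its packed on-media length. */
struct demo_rec { uint64_t oid; uint32_t gen; };
#define DEMO_REC_OMF_LEN 12

/* Store/load fixed-width integers in little-endian order, byte by byte,
 * so the result does not depend on the host's endianness. */
void put_le32(unsigned char *p, uint32_t v)
{
    for (int i = 0; i < 4; i++)
        p[i] = (unsigned char)(v >> (8 * i));
}

void put_le64(unsigned char *p, uint64_t v)
{
    for (int i = 0; i < 8; i++)
        p[i] = (unsigned char)(v >> (8 * i));
}

uint32_t get_le32(const unsigned char *p)
{
    uint32_t v = 0;
    for (int i = 3; i >= 0; i--)
        v = (v << 8) | p[i];
    return v;
}

uint64_t get_le64(const unsigned char *p)
{
    uint64_t v = 0;
    for (int i = 7; i >= 0; i--)
        v = (v << 8) | p[i];
    return v;
}

/* Translate the in-memory record to its on-media form. */
void demo_rec_pack(const struct demo_rec *r, unsigned char out[DEMO_REC_OMF_LEN])
{
    assert(r && out);
    put_le64(out, r->oid);
    put_le32(out + 8, r->gen);
}

/* Translate the on-media form back to the in-memory record. */
void demo_rec_unpack(struct demo_rec *r, const unsigned char in[DEMO_REC_OMF_LEN])
{
    assert(r && in);
    r->oid = get_le64(in);
    r->gen = get_le32(in + 8);
}
```

Packing field by field, rather than writing the structure's memory image, keeps the on-media layout free of compiler padding and host byte order.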