This repository has been archived by the owner on Nov 17, 2022. It is now read-only.

Concepts

hse-project edited this page Dec 9, 2020 · 2 revisions

The following describes mpool concepts that are important to understand for developing mpool client applications and making effective use of the mpool API.

Mpool Model

The mpool operational model is intentionally aligned with that of Linux LVM to the extent practical. Mpool APIs are provided to:

  • Create and destroy an mpool
  • Activate and deactivate an mpool
  • Scan for all mpools on a system and optionally activate or deactivate them
  • Manage mpool attributes
  • Add a media class to an mpool
  • Extend the capacity of an mpool media class
  • Attach an application-type ID to an mpool identifying the client storing data

These APIs are reflected in the mpool CLI commands used in the mpool configuration examples. The mpool media class abstraction is also described with those examples.

Object Model

Mpool implements an object store with two simple object abstractions: media blocks (mblocks) and media logs (mlogs).

mblocks

Mblock objects are containers comprising a linear sequence of bytes that can be written exactly once, are immutable after writing, and can be read in whole or in part as needed until deleted. Mblocks in a given media class are fixed size, which is configured when an mpool media class is created, though the amount of data written to mblocks may differ.

The mblock APIs implement a pattern whereby an mblock is allocated, written, and then committed or aborted. An mblock is not persistent or readable until committed, and a system failure prior to commit results in the same logical mpool state as if the mblock had never been allocated. An mblock allocation returns an object ID (OID) that is used to write and commit the mblock, read it as needed, and delete it. This API pattern is designed to simplify metadata management for mpool clients, for example when recording mblock OIDs in a transaction log.
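
The allocate, write, commit-or-abort pattern can be illustrated with a small in-memory model. This is only a sketch of the semantics described above, not the mpool API: the class ToyMblockStore and all of its method names are hypothetical, and it assumes a single write call per mblock.

```python
# Toy in-memory model of the mblock lifecycle described above.
# Every name here is hypothetical; this is NOT the mpool C API.
import itertools


class ToyMblockStore:
    """Models allocate -> write -> commit/abort for fixed-capacity mblocks."""

    def __init__(self, mblock_size):
        self.mblock_size = mblock_size      # fixed per media class
        self._next_oid = itertools.count(1)
        self._pending = {}                  # allocated, not yet committed
        self._committed = {}                # persistent and readable

    def alloc(self):
        oid = next(self._next_oid)
        self._pending[oid] = None           # allocated, nothing written yet
        return oid

    def write(self, oid, data):
        if self._pending[oid] is not None:  # KeyError if unknown or committed
            raise ValueError("mblock already written")
        if len(data) > self.mblock_size:
            raise ValueError("write exceeds mblock capacity")
        self._pending[oid] = bytes(data)

    def commit(self, oid):
        # After commit the data is immutable and becomes readable.
        self._committed[oid] = self._pending.pop(oid) or b""

    def abort(self, oid):
        del self._pending[oid]

    def read(self, oid, offset=0, length=None):
        data = self._committed[oid]         # KeyError until committed
        end = len(data) if length is None else offset + length
        return data[offset:end]

    def delete(self, oid):
        del self._committed[oid]

    def crash(self):
        """Simulate a system failure: uncommitted mblocks vanish."""
        self._pending.clear()


store = ToyMblockStore(mblock_size=16)
oid = store.alloc()
store.write(oid, b"hello mblock")           # less than the full capacity
store.commit(oid)
partial = store.read(oid, offset=6, length=6)

lost = store.alloc()
store.write(lost, b"never committed")
store.crash()                               # 'lost' is gone, 'oid' survives
```

In this model a client only needs to record an OID once the commit succeeds, because anything uncommitted disappears on failure as if it had never been allocated.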

mlogs

Mlog objects are containers for record logging. Records of arbitrary size can be appended to an mlog until it is full. Once full, an mlog must be erased before additional records can be appended. Mlog records can be read sequentially from the beginning at any time. Mlogs in a media class are always a multiple of the mblock size for that media class.

The mlog APIs implement a pattern whereby an mlog is allocated and then committed or aborted. An mlog is not persistent or accessible until committed, and a system failure prior to commit results in the same logical mpool state as if the mlog had never been allocated. An mlog allocation returns an OID that is used to commit the mlog; append to, flush, erase, or read it as needed; and delete it. Here again, this API pattern is designed to simplify metadata management for clients, for example when recording mlog OIDs in a transaction log.
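
The append, fill, erase, and sequential-read behavior of an mlog can be sketched with a toy model. The names ToyMlog and MlogFull are hypothetical and are not part of the mpool API. The sketch also assumes that capacity is consumed only by record payloads, ignoring any per-record overhead a real implementation may have.

```python
# Toy in-memory model of mlog append/erase/read semantics.
# Every name here is hypothetical; this is NOT the mpool C API.


class MlogFull(Exception):
    """Raised when a record does not fit in the remaining capacity."""


class ToyMlog:
    def __init__(self, mblock_size, multiple=1):
        # Per the text, an mlog's size is a multiple of the mblock size.
        self.capacity = mblock_size * multiple
        self._records = []
        self._used = 0

    def append(self, record):
        if self._used + len(record) > self.capacity:
            raise MlogFull("erase the mlog before appending more records")
        self._records.append(bytes(record))
        self._used += len(record)

    def erase(self):
        self._records.clear()
        self._used = 0

    def read_all(self):
        """Yield records sequentially from the beginning."""
        yield from self._records


log = ToyMlog(mblock_size=8, multiple=2)    # capacity: 16 bytes
log.append(b"rec-1")                        # records may differ in size
log.append(b"record-2")
try:
    log.append(b"too-big")                  # 13 + 7 bytes exceeds 16
    overflowed = False
except MlogFull:
    overflowed = True

before_erase = list(log.read_all())
log.erase()
log.append(b"too-big")                      # fits after the erase
```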

mcache maps

Mblock and mlog writes avoid the Linux page cache. Mblocks are written, committed, and made immutable before they can be read either directly (avoiding the page cache) or memory-mapped. Mlogs are always read and updated directly and cannot be memory-mapped.

Mblocks are memory-mapped by creating an mcache map. The mcache map APIs allow an arbitrary collection of mblocks (specified as a vector of mblock OIDs) to be mapped linearly into the virtual address space of an mpool client. Creating an mcache map returns a handle that can be used to get the base address of an mblock; get a vector of addresses for a specified set of page offsets in one or more mblocks; give advice about an address range within an mblock; and purge or unmap the mblocks in the mcache map.

The ability to memory-map an arbitrary collection of mblocks allows a client to store related information in multiple mblocks and then memory-map that information as a unit.
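
The idea of mapping a vector of mblocks linearly can be pictured as laying the mblocks out back to back in one address range, so that the base address of each mblock is a simple offset from the start of the map. The sketch below models that with a single byte buffer. ToyMcacheMap, its methods, the 4096-byte page size, and the one-slot-per-mblock layout are all assumptions made for illustration; the real mcache layout and calls may differ.

```python
# Toy model of a linear mapping over several mblocks.
# Every name and the layout are hypothetical; this is NOT the mpool C API.

PAGE_SIZE = 4096                            # assumed page size


class ToyMcacheMap:
    def __init__(self, mblocks, slot_size):
        # mblocks: list of (oid, data) pairs, mapped in the order given.
        self.slot_size = slot_size
        self._index = {oid: i for i, (oid, _) in enumerate(mblocks)}
        self._image = bytearray(slot_size * len(mblocks))
        for i, (_, data) in enumerate(mblocks):
            if len(data) > slot_size:
                raise ValueError("mblock data larger than its slot")
            start = i * slot_size
            self._image[start:start + len(data)] = data

    def base(self, oid):
        """Offset of the mblock's first byte within the map."""
        return self._index[oid] * self.slot_size

    def page_addresses(self, oid, pages):
        """Offsets of the given page numbers within one mblock."""
        return [self.base(oid) + p * PAGE_SIZE for p in pages]

    def view(self, oid, length):
        start = self.base(oid)
        return bytes(self._image[start:start + length])


# Related data stored in two mblocks, then mapped as a unit.
mcmap = ToyMcacheMap([(11, b"index"), (12, b"values")], slot_size=2 * PAGE_SIZE)
second_base = mcmap.base(12)
second_pages = mcmap.page_addresses(12, [0, 1])
payload = mcmap.view(12, 6)
```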

Summary

The mpool object model is intentionally restrictive so that it aligns with both traditional block interfaces and emerging SSD interfaces, such as NVMe Zoned Namespaces (ZNS), that impose constraints on I/O, especially writes. Mpool is also designed for use with persistent memory. The mpool object model insulates applications from these storage device and media details.

MDC Model

Mpool provides the metadata container (MDC) APIs that clients can use for storing and maintaining metadata. These MDC APIs are implemented as helper functions built on a pair of mlogs per MDC.

The MDC APIs make it easy for a client to

  • Append metadata update records to the active mlog of an MDC until it is full (or exceeds some client-specific threshold)
  • Flag the start of a compaction, which marks the other mlog of the MDC as active
  • Re-serialize its metadata by appending it to the (newly) active mlog of the MDC
  • Flag the end of the compaction
  • Continue appending metadata update records to the MDC until the above process repeats
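
The append-and-compact cycle listed above can be sketched with two in-memory logs. ToyMDC, MdcFull, and the put helper are hypothetical names, not the mpool MDC API. The sketch measures capacity in records rather than bytes, assumes the old log is cleared at the end of compaction, and does not model crash recovery.

```python
# Toy model of an MDC built on a pair of logs.
# Every name here is hypothetical; this is NOT the mpool C API.


class MdcFull(Exception):
    """Raised when the active log cannot take another record."""


class ToyMDC:
    def __init__(self, capacity):
        self.capacity = capacity            # max records per log in this toy
        self._logs = [[], []]
        self._active = 0

    def append(self, record):
        log = self._logs[self._active]
        if len(log) >= self.capacity:
            raise MdcFull("active log full: compact first")
        log.append(record)

    def compact_start(self):
        self._active ^= 1                   # the other log becomes active
        self._logs[self._active].clear()

    def compact_end(self):
        self._logs[self._active ^ 1].clear()  # assumption: old log is erased

    def records(self):
        return list(self._logs[self._active])


state = {}                                  # the client's in-memory metadata
mdc = ToyMDC(capacity=4)


def put(key, value):
    """Apply an update and log it, compacting when the active log is full."""
    state[key] = value
    try:
        mdc.append((key, value))
    except MdcFull:
        mdc.compact_start()
        for k, v in state.items():          # re-serialize current metadata
            mdc.append((k, v))
        mdc.compact_end()


for i in range(6):
    put("k%d" % (i % 2), i)

replayed = dict(mdc.records())              # replaying the log rebuilds state
```

Replaying the active log in order yields the same metadata the client holds in memory, which is what lets the compacted log replace the longer history of updates.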

The MDC API functions handle all failures, including crash recovery, by using special markers recognized by the mlog implementation.

When an mpool is created, a pair of mlogs with well-known OIDs is instantiated to form the root MDC of the mpool. The root MDC provides a location for mpool clients to store whatever metadata they need for start-up.