Skip to content

Latest commit

 

History

History
278 lines (239 loc) · 11.8 KB

cheri-pte-ext.adoc

File metadata and controls

278 lines (239 loc) · 11.8 KB

"{cheri_pte_ext_name}" Extension for CHERI Page-Based Virtual-Memory Systems (RV64 only)

CHERI is a security mechanism that is generally orthogonal to page-based virtual-memory management as defined in cite:[riscv-priv-spec]. However, it is helpful in CHERI harts to extend RISC-V’s virtual-memory management to facilitate capability revocation and control the flow of capabilities in memory at the page granularity. For this reason, the {cheri_pte_ext_name} extension adds new bits to RISC-V’s Page Table Entry (PTE) format.

Note
There is no explicit mechanism for enabling or disabling {cheri_pte_ext_name}. A VM-enabled legacy (non-CHERI) OS running in {cheri_int_mode_name} will not load or store capabilities, and so the default state of CW=0 causing loaded capabilities to have their tags cleared, and stored capabilities with their tags set to cause a page fault, won’t occur.

A CHERI-aware OS running a VM-enabled OS is strongly recommended to support {cheri_pte_ext_name}, and the minimum level of support is to set CW to 1 in all PTEs intended for storing capabilities (i.e. anonymous mappings) and leave sstatus.CRG and CRG in all PTEs set to 0, which will allow capabilities with their tags set to be loaded and stored successfully.

Therefore when implementing any RV64 virtual memory translation scheme (Sv39, Sv48 or Sv57) and {cheri_base_ext_name}, implementing {cheri_pte_ext_name} is strongly recommended.

Note
It is possible to detect the presence of {cheri_pte_ext_name} in software, by configuring a page table entry without programming CW and without setting sstatus.CRG, and testing for an exception on storing a tagged capability.
Note
Sv32 (for RV32) does not have any spare PTE bits, and so this extension cannot be implemented.

Limiting Capability Propagation

Page table enforcement can allow the operating system to limit the flow of capabilities between processes. It is highly desirable that a process should only possess capabilities that have been issued for that address space by the operating system. Unix processes may share memory for efficient communication, but capability pointers must not be shared across these channels into a foreign address space. An operating system might defend against this by only issuing a capability to the shared region that does not grant the load/store capability permission. However, there are circumstances where portions of general-purpose, mmapped* memory become shared, and the operating system must prevent future capability communication through those pages. This is not possible without restructuring software, as the capability for the original allocation, which spans both shared memory and private memory, would need to be deleted and replaced with a list of distinct capabilities with appropriate permissions for each range. Such a change would not be transparent to the program. Such sharing through virtual memory is on the page granularity, so preventing capability writes with a PTE permission is a natural solution.

* allocated using mmap

Capability Revocation

Page table enforcement can accelerate concurrent capability revocation for temporal safety. Without page table capability protection, a concurrent capability revocation sweep must begin by visiting all PTEs to mark them unreadable, henceforth trapping on any read to a new page to sweep it clean before proceeding. With a page-granularity generational capability read permission, we can eliminate the initial permission change of all PTEs. In addition, a page-granularity capability write control can eliminate many pages from the sweep that are known to not contain capabilities.

Extending the Page Table Entry Format

The page table entry format remains unchanged for Sv32. However, two new bits, Capability Write (CW) and Capability Read Generation (CRG), are added to leaf PTEs in Sv39, Sv48 and Sv57 as shown in Sv39 page table entry, Sv48 page table entry and Sv57 page table entry respectively.

Sv39 page table entry

img/sv39pte.edn

Sv48 page table entry

img/sv48pte.edn

Sv57 page table entry

img/sv57pte.edn

Note
All behaviour related to {cheri_pte_ext_name} requires the authorizing capability to have [c_perm]. If [c_perm] is not granted then all capability load/stores and AMOs always clear the tag and the behaviour in this section is not relevant.

The CW bit indicates whether reading or writing capabilities with the tag set to the virtual page is permitted. When the CW bit is set, capabilities are written as usual, and capability reads are controlled by the CRG bit.

If the CW bit is clear then:

  • When a capability load or AMO instruction is executed, the implementation clears the tag bit of the capability read from the virtual page.

  • When CRG is clear, the "no capability state", a store page fault exception is raised when a capability store or AMO instruction is executed and the tag bit of the capability being written is set.

  • When CRG is set, the "pre-CW state", two schemes are permitted:

    • The same behavior as when CRG is clear, allowing software interpretation of this state.

    • When a capability store or AMO instruction is executed and the tag bit of the capability being written is set, the implementation sets the CW bit and assigns the CRG bit equal to sstatus.CRG. The PTE update must be atomic with respect to other accesses to the PTE, and must atomically check that the PTE is valid and grants sufficient permissions. Updates to the CW bit and CRG bit must be exact (i.e. not speculative), and observed in program order by the local hart. Furthermore, the PTE update must appear in the global memory order no later than the explicit memory access, or any subsequent explicit memory access to that virtual page by the local hart. The ordering on loads and stores provided by FENCE instructions and the acquire/release bits on atomic instructions also orders the PTE updates associated with those loads and stores as observed by remote harts.

      The PTE update is not required to be atomic with respect to the explicit memory access that caused the update, and the sequence is interruptible. However, the hart must not perform explicit memory access before the PTE update is globally visible.

When CW is set, the CRG bit indicates the current generation of the virtual memory page with regards to the ongoing capability revocation cycle. Two schemes are permitted:

  • A load page fault exception is raised when a capability load or AMO instruction is executed and the virtual page’s CRG bit does not equal sstatus.CRG.

  • A load page fault exception is raised when a capability load or AMO instruction is executed, the capability read from memory has tag set and the virtual page’s CRG bit does not equal sstatus.CRG.

Note
An implementation may avoid a dependency on the tag bit value of the capability read. This will introduce additional traps during revocation sweeps.
Table 1. Summary of Load CW and CRG behavior in the PTEs
PTE.CW PTE.CRG Load/AMO

0

X

Clear loaded tag

1

sstatus.CRG

Page fault, or page fault if tag is set1

1

= sstatus.CRG

Normal operation

Table 2. Summary of Store CW and CRG behavior in the PTEs
PTE.CW PTE.CRG Store/AMO

0

0

Page fault if stored tag is set

0

1

Page fault if stored tag is set, or hardware CW and CRG update2

1

X

Normal operation

1 The choice here is whether to take data dependent exceptions on loads or atomic operations. It is legal for the implementation to fault even if the tag is not set since this behaviour is only an optimization for software. This means it is also legal to only check the tag under certain conditions and conservatively fault otherwise.

2 The choice here follows the pattern of whether to implement the Svade extension to take page-fault exceptions for A and D PTE bit state changes, or whether to implement a hardware updating mechanism. Software should implement support for a page fault in these cases which will not be used if the hardware mechanism is implemented.

Extending the Supervisor (sstatus) and Virtual Supervisor (vsstatus) Status Registers

The sstatus and vsstatus CSRs are extended to include the new Capability Read Generation (CRG) bit as shown.

When V=1 vsstatus.CRG is in effect.

mstatus.CRG also exists. Reading or writing it is equivalent to reading or writing sstatus.CRG.

Note
As there is no M-mode translation available in RISC-V, there is no current software use for mstatus.CRG. It is only included not to break the rule that sstatus is required to be a subset of mstatus.
Virtual Supervisor-mode status (mstatus) register when MXLEN=64
{reg: [
  {bits:  1, name: 'WPRI'},
  {bits:  1, name: 'SIE'},
  {bits:  1, name: 'WPRI'},
  {bits:  1, name: 'MIE'},
  {bits:  1, name: 'WPRI'},
  {bits:  1, name: 'SPIE'},
  {bits:  1, name: 'UBE'},
  {bits:  1, name: 'MPIE'},
  {bits:  1, name: 'SPP'},
  {bits:  2, name: 'VS[1:0]'},
  {bits:  2, name: 'MPP[1:0]'},
  {bits:  2, name: 'FS[1:0]'},
  {bits:  2, name: 'XS[1:0]'},
  {bits:  1, name: 'MPRV'},
  {bits:  1, name: 'SUM'},
  {bits:  1, name: 'MXR'},
  {bits:  1, name: 'TVM'},
  {bits:  1, name: 'TW'},
  {bits:  1, name: 'TSR'},
  {bits:  1, name: 'SPELP'},
  {bits:  1, name: 'SDT'},
  {bits:  7, name: 'WPRI'},
  {bits:  2, name: 'UXL[1:0]'},
  {bits:  2, name: 'SXL[1:0]'},
  {bits:  1, name: 'SBE'},
  {bits:  1, name: 'MBE'},
  {bits:  1, name: 'GVA'},
  {bits:  1, name: 'MPV'},
  {bits:  1, name: 'WPRI'},
  {bits:  1, name: 'MPELP'},
  {bits:  1, name: 'MDT'},
  {bits: 19, name: 'WPRI'},
  {bits:  1, name: 'CRG'},
  {bits:  1, name: 'SD'},
], config:{lanes: 4, hspace:1024}}
Supervisor-mode status (sstatus) register when SXLEN=64
{reg: [
  {bits:  1, name: 'WPRI'},
  {bits:  1, name: 'SIE'},
  {bits:  3, name: 'WPRI'},
  {bits:  1, name: 'SPIE'},
  {bits:  1, name: 'UBE'},
  {bits:  1, name: 'WPRI'},
  {bits:  1, name: 'SPP'},
  {bits:  2, name: 'VS[1:0]'},
  {bits:  2, name: 'WPRI'},
  {bits:  2, name: 'FS[1:0]'},
  {bits:  2, name: 'XS[1:0]'},
  {bits:  1, name: 'WPRI'},
  {bits:  1, name: 'SUM'},
  {bits:  1, name: 'MXR'},
  {bits:  3, name: 'WPRI'},
  {bits:  1, name: 'SPELP'},
  {bits:  1, name: 'SDT'},
  {bits:  7, name: 'WPRI'},
  {bits:  2, name: 'UXL[1:0]'},
  {bits: 28, name: 'WPRI'},
  {bits:  1, name: 'CRG'},
  {bits:  1, name: 'SD'},
], config:{lanes: 4, hspace:1024}}
Virtual Supervisor-mode status (vsstatus) register when VSXLEN=64
{reg: [
  {bits:  1, name: 'WPRI'},
  {bits:  1, name: 'SIE'},
  {bits:  3, name: 'WPRI'},
  {bits:  1, name: 'SPIE'},
  {bits:  1, name: 'UBE'},
  {bits:  1, name: 'WPRI'},
  {bits:  1, name: 'SPP'},
  {bits:  2, name: 'VS[1:0]'},
  {bits:  2, name: 'WPRI'},
  {bits:  2, name: 'FS[1:0]'},
  {bits:  2, name: 'XS[1:0]'},
  {bits:  1, name: 'WPRI'},
  {bits:  1, name: 'MXR'},
  {bits:  1, name: 'SUM'},
  {bits: 12, name: 'WPRI'},
  {bits:  2, name: 'UXL[1:0]'},
  {bits: 28, name: 'WPRI'},
  {bits:  1, name: 'CRG'},
  {bits:  1, name: 'SD'}
], config:{lanes: 4, hspace:1024}}