CHERI is a security mechanism that is generally orthogonal to page-based virtual-memory management as defined in cite:[riscv-priv-spec]. However, it is helpful in CHERI harts to extend RISC-V’s virtual-memory management to facilitate capability revocation and control the flow of capabilities in memory at the page granularity. For this reason, the {cheri_pte_ext_name} extension adds new bits to RISC-V’s Page Table Entry (PTE) format.
Note: There is no explicit mechanism for enabling or disabling {cheri_pte_ext_name}. A VM-enabled legacy (non-CHERI) OS running in {cheri_int_mode_name} does not load or store capabilities, so the effects of the default CW=0 state (loaded capabilities having their tags cleared, and stores of capabilities with their tags set causing a page fault) never occur.
A VM-enabled CHERI-aware OS is strongly recommended to support {cheri_pte_ext_name}. The minimum level of support is to set CW to 1 in all PTEs intended for storing capabilities (i.e. anonymous mappings) and to leave sstatus.CRG and the CRG bit in all PTEs set to 0, which allows capabilities with their tags set to be loaded and stored successfully.
Therefore when implementing any RV64 virtual memory translation scheme (Sv39, Sv48 or Sv57) and {cheri_base_ext_name}, implementing {cheri_pte_ext_name} is strongly recommended.
Note: Software can detect the presence of {cheri_pte_ext_name} by configuring a page table entry without setting CW, leaving sstatus.CRG clear, and testing whether storing a tagged capability raises an exception.
Note: Sv32 (for RV32) does not have any spare PTE bits, and so this extension cannot be implemented with Sv32.
Page table enforcement can allow the operating system to limit the flow of capabilities between processes. It is highly desirable that a process should only possess capabilities that have been issued for that address space by the operating system. Unix processes may share memory for efficient communication, but capability pointers must not be shared across these channels into a foreign address space. An operating system might defend against this by issuing, for the shared region, only capabilities that do not grant the load/store capability permission. However, there are circumstances where portions of general-purpose, mmapped* memory become shared, and the operating system must prevent future capability communication through those pages. Capability permissions alone cannot achieve this without restructuring software, as the capability for the original allocation, which spans both shared memory and private memory, would need to be deleted and replaced with a list of distinct capabilities with appropriate permissions for each range. Such a change would not be transparent to the program. Sharing through virtual memory happens at page granularity, so preventing capability writes with a PTE permission is a natural solution.
* allocated using mmap
Page table enforcement can accelerate concurrent capability revocation for temporal safety. Without page table capability protection, a concurrent capability revocation sweep must begin by visiting all PTEs to mark them unreadable, henceforth trapping on any read to a new page to sweep it clean before proceeding. With a page-granularity generational capability read permission, we can eliminate the initial permission change of all PTEs. In addition, a page-granularity capability write control can eliminate many pages from the sweep that are known to not contain capabilities.
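The generational scheme described above can be illustrated with a toy Python model. This is only a sketch under stated assumptions: the `Page` and `AddressSpace` classes, the `start_revocation`, `sweep` and `on_capability_load` names, and the representation of capabilities as `(tag, address)` pairs are all hypothetical and are not part of the specification or of any OS.

```python
from dataclasses import dataclass, field


@dataclass
class Page:
    cw: bool                                   # models PTE.CW
    crg: int                                   # models PTE.CRG (0 or 1)
    caps: list = field(default_factory=list)   # (tag, address) pairs, a toy stand-in for memory


class AddressSpace:
    """Hypothetical model of one address space undergoing revocation."""

    def __init__(self, pages):
        self.pages = pages
        self.sstatus_crg = 0   # models sstatus.CRG
        self.revoked = set()   # addresses whose capabilities must lose their tags
        self.swept = 0         # number of pages actually scanned

    def start_revocation(self, revoked):
        # Flip the generation instead of rewriting every PTE.
        self.revoked = set(revoked)
        self.sstatus_crg ^= 1

    def sweep(self, page):
        # Clear the tags of revoked capabilities, then mark the page as current.
        page.caps = [(tag and addr not in self.revoked, addr) for tag, addr in page.caps]
        page.crg = self.sstatus_crg
        self.swept += 1

    def on_capability_load(self, page):
        # CW=0: tags are stripped on load, so the page never needs sweeping.
        if not page.cw:
            return [(False, addr) for _, addr in page.caps]
        # Stale generation: stands in for the load page fault handler.
        if page.crg != self.sstatus_crg:
            self.sweep(page)
        return page.caps
```

In this model, a page with CW clear is never swept, and a page with a stale generation is swept lazily on first capability load. A real revoker would presumably also visit the remaining stale pages in the background; that part is not modelled here.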
The page table entry format remains unchanged for Sv32. However, two new bits, Capability Write (CW) and Capability Read Generation (CRG), are added to leaf PTEs in Sv39, Sv48 and Sv57 as shown in Sv39 page table entry, Sv48 page table entry and Sv57 page table entry respectively. For non-leaf PTEs these bits remain reserved and must be cleared by software for forward compatibility, or else a page-fault exception is raised.
Note: The behavior in this section isn’t relevant if:
- The authorizing capability doesn’t have [c_perm], for loads, stores and AMOs.
- {cheri_levels_ext_name} has cleared the stored tag, for stores and AMOs.
The CW bit indicates whether capabilities with their tag set may be read from or written to the virtual page. When the CW bit is set, capabilities are written as usual, and capability reads are controlled by the CRG bit.
If the CW bit is clear then:
- When a capability load or AMO instruction is executed, the implementation clears the tag bit of the capability read from the virtual page.
- When CRG is clear, the "no capability state", a store/AMO page fault exception is raised when a capability store or AMO instruction is executed and the tag bit of the capability being written is set.
- When CRG is set, the "pre-CW state", two schemes are permitted (also see Enabling Software or Hardware PTE updates):
  - The same behavior as when CRG is clear, allowing software interpretation of this state.
  - When a capability store or AMO instruction is executed and the tag bit of the capability being written is set, the implementation sets the CW bit and assigns the CRG bit equal to sstatus.CRG.

Note: The tag bit of the stored capability is checked after it is potentially cleared due to lack of permissions.
The PTE update must be atomic with respect to other accesses to the PTE, and must atomically check that the PTE is valid and grants sufficient permissions. Updates to the CW bit and CRG bit must be exact (i.e. not speculative), and observed in program order by the local hart. Furthermore, the PTE update must appear in the global memory order no later than the explicit memory access, or any subsequent explicit memory access to that virtual page by the local hart. The ordering on loads and stores provided by FENCE instructions and the acquire/release bits on atomic instructions also orders the PTE updates associated with those loads and stores as observed by remote harts.
The PTE update is not required to be atomic with respect to the explicit memory access that caused the update, and the sequence is interruptible. However, the hart must not perform explicit memory access before the PTE update is globally visible.
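The store-side rules above can be summarised as a small decision function. The following Python sketch is illustrative only: the `Pte` class, the `capability_store_action` name, its string return values and the `hw_update` flag (selecting between the two permitted schemes for the pre-CW state) are hypothetical. It assumes the PTE is otherwise valid and that `stored_tag` is the tag after any clearing due to lack of permissions. It does not model the atomicity or ordering requirements of the PTE update.

```python
from dataclasses import dataclass


@dataclass
class Pte:
    cw: bool   # models PTE.CW
    crg: int   # models PTE.CRG (0 or 1)


def capability_store_action(pte, stored_tag, sstatus_crg, hw_update):
    """Return the outcome of a capability store or AMO to the page mapped by pte.

    Possible results (hypothetical labels): "normal", "page_fault", "hardware_update".
    """
    if pte.cw or not stored_tag:
        # CW set, or nothing tagged is being written: no CW/CRG restriction applies.
        return "normal"
    if pte.crg == 1 and hw_update:
        # Pre-CW state with the hardware scheme: set CW and copy sstatus.CRG into the PTE.
        pte.cw = True
        pte.crg = sstatus_crg
        return "hardware_update"
    # No capability state, or pre-CW state left to software interpretation.
    return "page_fault"
```

For example, `capability_store_action(Pte(cw=False, crg=1), True, 0, hw_update=True)` leaves the PTE with CW set and CRG equal to the supplied `sstatus_crg`, whereas the same call with `hw_update=False` reports a page fault and leaves the PTE unchanged.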
When CW is set, the CRG bit indicates the current generation of the virtual memory page with regard to the ongoing capability revocation cycle. Two schemes are permitted:
- A load page fault exception is raised when a capability load or AMO instruction is executed with [c_perm] granted and the virtual page’s CRG bit does not equal sstatus.CRG.
- A load page fault exception is raised when a capability load or AMO instruction is executed with [c_perm] granted, the virtual page’s CRG bit does not equal sstatus.CRG, and the capability read from memory has its tag set; checking the tag is optional ¹.
| PTE.CW | PTE.CRG | Load/AMO |
|---|---|---|
| 0 | X | Clear loaded tag |
| 1 | ≠ sstatus.CRG | Page fault, or page fault if tag is set ¹ |
| 1 | = sstatus.CRG | Normal operation |

| PTE.CW | PTE.CRG | Store/AMO |
|---|---|---|
| 0 | 0 | Page fault if stored tag is set |
| 0 | 1 | Page fault if stored tag is set, or hardware CW and CRG update ² |
| 1 | X | Normal operation |
¹ The choice here is whether to take data-dependent exceptions on loads or atomic operations. An implementation may legally fault even if the tag is not set, since checking the tag is only an optimization for software. It is therefore also legal to check the tag only under certain conditions and conservatively fault otherwise. Taking a trap when the tag is not set introduces additional traps during revocation sweeps. Checking the loaded tag affects the exception priority; see [exception-priority].
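The load-side behavior, including the optional tag check, can be written as a short decision function. This Python sketch is a hedged illustration: the `capability_load_action` name, its string return values and the `check_tag` flag (modelling an implementation that only faults when the loaded tag is set) are hypothetical, and the function assumes [c_perm] is granted by the authorizing capability.

```python
def capability_load_action(cw, pte_crg, sstatus_crg, loaded_tag, check_tag=False):
    """Return the outcome of a capability load or AMO for a page with the given CW/CRG.

    Possible results (hypothetical labels): "clear_tag", "page_fault", "normal".
    """
    if not cw:
        # CW clear: the loaded capability always has its tag cleared.
        return "clear_tag"
    if pte_crg != sstatus_crg:
        # Stale generation. An implementation that checks the tag may skip the
        # fault for untagged data; one that does not check always faults.
        if check_tag and not loaded_tag:
            return "normal"
        return "page_fault"
    return "normal"
```

An implementation that checks the tag only under some conditions would behave like `check_tag=True` in those cases and like `check_tag=False` otherwise, which the text above permits.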
The exceptions added by {cheri_pte_ext_name} reuse the load page fault and store/AMO page fault exception cause values, and so the cause of the exception can be determined by software by checking the value in [mtval2], [stval2] etc.
The behavior when multiple page fault types are detected at once is shown in [mtval2-page-fault].
Whether an exception is taken on a capability store with the tag set to a page with PTE.CW=0 and PTE.CRG=1 is determined by whether the Svade or the Svadu extension is enabled. These cause PTE Accessed and Dirty updates to be done in software, via the exception handler, or by a hardware mechanism, respectively.
- If only Svade is implemented, or enabled through henvcfg.ADUE or menvcfg.ADUE, then take a page fault.
- If only Svadu is implemented, or enabled through henvcfg.ADUE or menvcfg.ADUE, then do the hardware update of setting PTE.CW=1 and setting PTE.CRG=sstatus.CRG as described in Extending the Page Table Entry Format.
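The selection between the two schemes can be expressed as a one-line rule. The Python sketch below is a simplification under stated assumptions: `pre_cw_store_scheme` and its parameters are hypothetical names, `adue` stands for whichever of henvcfg.ADUE or menvcfg.ADUE applies to the current translation (which one applies is not modelled), and a hart without Svadu is assumed to behave as Svade.

```python
def pre_cw_store_scheme(svadu_implemented, adue):
    """Choose the behavior for a tagged capability store to a page with CW=0 and CRG=1.

    Returns "hardware_update" when hardware PTE updates are in effect (Svadu),
    and "page_fault" when updates are left to software (Svade).
    """
    if svadu_implemented and adue:
        return "hardware_update"   # set PTE.CW=1 and PTE.CRG=sstatus.CRG in hardware
    return "page_fault"            # software handles the update in its exception handler
```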
The sstatus and vsstatus CSRs are extended to include the new Capability Read Generation (CRG) bit as shown.
When V=1, vsstatus.CRG is in effect.
Note: As there is no M-mode translation available in RISC-V, there is no current software use for mstatus.CRG. It is included only so as not to break the rule that sstatus is required to be a subset of mstatus.
{reg: [ {bits: 1, name: 'WPRI'}, {bits: 1, name: 'SIE'}, {bits: 1, name: 'WPRI'}, {bits: 1, name: 'MIE'}, {bits: 1, name: 'WPRI'}, {bits: 1, name: 'SPIE'}, {bits: 1, name: 'UBE'}, {bits: 1, name: 'MPIE'}, {bits: 1, name: 'SPP'}, {bits: 2, name: 'VS[1:0]'}, {bits: 2, name: 'MPP[1:0]'}, {bits: 2, name: 'FS[1:0]'}, {bits: 2, name: 'XS[1:0]'}, {bits: 1, name: 'MPRV'}, {bits: 1, name: 'SUM'}, {bits: 1, name: 'MXR'}, {bits: 1, name: 'TVM'}, {bits: 1, name: 'TW'}, {bits: 1, name: 'TSR'}, {bits: 1, name: 'SPELP'}, {bits: 1, name: 'SDT'}, {bits: 7, name: 'WPRI'}, {bits: 2, name: 'UXL[1:0]'}, {bits: 2, name: 'SXL[1:0]'}, {bits: 1, name: 'SBE'}, {bits: 1, name: 'MBE'}, {bits: 1, name: 'GVA'}, {bits: 1, name: 'MPV'}, {bits: 1, name: 'WPRI'}, {bits: 1, name: 'MPELP'}, {bits: 1, name: 'MDT'}, {bits: 19, name: 'WPRI'}, {bits: 1, name: 'CRG'}, {bits: 1, name: 'SD'}, ], config:{lanes: 4, hspace:1024}}
{reg: [ {bits: 1, name: 'WPRI'}, {bits: 1, name: 'SIE'}, {bits: 3, name: 'WPRI'}, {bits: 1, name: 'SPIE'}, {bits: 1, name: 'UBE'}, {bits: 1, name: 'WPRI'}, {bits: 1, name: 'SPP'}, {bits: 2, name: 'VS[1:0]'}, {bits: 2, name: 'WPRI'}, {bits: 2, name: 'FS[1:0]'}, {bits: 2, name: 'XS[1:0]'}, {bits: 1, name: 'WPRI'}, {bits: 1, name: 'SUM'}, {bits: 1, name: 'MXR'}, {bits: 3, name: 'WPRI'}, {bits: 1, name: 'SPELP'}, {bits: 1, name: 'SDT'}, {bits: 7, name: 'WPRI'}, {bits: 2, name: 'UXL[1:0]'}, {bits: 28, name: 'WPRI'}, {bits: 1, name: 'CRG'}, {bits: 1, name: 'SD'}, ], config:{lanes: 4, hspace:1024}}
{reg: [ {bits: 1, name: 'WPRI'}, {bits: 1, name: 'SIE'}, {bits: 3, name: 'WPRI'}, {bits: 1, name: 'SPIE'}, {bits: 1, name: 'UBE'}, {bits: 1, name: 'WPRI'}, {bits: 1, name: 'SPP'}, {bits: 2, name: 'VS[1:0]'}, {bits: 2, name: 'WPRI'}, {bits: 2, name: 'FS[1:0]'}, {bits: 2, name: 'XS[1:0]'}, {bits: 1, name: 'WPRI'}, {bits: 1, name: 'SUM'}, {bits: 1, name: 'MXR'}, {bits: 12, name: 'WPRI'}, {bits: 2, name: 'UXL[1:0]'}, {bits: 28, name: 'WPRI'}, {bits: 1, name: 'CRG'}, {bits: 1, name: 'SD'} ], config:{lanes: 4, hspace:1024}}
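The position of the CRG bit can be derived from the field lists in the register diagrams above by summing the widths of the preceding fields. The Python sketch below does this for the second diagram, which is assumed here to be sstatus and assumed to list fields starting from the least significant bit. The names `SSTATUS_FIELDS`, `field_offset`, `CRG_SHIFT` and `effective_crg` are hypothetical helpers, not part of the specification.

```python
# (name, width) pairs copied from the second register diagram, lowest bit first (assumed).
SSTATUS_FIELDS = [
    ("WPRI", 1), ("SIE", 1), ("WPRI", 3), ("SPIE", 1), ("UBE", 1), ("WPRI", 1),
    ("SPP", 1), ("VS", 2), ("WPRI", 2), ("FS", 2), ("XS", 2), ("WPRI", 1),
    ("SUM", 1), ("MXR", 1), ("WPRI", 3), ("SPELP", 1), ("SDT", 1), ("WPRI", 7),
    ("UXL", 2), ("WPRI", 28), ("CRG", 1), ("SD", 1),
]


def field_offset(fields, name):
    """Return the bit offset of the first field called name."""
    offset = 0
    for field_name, width in fields:
        if field_name == name:
            return offset
        offset += width
    raise KeyError(name)


CRG_SHIFT = field_offset(SSTATUS_FIELDS, "CRG")


def effective_crg(v, sstatus, vsstatus):
    """Return the CRG value in effect: vsstatus.CRG when V=1, otherwise sstatus.CRG."""
    csr = vsstatus if v else sstatus
    return (csr >> CRG_SHIFT) & 1
```

With these field widths the list covers all 64 bits, and CRG sits immediately below SD.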