RISC-V defines variants of the base integer instruction set characterized by the width of the integer registers and the corresponding size of the address space. There are two primary ISA variants, RV32I and RV64I, which provide 32-bit and 64-bit address spaces respectively. The term XLEN refers to the width of an integer register in bits (either 32 or 64). The value of XLEN may change dynamically at run-time depending on the values written to CSRs, so we define capability behavior in terms of MXLEN, which is the value of XLEN used in machine mode and the widest XLEN the implementation supports.
Note
|
{cheri_base_ext_name} assumes a version of the privileged architecture which defines MXLEN as constant and requires higher privilege modes to have at least the same XLEN as lower privilege modes; these changes are present in the current draft and expected to be part of privileged architecture 1.13. |
{cheri_base_ext_name} defines capabilities of size CLEN corresponding to 2 * MXLEN without including the tag bit. The value of CLEN is always calculated based on MXLEN regardless of the effective XLEN value.
Note
|
We briefly note that the capability encoding described in this section could be replaced with an entirely different design without changing how CHERI integrates with the RISC-V ISA. In particular, this capability encoding specification was designed to run software initially ported to CHERIv9 while providing spatial safety, temporal safety and compartmentalization support alongside a good measure of compatibility with RISC-V software that is not aware of CHERI. Alternative capability encoding specifications must provide key primitives, such as permissions and bounds, from this specification while using a different encoding that, for example, changes the granularity of bounds or adds new features. For simplicity of expression, the text is written as if this was the only possible capability encoding for CHERI RISC-V. |
The components of a capability, except the tag, are encoded as shown in Capability encoding for MXLEN=32 for MXLEN=32 and Capability encoding for MXLEN=64 for MXLEN=64. Each memory location or register able to hold a capability must also store the tag as out of band information that software cannot directly set or clear. The capability metadata is held in the most significant bits and the address is held in the least significant bits.
Reserved bits are available for future extensions to {cheri_base_ext_name}.
Note
|
Reserved bits must be 0 in tagged capabilities. |
Note
|
The CL field is only present if {cheri_levels_ext_name} is implemented, otherwise it is reserved. |
Capabilities contain the software accessible fields described in this section.
The tag is an additional hardware managed bit added to addressable memory and registers. It is stored separately and may be referred to as "out of band". It indicates whether a register or CLEN-aligned memory location contains a valid capability. If the tag is set, the capability is valid and can be dereferenced (contingent on checks such as permissions or bounds).
The capability is invalid if the tag is clear. Using an invalid capability to dereference memory or authorize any operation gives rise to exceptions. All capabilities derived from invalid capabilities are themselves invalid i.e. their tags are 0.
All locations in registers or memory able to hold a capability are CLEN+1 bits wide including the tag bit. Those locations are referred as being CLEN-bit or capability wide in this specification.
The byte-address of a memory location is encoded as MXLEN integer value.
MXLEN | Address width |
---|---|
32 |
{cap_rv32_addr_width} |
64 |
{cap_rv64_addr_width} |
This field encodes architecturally defined permissions of the capability. Permissions grant access subject to the tag being set, the capability being unsealed (see Capability Type (CT) Bit), and bounds checks (see Bounds (EF, T, TE, B, BE)). An operation is also contingent on requirements imposed by other RISC-V architectural features, such as virtual memory, PMP and PMAs, even if the capability grants sufficient permissions. The permissions currently defined in {cheri_base_ext_name} are listed below.
- Read Permission (R)
-
Allow reading integer data from memory. Tags are always read as zero when reading integer data.
- Write Permission (W)
-
Allow writing integer data to memory. Tags are always written as zero when writing integer data. Every CLEN aligned word in memory has a tag, if any byte is overwritten with integer data then the tag for all CLEN-bits is cleared.
- Capability Permission (C)
-
Allow reading capability data from memory if the authorizing capability also grants R-permission. Allow writing capability data to memory if the authorizing capability also grants W-permission.
- Execute Permission (X)
-
Allow instruction execution.
- Load Mutable Permission (LM)
-
Allow preserving the W-permission of capabilities loaded from memory. If a capability grants R-permission and C-permission, but no LM-permission, then a capability loaded via this authorizing capability will have W-permission and LM-permission removed provided that the loaded capability has its tag set and is not sealed; loaded capabilities that are sealed or untagged do not have their permissions changed. The rules specified by [ACPERM] are followed when W-permission and LM-permission are removed, so additional permissions may also be removed. Clearing a capability’s LM-permission and W-permission allows sharing a read-only version of a data structure (e.g. a tree or linked list) without making a copy.
Note
|
Implementations are allowed to retain invalid capability permissions loaded from memory instead of following the [ACPERM] behavior of reducing them to no permissions. |
- Access System Registers Permission (ASR)
-
Allow read and write access to all privileged (M-mode and S-mode) CSRs. If {tid_ext_name} is supported the [utid], [utidc], [vstid], [vstidc], [stid], [stidc], [mtid], [mtidc] registers are all considered privileged for the purposes of writing and unprivileged for reading, and thus require ASR-permission for writes but not reads. In all cases a suitable privilege mode is required for access.
The bit width of the permissions field depends on the value of MXLEN as shown in Table 2. A {cap_rv32_perms_width}-bit vector encodes the permissions when MXLEN=32. For this case, the legal encodings of permissions are listed in Table 3. Certain combinations of permissions are impractical. For example, C-permission is superfluous when the capability does not grant either R-permission or W-permission. Therefore, it is only possible to encode a subset of all combinations.
MXLEN | AP field width | Comment |
---|---|---|
32 |
{cap_rv32_perms_width} |
Encodes some combinations of {cap_rv64_perms_width} permission bits, including the [m_bit] if {cheri_default_ext_name} is supported. |
64 |
{cap_rv64_perms_width} |
Separate bits for each architectural permission. |
Note
|
if {cheri_levels_ext_name} is supported then there are {cap_rv64_perms_levels_width} architectural permission bits. |
For MXLEN=32, the permissions encoding is split into four quadrants. The quadrant is taken from bits [4:3] of the permissions encoding. The meaning for bits [2:0] are shown in Encoding of architectural permissions for MXLEN=32 for each quadrant.
Quadrants 2 and 3 are arranged to implicitly grant future permissions which may be added with the existing allocated encodings. Quadrant 0 does the opposite - the encodings are allocated not to implicitly add future permissions, and so granting future permissions will require new encodings. Quadrant 1 encodes permissions for executable capabilities and the [m_bit].
Encoding[2:0] | R | W | C | LM | X | ASR | Mode1 | Notes |
---|---|---|---|---|---|---|---|---|
Quadrant 0: Non-capability data read/write |
||||||||
bit[2] - write, bit[1] - reserved (0), bit[0] - read |
||||||||
Reserved bits for future extensions are 0 so new permissions are not implicitly granted |
||||||||
0 |
N/A |
No permissions |
||||||
1 |
✔ |
N/A |
Data RO |
|||||
2-3 |
reserved |
|||||||
4 |
✔ |
N/A |
Data WO |
|||||
5 |
✔ |
✔ |
N/A |
Data RW |
||||
6-7 |
reserved |
|||||||
Quadrant 1: Executable capabilities |
||||||||
bit[0] - [m_bit] ({CAP_MODE_VALUE}-{cheri_cap_mode_name}, {INT_MODE_VALUE}-{cheri_int_mode_name}) |
||||||||
0-1 |
✔ |
✔ |
✔ |
✔ |
✔ |
✔ |
Mode1 |
Execute + ASR (see Infinite) |
2-3 |
✔ |
✔ |
✔ |
✔ |
Mode1 |
Execute + Data & Cap RO |
||
4-5 |
✔ |
✔ |
✔ |
✔ |
✔ |
Mode1 |
Execute + Data & Cap RW |
|
6-7 |
✔ |
✔ |
✔ |
Mode1 |
Execute + Data RW |
|||
Quadrant 2: Restricted capability data read/write |
||||||||
R and C implicitly granted, LM dependent on W permission. |
||||||||
Reserved bits for future extensions must be 1 so they are implicitly granted |
||||||||
bit[2] is reserved to mean write for future encodings |
||||||||
0-2 |
reserved |
|||||||
3 |
✔ |
✔ |
N/A |
Data & Cap RO (no LM) |
||||
4-7 |
reserved |
|||||||
Quadrant 3: Capability data read/write |
||||||||
bit[2] - write, R and C implicitly granted. |
||||||||
Reserved bits for future extensions must be 1 so they are implicitly granted |
||||||||
0-2 |
reserved |
|||||||
3 |
✔ |
✔ |
✔ |
N/A |
Data & Cap RO |
|||
4-6 |
reserved |
|||||||
7 |
✔ |
✔ |
✔ |
✔ |
N/A |
Data & Cap RW |
1 Mode ([m_bit]) can only be set on a tagged capability when {cheri_default_ext_name} is supported. Despite being encoded here it is not an architectural permission.
Note
|
When MXLEN=32 there are many reserved permission encodings (see Table 3). It is not possible for a tagged capability to have one of these values since [ACPERM] will never create it. It is possible for untagged capabilities to have reserved values. [GCPERM] will interpret reserved values as if it were 0b00000 (no permissions). Future extensions may assign meanings to the reserved bit patterns, in which case [GCPERM] is allowed to report a non-zero value. |
A {cap_rv64_perms_width}-bit vector encodes the permissions when MXLEN=64 ({cap_rv64_perms_levels_width}-bit if {cheri_levels_ext_name} is supported). In this case, there is a bit per permission as shown in Table 4. A permission is granted if its corresponding bit is set, otherwise the capability does not grant that permission.
Bit | Name |
---|---|
0 |
|
1 |
|
2 |
|
3 |
|
4 |
|
5 |
|
6 |
|
7 |
1 This permission is only supported if the implementation supports {cheri_levels_ext_name}.
The [m_bit] is only assigned meaning when the implementation supports {cheri_default_ext_name} and X-permission is set.
-
For MXLEN=64, the bit assigned to the [m_bit] must be zero if X-permission isn’t set.
-
For MXLEN=32, the [m_bit] is only encoded in quadrant 1 and does not exist in the other quadrants.
Executing [ACPERM] can result in sets of permissions which cannot be represented when MXLEN=32 (see Encoding of architectural permissions for MXLEN=32) or permission combinations which are not useful for MXLEN=64, such as ASR-permission set without X-permission.
These cases are defined to return useful minimal sets of permissions, which may be no permissions. See [ACPERM] for these rules.
Note
|
Future extensions may allow more combinations of permissions, especially for MXLEN=64. |
A bit vector used by the kernel or application programs for software-defined permissions (SDP).
Note
|
Software is completely free to define the usage of these bits. For example, a program may decide to use an SDP bit to indicate the "ownership" of objects. Therefore, a capability grants permission to free the memory it references if that SDP bit is set because it "owns" that object. |
MXLEN | SDPLEN |
---|---|
32 |
{cap_rv32_sdp_width} |
64 |
{cap_rv64_sdp_width} |
This bit indicates the type of the capability: it is a sealed capability if the bit is 1 or unsealed if it is 0.
Sealed capabilities (CT ≠ 0
) cannot be dereferenced to access memory and are immutable
such that modifying any of its fields clears the tag of the output capability.
Note
|
Sealed capabilities might be useful to software as tokens that can be passed around. The only way of clearing the type bit of a capability is by rebuilding it via a superset capability with [CBLD]. {cheri_base_ext_name} does not offer an unseal instruction. |
Note
|
The [section_cap_level] field can be reduced even if the capability is sealed, see [cap_level_load_summary]. |
For code capabilities, the sealing bit is used to implement immutable capabilities that describe function entry points, known as sealed entry (sentry) capabilities. Such capabilities can be leveraged to establish a form of control-flow integrity between mutually distrusting code. A program may jump to a sentry capability to begin executing the instructions it references. A [JALR] instruction with zero offset automatically unseals a sentry target capability and installs it in the program counter capability (see [section_riscv_programmers_model]). The jump-and-link instructions also seal the return address capability which serves as an entry point the callee can return to but cannot use to authorize memory loads or stores.
The bounds encode the base and top addresses that constrain memory accesses. The capability can be used to access any memory location A in the range base ≤ A < top. The bounds are encoded in compressed format, so it is not possible to encode any arbitrary combination of base and top addresses. An invalid capability with tag cleared is produced when attempting to construct a capability that is not representable because its bounds cannot be correctly encoded. The bounds are decoded as described in Capability Encoding.
The bounds field has the following components:
-
T: Value substituted into the capability’s address to decode the top address
-
B: Value substituted into the capability’s address to decode the base address
-
E: Exponent that determines the position at which B and T are substituted into the capability’s address
-
EF: Exponent format flag indicating the encoding for T, B and E
-
The exponent is stored in T and B if EF=0, so it is 'internal'
-
The exponent is zero if EF=1
-
The bit width of T and B are defined in terms of the mantissa width (MW) which is set depending on the value of MXLEN as shown in Table 6.
MXLEN | MW |
---|---|
32 |
{cap_rv32_mw_width} |
64 |
{cap_rv64_mw_width} |
The exponent E indicates the position of T and B within the capability’s address as described in Capability Encoding. The bit width of the exponent (EW) is set depending on the value of MXLEN. The maximum value of the exponent is calculated as follows:
CAP_MAX_E = MXLEN - MW + 2
The possible values for EW and CAP_MAX_E are shown in Table 7.
MXLEN | EW | CAP_MAX_E |
---|---|---|
32 |
{cap_rv32_exp_width} |
24 |
64 |
{cap_rv64_exp_width} |
52 |
Note
|
The address and bounds must be representable in valid capabilities i.e. when the tag is set (see Malformed Capability Bounds). |
The metadata is encoded in a compressed format cite:[woodruff2019cheri]. It uses a floating point representation to encode the bounds relative to the capability address. The base and top addresses from the bounds are decoded as shown below.
Warning
|
TODO: The pseudocode below does not have a formal notation. It is simply a place-holder while the Sail implementation is unavailable. In this notation, / means "integer division", [] are the bit-select operators, and arithmetic is signed. |
EW = (MXLEN == 32) ? 5 : 6
CAP_MAX_E = MXLEN - MW + 2
If EF = 1:
E = 0
T[EW / 2 - 1:0] = TE
B[EW / 2 - 1:0] = BE
LCout = (T[MW - 3:0] < B[MW - 3:0]) ? 1 : 0
LMSB = (MXLEN == 32) ? L8 : 0
else:
E = CAP_MAX_E - ( (MXLEN == 32) ? { L8, TE, BE } : { TE, BE } )
T[EW / 2 - 1:0] = 0
B[EW / 2 - 1:0] = 0
LCout = (T[MW - 3:EW / 2] < B[MW - 3:EW / 2]) ? 1 : 0
LMSB = 1
Reconstituting the top two bits of T:
T[MW - 1:MW - 2] = B[MW - 1:MW - 2] + LCout + LMSB
Decoding the bounds:
top: t = { a[MXLEN - 1:E + MW] + ct, T[MW - 1:0] , {E{1'b0}} }
base: b = { a[MXLEN - 1:E + MW] + cb, B[MW - 1:0] , {E{1'b0}} }
The corrections ct and cb are calculated as as shown below using the definitions in Table 8 and Table 9.
A = a[E + MW - 1:E] R = B - 2MW-2
A < R | T < R | ct |
---|---|---|
false |
false |
0 |
false |
true |
+1 |
true |
false |
-1 |
true |
true |
0 |
A < R | B < R | cb |
---|---|---|
false |
false |
0 |
false |
true |
+1 |
true |
false |
-1 |
true |
true |
0 |
The base, b, and top, t, addresses are derived from the address by substituting a[E + MW - 1:E] with B and T respectively and clearing the lower E bits. The most significant bits of a may be adjusted up or down by 1 using corrections cb and ct to allow encoding memory regions that span alignment boundaries.
The EF bit selects between two cases:
-
EF = 1: The exponent is 0 for regions less than 2MW-2 bytes long. L8 is used to encode the MSB of the length and is added to B along with T[MW-3:0] to form the decoded top.
-
EF = 0: The exponent is internal with E stored in the lower bits of T and B along with L8 when MXLEN=32. E is chosen so that the most significant non-zero bit of the length of the region aligns with T[MW - 2] in the decoded top. Therefore, the most significant two bits of T can be derived from B using the equality
T = B + L
, where L[MW - 2] is known from the values of EF and E and a carry out is implied ifT[MW - 3:0] < B[MW - 3:0]
since it is guaranteed that the top is larger than the base.
The compressed bounds encoding allows the address to roam over a large representable region while maintaining the original bounds. This is enabled by defining a lower boundary R from the out-of-bounds values that allows us to disambiguate the location of the bounds with respect to an out-of-bounds address. R is calculated relative to the base by subtracting 2MW-2 from B. If B, T or a[E + MW - 1:E] is less than R, it is inferred that they lie in the 2E+MW aligned region above R labeled spaceU in Figure 1 and the corrections ct and cb are computed accordingly. The overall effect is that the address can roam 2E+MW/4 bytes below the base address and at least 2E+MW/4 bytes above the top address while still allowing the bounds to be correctly decoded.
A capability has infinite bounds if its bounds cover the entire address space such that the base address b=0 and the top address t≥2MXLEN, i.e. t is an MXLEN + 1 bit value. However, b is an MXLEN bit value and the size mismatch introduces additional complications when decoding, so the following condition is required to correct t for capabilities whose Representable Range wraps the edge of the address space:
if ( (E < (CAP_MAX_E - 1)) & (t[MXLEN: MXLEN - 1] - b[MXLEN - 1] > 1) )
t[MXLEN] = !t[MXLEN]
That is, invert the most significant bit of t if the decoded length of the capability is larger than E.
Note
|
A capability has infinite bounds if E=CAP_MAX_E and it is not malformed (see Malformed Capability Bounds); this check is equivalent to b=0 and t≥2MXLEN. |
A capability is malformed if its bounds cannot be correctly decoded.
The following check
indicates whether a capability is malformed. enableL8
is true when MXLEN=32
and false otherwise, indicating whether the L8
bit is available for extra
precision when EF=1
.
malformedMSB = (E == CAP_MAX_E && B != 0)
|| (E == CAP_MAX_E - 1 && B[MW - 1] != 0)
malformedLSB = (E < 0) || (E == 0 && enableL8)
malformed = !EF && (malformedMSB || malformedLSB)
Note
|
The check is for malformed bounds, so it does not include reserved bits! |
CHERI enforces the following invariants for all valid (i.e., tagged) capabilities:
-
The bounds are not malformed.
-
No reserved bit in the capability encoding is set.
-
The permissions can be legally produced by [ACPERM].
A tagged capability that violates those invariants (i.e., a tagged but malformed capability or a tagged capability with any reserved bit set) can only possibly be caused by a logic or memory fault (e.g., bit flipping).
Capabilities with malformed bounds:
See specific instruction pages for full details of the effect of malformed capabilities.
The NULL capability is represented with 0 in all fields. This implies that it has no permissions and its exponent E is CAP_MAX_E (52 for MXLEN=64, 24 for MXLEN=32), so its bounds cover the entire address space such that the expanded base is 0 and top is 2MXLEN.
Field | Value | Comment |
---|---|---|
Tag |
zero |
Capability is not valid |
SDP |
zeros |
Grants no permissions |
AP |
zeros |
Grants no permissions |
M |
zero |
No meaning since non-executable (MXLEN=64 only) |
CL |
zero1 |
Local |
CT |
zero |
Unsealed |
EF |
zero |
Internal exponent format |
L8 |
zero |
Top address reconstruction bit (MXLEN=32 only) |
T |
zeros |
Top address bits |
TE |
zeros |
Exponent bits |
B |
zeros |
Base address bits |
BE |
zeros |
Exponent bits |
Address |
zeros |
Capability address |
Reserved |
zeros |
All reserved fields |
1 This field only exists if {cheri_levels_ext_name} is implemented.
The Infinite capability grants all permissions while its bounds also cover the whole address space. It includes X-permission and so includes the [m_bit] if {cheri_default_ext_name} is supported.
Note
|
The Infinite capability is also known as 'default', 'almighty', or 'root' capability. |
Field | Value | Comment |
---|---|---|
Tag |
one |
Capability is valid |
SDP |
ones |
Grants all permissions |
AP (MXLEN=32) |
0x8/0x91 (see Table 3) |
Grants all permissions |
AP (MXLEN=64) |
0xFF (see Table 4) |
Grants all permissions |
CL |
one2 |
Global |
CT |
zero |
Unsealed |
EF |
zero |
Internal exponent format |
L8 |
zero |
Top address reconstruction bit (MXLEN=32 only) |
T |
zeros |
Top address bits |
TE |
zeros |
Exponent bits |
B |
zeros |
Base address bits |
BE |
zeros |
Exponent bits |
Address |
zeros |
Capability address |
Reserved |
zeros |
All reserved fields |
1If {cheri_default_ext_name} is supported, then the Infinite capability must represent {cheri_int_mode_name} for compatibility with standard RISC-V code. Therefore:
2 This field only exists if {cheri_levels_ext_name} is implemented.
The new address, after updating the address of a capability, is within the representable range if decompressing the capability’s bounds with the original and new addresses yields the same base and top addresses.
In other words, given a capability with address a and the
new address a' = a + x
, the bounds b and t are decoded using a and the
new bounds b' and t' are decoded using a'. The new address is within the
capability’s representable range if b == b' && t == t'
.
Changing a capability’s address to a value outside the representable range unconditionally clears the capability’s tag. Examples are:
In the bounds encoding in this specification, the top and bottom capability bounds are formed of two or three sections:
-
Upper bits from the address
-
This is only if the other sections do not fill the available bits (E + MW ≤ MXLEN)
-
-
Middle bits from T and B decoded from the metadata
-
Lower bits are set to zero
-
This is only if there is an internal exponent (EF=0)
-
Configuration | Upper Section (if E + MW ≤ MXLEN) | Middle Section | Lower Section |
---|---|---|---|
EF=0 |
address[MXLEN:E + MW] + ct |
T[MW - 1:0] |
{E{1’b0}} |
EF=1, i.e. E=0 |
address[MXLEN:MW] + ct |
T[MW - 1:0] |
The top described by Table 12 is MXLEN+1 bits wide to allow capabilities to span the whole address space. The address is zero-extended by one bit. The malformed check (see Malformed Capability Bounds) ensures that the top never overflows into MXLEN+2 bits and that the base never overflows into MXLEN+1 bits.
The representable range defines the range of addresses which do not corrupt the bounds encoding. The encoding was first introduced in Capability Encoding, and is repeated in a different form in Table 12 to aid this description.
For the address to be valid for the current bounds encoding, the value in the Upper Section of Table 12 must not change as this will change the meaning of the bounds.
This gives a range of s=2E+MW
, as shown in
Figure 1.
The gap between the object bounds and the bound of the representable range
is always guaranteed
to be at least 1/4 of s
. This is represented by R = B - 2MW-2
in
Capability Encoding.
This gives useful guarantees, such that if an executed instruction is in
[pcc] bounds, then it is also guaranteed that the next linear instruction
is representable.