Add a quick-start guide (#404)

PeterRugg · web-flow · commit e36d2a8bee3e · 2024-10-04T14:28:24.000+01:00
First cut at a quick start guide (with help from @gameboo). Fixes #339
diff --git a/src/riscv-cheri.adoc b/src/riscv-cheri.adoc
@@ -36,6 +36,8 @@ include::contributors.adoc[]
 // Document body
 ///////////////////////////////////////////////////////////////////////////////
 
+include::summary.adoc[]
+
 include::introduction.adoc[]
 
 include::cap-description.adoc[]
diff --git a/src/riscv-cheri.bib b/src/riscv-cheri.bib
@@ -73,3 +73,11 @@ @misc{riscv-v-spec
   note = {Version 1.0},
   url = {https://github.com/riscv/riscv-v-spec/releases/download/v1.0/riscv-v-spec-1.0.pdf}
 }
+
+@misc{msrc-cheri-eval,
+  author = {Joly, Nicolas and ElSherei, Saif and Amar, Saar},
+  title = {Security analysis of CHERI ISA},
+  url={https://github.com/microsoft/MSRC-Security-Research/blob/master/papers/2020/Security%20analysis%20of%20CHERI%20ISA.pdf},
+  year= {2020},
+  month={10}
+}
diff --git a/src/summary.adoc b/src/summary.adoc
@@ -0,0 +1,84 @@
+== Quick Start
+
+This document describes the RISC-V extensions for supporting CHERI capabilities in hardware.
+Capabilities can be used to provide memory safety, mitigating up to 70% of memory safety issues cite:[msrc-cheri-eval], as well as to provide efficient compartmentalisation.
+The extensions are split into the core features required for a working capability system ({cheri_base_ext_name}), and features required to support a mix-and-match of binaries compiled for CHERI and unchanged binaries ({cheri_default_ext_name}).
+Some other smaller extensions are described that provide additional functionality relevant to CHERI.
+
+=== Capability Properties
+
+Capabilities are 2*XLEN-bit structures (a width we call CLEN), containing all the information required to identify and authorise access to a region of memory.
+This includes:
+
+ * An XLEN-bit address, describing where the capability currently points.
+
+ * Bounds: a _base_ and a _top_ address, describing the range of addresses the capability can be used to access.
+
+ * Permissions (read, write, execute, read capability, ...) describing the kinds of accesses the capability can be used for.
+
+ * Sealing information: a capability can be _sealed_, restricting it to only be used or modified in particular ways.
+
+A one-bit integrity tag is stored alongside a capability: this is maintained by hardware and cannot be directly modified by software.
+It indicates whether the capability is valid.
+An initial <<infinite-cap>> capability with access to all of memory with all permissions is provided in system registers on reset: all valid capabilities are derived from it.
+This is the only way to obtain a valid capability: no software, even machine mode, can _forge_ a capability.
+
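As an illustration of the fields listed above (not part of the patch), the following minimal Python sketch models a capability as a record plus a tag. The class name, the permission letters, `XLEN = 64`, and the helper `from_raw_bits` are assumptions made for this sketch; the real encoding is compressed into CLEN bits and is not modelled here.

```python
from dataclasses import dataclass

XLEN = 64  # assumed register width for this sketch


@dataclass(frozen=True)
class Capability:
    address: int         # where the capability currently points
    base: int            # lowest accessible address (inclusive)
    top: int             # upper bound (exclusive)
    perms: frozenset     # hypothetical permission letters, e.g. {"R", "W"}
    sealed: bool = False
    tag: bool = False    # validity tag; only the hardware model sets it


# Model of the reset-time infinite capability: all of memory, all permissions.
ALL_PERMS = frozenset({"R", "W", "X", "C"})
INFINITE_CAP = Capability(address=0, base=0, top=1 << XLEN,
                          perms=ALL_PERMS, tag=True)


def from_raw_bits(address, base, top, perms):
    """What software gets when it assembles the fields itself: never tagged."""
    return Capability(address, base, top, frozenset(perms), tag=False)


# Same fields as the infinite capability, but the tag cannot be forged.
forged = from_raw_bits(0, 0, 1 << XLEN, ALL_PERMS)
```

In this model the only tagged value is `INFINITE_CAP`; anything built from raw fields is untagged, mirroring the rule that valid capabilities can only be derived from the initial one.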
+=== Added State
+
+A CHERI core adds state to allow capabilities to be used from within registers, and to ensure they are not corrupted as they flow through the system.
+This means the following state is added:
+
+* Metadata within architectural registers: XLEN-wide integer registers (e.g. `sp`, `a0`) are all extended with another XLEN bits of capability metadata, including bounds and permissions.
+  The full CLEN bits form a capability, which we refer to by the same register name prefixed with a `c`, e.g. `csp`, `ca0`.
+  The integer part of the register is interpreted as the address field of the capability.
+  The zero register is extended with zero metadata and a cleared tag: this is called the <<null-cap>> capability.
+  As well as general purpose registers, system registers that store addresses are extended to contain capabilities.
+  For example, <<mtvec>> is extended to a capability version <<mtvecc>> (the machine trap vector capability) to allow the code bounds to be changed on an exception.
+
+* Tags in registers, caches, and memory:
+
+** Every register has a one-bit tag, indicating whether the capability in the register is valid to be dereferenced.
+  This tag is cleared if the register is written as an integer.
+
+** The tags are also tracked through the memory subsystem: every aligned CLEN-bit wide region has a non-addressable one-bit tag, which the hardware manages atomically with the data.
+   The tag is cleared if the memory region is ever written other than using a capability store from a tagged capability register.
+   Any caches must preserve this abstraction.
+
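The memory tagging rule above can be sketched as a toy model (illustrative only, not part of the patch). It assumes XLEN = 64, so one tag covers each aligned 16-byte granule; the class and method names are hypothetical, and misalignment is reported with a Python `ValueError` rather than an architectural exception.

```python
CLEN_BYTES = 16  # assumes XLEN = 64, so CLEN = 128 bits


class TaggedMemory:
    def __init__(self):
        self.data = {}   # byte address -> byte value
        self.tags = {}   # CLEN-aligned granule address -> tag bit

    @staticmethod
    def _granule(addr):
        return addr - (addr % CLEN_BYTES)

    def store_bytes(self, addr, payload):
        """Ordinary data store: clears the tag of every granule it touches."""
        for i, b in enumerate(payload):
            self.data[addr + i] = b
            self.tags[self._granule(addr + i)] = False

    def store_cap(self, addr, cap_bits, tag):
        """Capability store: data and tag are written together."""
        if addr % CLEN_BYTES != 0 or len(cap_bits) != CLEN_BYTES:
            raise ValueError("capability stores are CLEN-sized and aligned")
        for i, b in enumerate(cap_bits):
            self.data[addr + i] = b
        self.tags[addr] = tag

    def load_cap(self, addr):
        """Capability load: returns the raw bits and the granule's tag."""
        if addr % CLEN_BYTES != 0:
            raise ValueError("capability loads are CLEN-aligned")
        bits = bytes(self.data.get(addr + i, 0) for i in range(CLEN_BYTES))
        return bits, self.tags.get(addr, False)


mem = TaggedMemory()
mem.store_cap(0x1000, bytes(CLEN_BYTES), tag=True)
_, tag_before = mem.load_cap(0x1000)
mem.store_bytes(0x1004, b"\x2a")   # overwrite one byte as plain data
_, tag_after = mem.load_cap(0x1000)
```

A single-byte data store into the granule is enough to clear its tag, so the bytes can still be read back but no longer load as a valid capability.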
+=== Checking Memory
+
+Every memory access performed by a CHERI core must be authorised by a capability.
+Every instruction explicitly defines which capability is used to authorise its access.
+In _purecap_ code, where all pointers are individual capabilities, the capability and address are used together, so e.g. `lw t0, 16(csp)` loads a word from memory, getting the address and bounds from the `csp` register.
+For code that has not yet been fully adapted to CHERI (_hybrid_ code), the processor can run in a pointer mode (not to be confused with a privilege mode) where the authorising capability is instead taken from a special CSR: the default data capability (<<ddc>>).
+
+Instruction fetch is also authorised by a capability: the program counter capability (<<pcc>>) which extends PC.
+This allows code fetch to be bounded, preventing a wide range of attacks that subvert control flow with integer data.
+Where {cheri_default_ext_name} is supported, the <<pcc>> also contains the <<m_bit,mode bit>> indicating whether the processor is running in integer or capability pointer mode.
+Changing the bounds used for instruction fetch or the pointer mode can be as easy as performing a capability-based jump (<<JALR>> in capability pointer mode).
+A <<MODESW>> instruction and a compressed version of it are also added to allow cheap mode switching.
+
+Exception codes are added for CHERI-specific exceptions on fetch, jumps, and memory access.
+No other exception paths are added: in particular, capability manipulations do not trap, but may clear the tag on the result capability if the operation is not permitted.
+
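A rough sketch of the load check described in this section (illustrative only, not part of the patch). The `Cap` record, the `CheriFault` exception, the cause strings, the permission letters, and the order of the checks are all assumptions of this model rather than definitions taken from the specification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cap:
    address: int
    base: int
    top: int
    perms: frozenset
    tag: bool = True
    sealed: bool = False


class CheriFault(Exception):
    """Stands in for a CHERI exception raised on a failed check."""


def check_access(cap, addr, size, perm):
    """Raise CheriFault unless cap authorises [addr, addr + size) for perm."""
    if not cap.tag:
        raise CheriFault("tag")
    if cap.sealed:
        raise CheriFault("seal")
    if perm not in cap.perms:
        raise CheriFault("permission")
    if addr < cap.base or addr + size > cap.top:
        raise CheriFault("bounds")


def load_word(base_reg, offset, ddc, cap_mode):
    """Model of `lw`: the authorising capability depends on the pointer mode."""
    auth = base_reg if cap_mode else ddc   # capability mode vs. integer mode
    addr = base_reg.address + offset
    check_access(auth, addr, 4, "R")
    return addr


csp = Cap(address=0x8000, base=0x8000, top=0x8100, perms=frozenset("RW"))
in_bounds_addr = load_word(csp, 16, ddc=None, cap_mode=True)

try:
    load_word(csp, 0x100, ddc=None, cap_mode=True)   # one past the top
    out_of_bounds_fault = None
except CheriFault as fault:
    out_of_bounds_fault = str(fault)
```

Here the model's equivalent of `lw t0, 16(csp)` succeeds, while an offset of `0x100` falls outside `csp`'s bounds and faults.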
+=== Added Instructions
+
+The added instructions can be split into the following categories:
+
+* Capability manipulations (e.g. <<CADD>>, <<SCBNDS>>): for security, capabilities can only be modified in restricted ways.
+  Special instructions are provided to perform these allowed operations, for example _shrinking_ the bounds or _reducing_ the permissions.
+  Any attempt to manipulate capabilities without using these instructions clears the tag, rendering them unusable for accessing memory.
+
+* Capability inspection (e.g. <<GCBASE>>, <<GCPERM>>): capability fields (for example the _bounds_ describing what addresses the capability gives access to) are stored compressed in registers and memory.
+  These instructions give convenient access to allow software to query them.
+
+* Memory access instructions (e.g. <<LC>>, <<SC>>): capabilities must be read from and written to memory atomically along with their tag.
+  Instructions are added to perform these wider accesses, allowing capability flow between the memory and the register file.
+
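The restricted manipulations listed above can be modelled as pure functions that never raise but clear the tag when a request is not permitted (illustrative only, not part of the patch). The function names `set_bounds` and `and_perms` are hypothetical stand-ins, not instruction definitions, and the model ignores bounds compression entirely.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Cap:
    address: int
    base: int
    top: int
    perms: frozenset
    tag: bool = True
    sealed: bool = False


def set_bounds(cap, length):
    """Shrink bounds to [address, address + length); untag if not permitted."""
    new_base, new_top = cap.address, cap.address + length
    permitted = (cap.tag and not cap.sealed
                 and cap.base <= new_base and new_top <= cap.top)
    return replace(cap, base=new_base, top=new_top, tag=permitted)


def and_perms(cap, mask):
    """Reduce permissions: only permissions already held can be kept."""
    return replace(cap, perms=cap.perms & frozenset(mask),
                   tag=cap.tag and not cap.sealed)


root = Cap(address=0x1000, base=0x1000, top=0x2000, perms=frozenset("RWX"))
buf = set_bounds(root, 0x100)        # shrinking is allowed
too_big = set_bounds(buf, 0x1000)    # growing clears the tag, no trap
read_only = and_perms(buf, "R")
regrown = and_perms(read_only, "RW") # cannot regain "W"
```

The attempt to grow `buf` returns an untagged result rather than raising, matching the rule that capability manipulations do not trap.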
+=== Existing Instructions
+
+Existing RISC-V instructions are largely unmodified: in {cheri_int_mode_name}, there is binary compatibility.
+Instructions that access memory, as well as branches and jumps, are automatically checked against <<ddc>> and <<pcc>>, raising an exception if the checks fail.
+However, <<ddc>> and <<pcc>> are reset to <<infinite-cap>> capabilities, meaning the checks will always pass on systems that have not written to CHERI system registers.
+
+In {cheri_cap_mode_name}, these instructions are instead modified to check against the full capability from the address register (e.g. `lw t0, 16(csp)`).
+In some cases, they are also changed to return a full capability value, e.g. <<AUIPC>> will return the full <<pcc>> including the metadata.
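
The two behaviours described in this section can be sketched as follows (illustrative only, not part of the patch). The sketch assumes XLEN = 64, uses a hypothetical `Cap` record without permissions, and ignores any representability checks that a real implementation applies when the address of a capability changes.

```python
from dataclasses import dataclass, replace

XLEN = 64  # assumed for this sketch


@dataclass(frozen=True)
class Cap:
    address: int
    base: int
    top: int
    tag: bool = True


# Reset value assumed for ddc and pcc in this model: covers all of memory.
INFINITE = Cap(address=0, base=0, top=1 << XLEN)


def in_bounds(cap, addr, size):
    return cap.tag and cap.base <= addr and addr + size <= cap.top


def auipc(pcc, imm20, cap_mode):
    """Integer mode returns an address; capability mode keeps pcc's metadata."""
    addr = (pcc.address + (imm20 << 12)) % (1 << XLEN)
    return replace(pcc, address=addr) if cap_mode else addr


# With ddc left at its reset value, integer-mode accesses pass the check.
legacy_ok = all(in_bounds(INFINITE, a, 8)
                for a in (0x0, 0x8000_0000, (1 << XLEN) - 8))

pcc = Cap(address=0x1000, base=0x1000, top=0x3000)
as_int = auipc(pcc, 1, cap_mode=False)   # plain integer result
as_cap = auipc(pcc, 1, cap_mode=True)    # capability carrying pcc's bounds
```

Software that never writes the CHERI system registers therefore sees unchanged behaviour in this model, while capability-mode code receives a bounded, tagged result from the same instruction.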