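fix: address feedback from @JulienMalka

- Normalise spacing in `#cite(..., form: "normal")` calls
- Drop "Shell scripts" from the configuration management summary table
- Label the "Deterministic Builds And Environments" section and reference it
- Fix typos ("inensuring", "dependencies has")
- Record a review comment on the declarative classification of shell commands
- Import `cetz-plot` in the preamble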

drupol · drupol · commit 5f2c820400d4 · 2025-04-10T08:25:43.000+02:00
diff --git a/resources/typst/configuration-management-summary.typ b/resources/typst/configuration-management-summary.typ
@@ -18,25 +18,23 @@
   [
     - Shell commands
   ],
-  [
-    - Shell scripts
-  ],
+  [],
   table.cell(align: horizon + center)[*Convergent*],
   table.hline(stroke: .5pt),
   [
-    - Ansible #cite(<ansible>,form:"normal")
-    - Chef #cite(<chef>,form:"normal")
-    - Docker #cite(<docker>,form:"normal")
+    - Ansible #cite(<ansible>, form: "normal")
+    - Chef #cite(<chef>, form: "normal")
+    - Docker #cite(<docker>, form: "normal")
   ],
   [
-    - Puppet #cite(<puppet>,form:"normal")
-    - Terraform #cite(<terraform>,form:"normal")
+    - Puppet #cite(<puppet>, form: "normal")
+    - Terraform #cite(<terraform>, form: "normal")
   ],
   table.hline(stroke: .5pt),
   table.cell(align: horizon + center)[*Congruent*],
   [],
   [
-    - Guix #cite(<guixwebsite>,form:"normal")
-    - Nix #cite(<nix>,form:"normal")
+    - Guix #cite(<guixwebsite>, form: "normal")
+    - Nix #cite(<nix>, form: "normal")
   ],
 )
diff --git a/src/thesis/2-reproducibility.typ b/src/thesis/2-reproducibility.typ
@@ -264,7 +264,7 @@ reproducibility in #gls("CS").
   #emph[Space] and #emph[Time] are terms borrowed from physics. In the context
   of reproducibility in #gls("SE"), space refers to different systems, while
   time refers to different moments in time
-  #cite(<malka-hal-04430009>,form:"normal"). (more about that in
+  #cite(<malka-hal-04430009>, form: "normal") (more about that in
   @def-deterministic-build).
 ]
 
@@ -458,7 +458,7 @@ We will explore this concept with Docker images as a primary example. Docker, a
 popular containerization platform, uses Dockerfiles (@dockerfile-example).
 Basically, a `Dockerfile` is a script with a set of instructions to build
 images. These images are then used to run software in a consistent environment.
-However, images on Docker Hub #cite(<dockerhub>, form:"normal") often present
+However, images on Docker Hub #cite(<dockerhub>, form: "normal") often present
 challenges to reproducibility. The reasons vary: some Dockerfiles are not
 publicly available but especially because most of them include significant
 variability in their build processes, making exact replication of the images
@@ -716,7 +716,7 @@ which it will be evaluated, effectively making, to some extent, this environment
 an extra input parameter per se. This computational environment, which
 encompasses the hardware #eg[filesystem, memory, #gls("CPU", long: false)],
 software #eg[#gls("OS", long: false)] and date #eg[the current date and
-time], may influence the function's behaviour and output. Consequently,
+  time], may influence the function's behaviour and output. Consequently,
 functions in #gls("CS") are inherently designed to interact with and adapt to
 their environment, thereby making them dynamic and versatile but also
 potentially non-deterministic.
@@ -770,7 +770,7 @@ reflects the state of its transitive inputs. Basically, the output represents
 all direct and indirect dependencies used in the build process.
 "Transitive inputs" refer to not only the direct inputs #eg[source code] but
 also to the inputs of those inputs #eg[libraries, frameworks, compilers, data
-resources].
+  resources].
 
 From the point of view of the software build process as shown in
 @inputs-outputs-part1, the inputs are all the source code files, configuration
@@ -1040,7 +1040,7 @@ produce the same hash, an occurrence known as a #emph[collision]. The ability to
 find collisions undermines the security of the algorithm. There are different
 types of algorithms to calculate a checksum
 #eg[#gls("MD5", long: false), #gls("SHA1", long: false),
-#gls("SHA2", long: false)]. Older algorithms like #gls("MD5", long: false) have
+  #gls("SHA2", long: false)]. Older algorithms like #gls("MD5", long: false) have
 known vulnerabilities that allow collision attacks while more modern algorithms
 like SHA-256 (#gls("SHA2", long: false)) are currently considered to be pretty
 much impossible to crack.
@@ -1055,7 +1055,7 @@ despite the theoretical potential for identical hashes of different inputs.
 #info-box(kind: "info")[
   Choosing an appropriate checksum algorithm is paramount due to the rapid
   evolution of computational power as described by Moore's Law
-  #cite(<4785860>,form:"normal"), which leads to previously secure algorithms
+  #cite(<4785860>, form: "normal"), which leads to previously secure algorithms
   becoming vulnerable as computing capabilities expand.
 
   For instance, #gls("MD5") checksums, once deemed secure for storing passwords,
@@ -1155,7 +1155,7 @@ The process of controlling the computational environment $E$ underscores a
 fundamental challenge in #gls("SE"): achieving reproducibility through
 environment standardisation. The environment includes specific factors such as
 hardware and software configurations #eg[#gls("CPU"), #gls("OS"), library
-versions, and runtime conditions] that directly affect a function's behaviour
+  versions, and runtime conditions] that directly affect a function's behaviour
 and output. The Monte Carlo simulation algorithm (@montecarlo-pi.c), exemplifies
 this challenge: it may be reproducible at build time but can exhibit variance at
 run time due to environmental factors.
@@ -1237,7 +1237,7 @@ primarily in their focus, structure, and community support. The choice between
 specific needs, whether the focus is on extensive licensing compliance or
 streamlined security and risk management within the software supply chain.
 
-The #gls("CRA") #cite(<CRA>, form:"normal") mandates the incorporation of a
+The #gls("CRA") #cite(<CRA>, form: "normal") mandates the incorporation of a
 #gls("SBOM") in software products, highlighting its important role in bolstering
 software security and transparency. This requirement marks a significant
 advancement in enhancing the integrity and security of software, ensuring that
@@ -1246,10 +1246,10 @@ lifecycle. While the #gls("CRA") includes multiple measures, most will take
 effect three years after its passage, likely in early 2027. Specifically,
 regarding #gls("SBOM"), the following applies to products with digital elements
 available: #quote[identify and document vulnerabilities and components contained
-in products with digital elements, including by drawing up a software bill of
-materials in a commonly used and machine-readable format covering at the very
-least the top-level dependencies of the products]
-#cite(<CRA>, supplement: "Annex I, Part II (1)", form:"normal").
+  in products with digital elements, including by drawing up a software bill of
+  materials in a commonly used and machine-readable format covering at the very
+  least the top-level dependencies of the products]
+#cite(<CRA>, supplement: "Annex I, Part II (1)", form: "normal").
 
 ==== Supply Chain <ch2-supply-chain>
 
@@ -1269,7 +1269,7 @@ direct and indirect dependencies, adding complexity to the software supply
 chain. The build environments, which encompass all necessary components and
 their precise versions for software compilation, become intricate and difficult
 to replicate across different systems and over time. This growing complexity,
-"politely called #emph[dependency management]" #cite(<8509170>, form:"normal")
+"politely called #emph[dependency management]" #cite(<8509170>, form: "normal")
 but more colloquially known as #emph[dependency hell], is a phenomenon that
 developers have become all too familiar with. While Semantic Versioning
 (@package-managers) offers a strategy to mitigate these issues, it alone is
@@ -1381,7 +1381,7 @@ frequently unattainable in practice.
 
 One of the primary impediments in achieving reproducibility lies in the
 dependency on hardware architecture. Software compiled for different
-architectures, such as `x86` and `ARM,` inherently produces disparate binaries #cite(<patterson2013>,form:"normal").
+architectures, such as `x86` and `ARM`, inherently produces disparate binaries #cite(<patterson2013>, form: "normal").
 These differences stem from the instruction sets and optimizations that are
 specific to each platform, leading to divergent outputs despite using identical
 source code. This variance highlights a significant reproducibility challenge,
@@ -1412,7 +1412,7 @@ entirely achievable, we will delve deeper into these challenges by exploring the
 impact of non-deterministic compilers and the strategies to mitigate these
 challenges using different methods.
 
-== Deterministic Builds And Environments
+== Deterministic Builds And Environments <ch2-deterministic-builds-and-environments>
 
 In this section, we will explore the concept of deterministic builds, and the
 potential sources of non-determinism in software builds.
@@ -1470,9 +1470,9 @@ and at any point in the past or future​​​​.
 
 Reproducibility relies on stable, consistent and well-maintained codebases but
 also heavily depends on stable, consistent and well-maintained environments as
-seen in (add ref to ch2-environments). In addition, a critical component is
+seen in #ref(<ch2-deterministic-builds-and-environments>). In addition, a critical component is
 environment configuration management. Configuration management plays a critical
-role inensuring reproducibility by mitigating the non-deterministic behaviours
+role in ensuring reproducibility by mitigating the non-deterministic behaviours
 introduced by configuration drifts.
 
 #info-box[
@@ -1524,9 +1524,9 @@ if not impossible. Environments may progressively "converge" towards a common
 state, but subtle differences can persist, introducing variability. To
 illustrate this model, we could think of an arbitrary environment that needs to
 be configured in a specific way, reach a particular well known state. For
-example, some specific dependencies has to be installed to run a particular
+example, some specific dependencies have to be installed to run a particular
 service. Tools like Puppet #cite(<puppet>, form: "normal"), Chef
-#cite(<chef>, form: "normal"), Terraform #cite(<terraform>,form: "normal")
+#cite(<chef>, form: "normal"), Terraform #cite(<terraform>, form: "normal")
 and Ansible #cite(<ansible>, form: "normal") might help to achieve this goal.
 
 While convergent management offers flexibility in responding to unforeseen
@@ -1570,7 +1570,7 @@ approaches based on the environment's needs.
 #info-box[
   Immutable environments ((add ref to ch2-environments)) are environments that are designed
   to be unchangeable once they are created. They are often used in containers
-  #eg[Docker #cite(<docker>,form:"normal")], where the ability to quickly create
+  #eg[Docker #cite(<docker>, form: "normal")], where the ability to quickly create
   and destroy environments is essential. Immutable environments enhance
   reproducibility and reliability, making them an ideal choice for environments
   that require high levels of predictability and stability.
@@ -1623,15 +1623,19 @@ configuration can be applied multiple times without altering the environment
 beyond its intended state. This abstraction simplifies understanding and
 maintenance by allowing the system to determine the necessary actions to achieve
 the desired state. Tools such as Puppet #cite(<puppet>, form: "normal"),
-Kubernetes #cite(<kubernetes>,form: "normal"),
-Terraform #cite(<terraform>,form: "normal") and, under some conditions,
+Kubernetes #cite(<kubernetes>, form: "normal"),
+Terraform #cite(<terraform>, form: "normal") and, under some conditions,
 Docker #cite(<docker>, form: "normal") are used to specify the desired end
 state. These tools typically feature their own specific #gls("DSL") to create
 high-level descriptions of the expected environment's state, as opposed to
 issuing imperative and procedural commands. The declarative approach mitigates
 the risk of configuration drift by prioritising idempotence, maintaining
 explicit dependency graphs, and ensuring a strong awareness of the current state
-of the environment​​ #cite(<HunterGCP>,form:"normal", supplement: [p. 348]).
+of the environment #cite(<HunterGCP>, form: "normal", supplement: [p. 348]).
+
+// This doesn't really convince me: you classify shell commands as
+// "declarative" when they are in shell scripts, which to me is not much
+// different from a Dockerfile.
 
 #info-box(kind: "note")[
   In @ch2-table-configuration-mgmt, Docker #cite(<docker>, form: "normal") and
@@ -1871,7 +1875,7 @@ Often, timestamps are used to approximate which version of the source were
 built. Since file timestamps are volatile, the source code needs to be tracked
 more accurately than just a timestamp. Just like for version information, the
 solution would be to extract the date from a dedicated file like a changelog, or
-a specific commit #cite(<nixpkgs-pull-256270> ,form: "normal").
+a specific commit #cite(<nixpkgs-pull-256270>, form: "normal").
 
 To circumvent this issue, `SOURCE_DATE_EPOCH` is a specific environment variable
 convention for pinning timestamps to a specific value that has been introduced
@@ -1913,7 +1917,7 @@ especially when those builds are not identical. This section introduces a tool
 designed specifically for this purpose.
 
 Developed under the umbrella of the @ReproducibleBuildsOrg effort, `diffoscope`
-#cite(<diffoscope>, form:"normal") is a comprehensive, open-source tool that
+#cite(<diffoscope>, form: "normal") is a comprehensive, open-source tool that
 excels in comparing files and directories. Its unique capability to recursively
 unpack archives of various types and transform binary formats into a
 human-readable form makes it an indispensable tool for software comparison. It
diff --git a/src/thesis/imports/preamble.typ b/src/thesis/imports/preamble.typ
@@ -5,5 +5,6 @@
 #import "@preview/xarrow:0.3.1": xarrow, xarrowSquiggly, xarrowTwoHead
 #import "@preview/hydra:0.6.1": *
 #import "@preview/cetz:0.3.4"
+#import "@preview/cetz-plot:0.1.1": *
 #import "colors.typ": *
 #import "workarounds.typ": *