Commit 5f2c820

fix: address feedback from @JulienMalka
1 parent f1234e4 commit 5f2c820

3 files changed: +38 -35 lines

resources/typst/configuration-management-summary.typ

Lines changed: 8 additions & 10 deletions
@@ -18,25 +18,23 @@
 [
 - Shell commands
 ],
-[
-- Shell scripts
-],
+[],
 table.cell(align: horizon + center)[*Convergent*],
 table.hline(stroke: .5pt),
 [
-- Ansible #cite(<ansible>,form:"normal")
-- Chef #cite(<chef>,form:"normal")
-- Docker #cite(<docker>,form:"normal")
+- Ansible #cite(<ansible>, form: "normal")
+- Chef #cite(<chef>, form: "normal")
+- Docker #cite(<docker>, form: "normal")
 ],
 [
-- Puppet #cite(<puppet>,form:"normal")
-- Terraform #cite(<terraform>,form:"normal")
+- Puppet #cite(<puppet>, form: "normal")
+- Terraform #cite(<terraform>, form: "normal")
 ],
 table.hline(stroke: .5pt),
 table.cell(align: horizon + center)[*Congruent*],
 [],
 [
-- Guix #cite(<guixwebsite>,form:"normal")
-- Nix #cite(<nix>,form:"normal")
+- Guix #cite(<guixwebsite>, form: "normal")
+- Nix #cite(<nix>, form: "normal")
 ],
 )
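Most of this commit is a single typographic change: `#cite(<key>,form:"normal")` becomes `#cite(<key>, form: "normal")` throughout. The commit does not say how those edits were made. Purely as a hypothetical illustration, the rewrite is mechanical enough to script. The Python sketch below uses made-up names (`normalize_cite_spacing` and its helpers). It assumes well-formed `#cite(...)` calls with no escaped quotes inside string arguments.

```python
import re

# A named argument such as `form:"normal"` or `supplement: [p. 348]`.
_NAMED_ARG = re.compile(r"^([A-Za-z_][\w-]*)\s*:\s*(.*)$", re.DOTALL)


def _split_top_level(args: str) -> list[str]:
    """Split an argument list on commas that are outside strings and brackets."""
    parts, buf = [], []
    depth, in_str = 0, False
    for c in args:
        if in_str:
            if c == '"':
                in_str = False
        elif c == '"':
            in_str = True
        elif c in "([":
            depth += 1
        elif c in ")]":
            depth -= 1
        elif c == "," and depth == 0:
            parts.append("".join(buf))
            buf = []
            continue
        buf.append(c)
    parts.append("".join(buf))
    return parts


def _format_arg(arg: str) -> str:
    """Trim an argument and rewrite `name:value` as `name: value`."""
    arg = arg.strip()
    m = _NAMED_ARG.match(arg)
    return f"{m.group(1)}: {m.group(2)}" if m else arg


def normalize_cite_spacing(src: str) -> str:
    """Rewrite every `#cite(...)` call to use `, ` and `name: value` spacing."""
    marker = "#cite("
    out, i = [], 0
    while (j := src.find(marker, i)) != -1:
        start = j + len(marker)
        k, depth, in_str = start, 1, False
        # Walk forward to the parenthesis that closes this call,
        # ignoring parentheses inside strings such as "Annex I, Part II (1)".
        while depth:
            c = src[k]
            if in_str:
                if c == '"':
                    in_str = False
            elif c == '"':
                in_str = True
            elif c in "([":
                depth += 1
            elif c in ")]":
                depth -= 1
            k += 1
        args = src[start:k - 1]
        out.append(src[i:start])
        out.append(", ".join(_format_arg(a) for a in _split_top_level(args)))
        out.append(")")
        i = k
    out.append(src[i:])
    return "".join(out)
```

For example, `#cite(<nixpkgs-pull-256270> ,form: "normal")` comes out as `#cite(<nixpkgs-pull-256270>, form: "normal")`. Text that is already normalized is left unchanged.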

src/thesis/2-reproducibility.typ

Lines changed: 29 additions & 25 deletions
@@ -264,7 +264,7 @@ reproducibility in #gls("CS").
 #emph[Space] and #emph[Time] are terms borrowed from physics. In the context
 of reproducibility in #gls("SE"), space refers to different systems, while
 time refers to different moments in time
-#cite(<malka-hal-04430009>,form:"normal"). (more about that in
+#cite(<malka-hal-04430009>, form: "normal"). (more about that in
 @def-deterministic-build).
 ]

@@ -458,7 +458,7 @@ We will explore this concept with Docker images as a primary example. Docker, a
 popular containerization platform, uses Dockerfiles (@dockerfile-example).
 Basically, a `Dockerfile` is a script with a set of instructions to build
 images. These images are then used to run software in a consistent environment.
-However, images on Docker Hub #cite(<dockerhub>, form:"normal") often present
+However, images on Docker Hub #cite(<dockerhub>, form: "normal") often present
 challenges to reproducibility. The reasons vary: some Dockerfiles are not
 publicly available but especially because most of them include significant
 variability in their build processes, making exact replication of the images
@@ -716,7 +716,7 @@ which it will be evaluated, effectively making, to some extent, this environment
 an extra input parameter per se. This computational environment, which
 encompasses the hardware #eg[filesystem, memory, #gls("CPU", long: false)],
 software #eg[#gls("OS", long: false)] and date #eg[the current date and
-time], may influence the function's behaviour and output. Consequently,
+time], may influence the function's behaviour and output. Consequently,
 functions in #gls("CS") are inherently designed to interact with and adapt to
 their environment, thereby making them dynamic and versatile but also
 potentially non-deterministic.
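The context above describes the computational environment as an implicit extra input to a function. The following is a minimal Python sketch of that idea. All names (`stamp_impure`, `Environment`, `stamp_pure`) are hypothetical, and the current date stands in as the only environmental input.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


def stamp_impure(version: str) -> str:
    # Reads the clock implicitly: the same argument gives a different
    # result on a different day.
    return f"{version} built {datetime.now(timezone.utc):%Y-%m-%d}"


@dataclass(frozen=True)
class Environment:
    # Hypothetical record of environmental inputs; only the date here.
    now: datetime


def stamp_pure(version: str, env: Environment) -> str:
    # The environment is an explicit parameter: same inputs, same output.
    return f"{version} built {env.now:%Y-%m-%d}"
```

Passing the environment explicitly does not remove it. It makes the dependency visible, so pinning that input also pins the output.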
@@ -770,7 +770,7 @@ reflects the state of its transitive inputs. Basically, the output represents
 all direct and indirect dependencies used in the build process.
 "Transitive inputs" refer to not only the direct inputs #eg[source code] but
 also to the inputs of those inputs #eg[libraries, frameworks, compilers, data
-resources].
+resources].
 
 From the point of view of the software build process as shown in
 @inputs-outputs-part1, the inputs are all the source code files, configuration
@@ -1040,7 +1040,7 @@ produce the same hash, an occurrence known as a #emph[collision]. The ability to
 find collisions undermines the security of the algorithm. There are different
 types of algorithms to calculate a checksum
 #eg[#gls("MD5", long: false), #gls("SHA1", long: false),
-#gls("SHA2", long: false)]. Older algorithms like #gls("MD5", long: false) have
+#gls("SHA2", long: false)]. Older algorithms like #gls("MD5", long: false) have
 known vulnerabilities that allow collision attacks while more modern algorithms
 like SHA-256 (#gls("SHA2", long: false)) are currently considered to be pretty
 much impossible to crack.
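The hunk above relies on checksums to tell whether outputs are identical. Here is a minimal sketch that assumes two build artifacts on disk and compares them by SHA-256 digest using Python's standard `hashlib`. The helper names are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Hex SHA-256 digest of a file, read in fixed-size chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_artifact(a: Path, b: Path) -> bool:
    """True when both files have the same SHA-256 digest.

    Different digests prove the files differ; equal digests mean they are
    identical unless a SHA-256 collision occurred."""
    return sha256_of(a) == sha256_of(b)
```

Reading in chunks keeps memory use constant for large artifacts. The digest is the same as hashing the whole file at once.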
@@ -1055,7 +1055,7 @@ despite the theoretical potential for identical hashes of different inputs.
 #info-box(kind: "info")[
 Choosing an appropriate checksum algorithm is paramount due to the rapid
 evolution of computational power as described by Moore's Law
-#cite(<4785860>,form:"normal"), which leads to previously secure algorithms
+#cite(<4785860>, form: "normal"), which leads to previously secure algorithms
 becoming vulnerable as computing capabilities expand.
 
 For instance, #gls("MD5") checksums, once deemed secure for storing passwords,
@@ -1155,7 +1155,7 @@ The process of controlling the computational environment $E$ underscores a
 fundamental challenge in #gls("SE"): achieving reproducibility through
 environment standardisation. The environment includes specific factors such as
 hardware and software configurations #eg[#gls("CPU"), #gls("OS"), library
-versions, and runtime conditions] that directly affect a function's behaviour
+versions, and runtime conditions] that directly affect a function's behaviour
 and output. The Monte Carlo simulation algorithm (@montecarlo-pi.c), exemplifies
 this challenge: it may be reproducible at build time but can exhibit variance at
 run time due to environmental factors.
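The Monte Carlo example referenced above (@montecarlo-pi.c) is a C listing that this diff does not show. The following is a separate, hypothetical Python sketch, not that listing. Here the random seed is one run-time input taken from the environment. Fixing the seed makes the estimate repeatable; leaving it unset lets each run differ.

```python
import random
from typing import Optional


def estimate_pi(samples: int, seed: Optional[int] = None) -> float:
    """Estimate pi as 4 * (fraction of random points in the unit square
    that land inside the quarter circle of radius 1)."""
    # seed=None: the generator is seeded from system entropy, so runs differ.
    rng = random.Random(seed)
    inside = 0
    for _ in range(samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            inside += 1
    return 4.0 * inside / samples
```

Calling `estimate_pi(100_000, seed=42)` twice gives the same value. Calling `estimate_pi(100_000)` twice usually does not. The seed pins only this one input; other environmental factors named in the text are unaffected.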
@@ -1237,7 +1237,7 @@ primarily in their focus, structure, and community support. The choice between
 specific needs, whether the focus is on extensive licensing compliance or
 streamlined security and risk management within the software supply chain.
 
-The #gls("CRA") #cite(<CRA>, form:"normal") mandates the incorporation of a
+The #gls("CRA") #cite(<CRA>, form: "normal") mandates the incorporation of a
 #gls("SBOM") in software products, highlighting its important role in bolstering
 software security and transparency. This requirement marks a significant
 advancement in enhancing the integrity and security of software, ensuring that
@@ -1246,10 +1246,10 @@ lifecycle. While the #gls("CRA") includes multiple measures, most will take
 effect three years after its passage, likely in early 2027. Specifically,
 regarding #gls("SBOM"), the following applies to products with digital elements
 available: #quote[identify and document vulnerabilities and components contained
-in products with digital elements, including by drawing up a software bill of
-materials in a commonly used and machine-readable format covering at the very
-least the top-level dependencies of the products]
-#cite(<CRA>, supplement: "Annex I, Part II (1)", form:"normal").
+in products with digital elements, including by drawing up a software bill of
+materials in a commonly used and machine-readable format covering at the very
+least the top-level dependencies of the products]
+#cite(<CRA>, supplement: "Annex I, Part II (1)", form: "normal").
 
 ==== Supply Chain <ch2-supply-chain>

@@ -1269,7 +1269,7 @@ direct and indirect dependencies, adding complexity to the software supply
 chain. The build environments, which encompass all necessary components and
 their precise versions for software compilation, become intricate and difficult
 to replicate across different systems and over time. This growing complexity,
-"politely called #emph[dependency management]" #cite(<8509170>, form:"normal")
+"politely called #emph[dependency management]" #cite(<8509170>, form: "normal")
 but more colloquially known as #emph[dependency hell], is a phenomenon that
 developers have become all too familiar with. While Semantic Versioning
 (@package-managers) offers a strategy to mitigate these issues, it alone is
@@ -1381,7 +1381,7 @@ frequently unattainable in practice.
 
 One of the primary impediments in achieving reproducibility lies in the
 dependency on hardware architecture. Software compiled for different
-architectures, such as `x86` and `ARM,` inherently produces disparate binaries #cite(<patterson2013>,form:"normal").
+architectures, such as `x86` and `ARM,` inherently produces disparate binaries #cite(<patterson2013>, form: "normal").
 These differences stem from the instruction sets and optimizations that are
 specific to each platform, leading to divergent outputs despite using identical
 source code. This variance highlights a significant reproducibility challenge,
@@ -1412,7 +1412,7 @@ entirely achievable, we will delve deeper into these challenges by exploring the
 impact of non-deterministic compilers and the strategies to mitigate these
 challenges using different methods.
 
-== Deterministic Builds And Environments
+== Deterministic Builds And Environments <ch2-deterministic-builds-and-environments>
 
 In this section, we will explore the concept of deterministic builds, and the
 potential sources of non-determinism in software builds.
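The section introduced above concerns sources of non-determinism in builds. Two common sources are file ordering and embedded timestamps. Below is a minimal Python sketch of an archive step that pins both. Members are written in sorted order, and their modification time comes from `SOURCE_DATE_EPOCH`, the convention discussed later in this chapter. Falling back to `0` when the variable is unset is a simplification made here; tools following the convention normally fall back to the current time. `deterministic_tar` is a hypothetical name. The archive is left uncompressed because a gzip header records its own timestamp.

```python
import io
import os
import tarfile


def deterministic_tar(files: dict[str, bytes]) -> bytes:
    """Build an uncompressed tar whose bytes depend only on `files`
    and on SOURCE_DATE_EPOCH."""
    # Assumption for this sketch: use 0 when SOURCE_DATE_EPOCH is unset.
    mtime = int(os.environ.get("SOURCE_DATE_EPOCH", "0"))
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.PAX_FORMAT) as tar:
        # Sorted names give a fixed member order, independent of the
        # order in which entries were collected.
        for name in sorted(files):
            data = files[name]
            info = tarfile.TarInfo(name)
            info.size = len(data)
            info.mtime = mtime
            info.mode = 0o644
            info.uid = info.gid = 0          # no builder-specific ownership
            info.uname = info.gname = ""
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()
```

Two calls with the same contents, given in any order, return byte-identical archives. That output can then be checked with a checksum.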
@@ -1470,9 +1470,9 @@ and at any point in the past or future.
 
 Reproducibility relies on stable, consistent and well-maintained codebases but
 also heavily depends on stable, consistent and well-maintained environments as
-seen in (add ref to ch2-environments). In addition, a critical component is
+seen in #ref(<ch2-deterministic-builds-and-environments>). In addition, a critical component is
 environment configuration management. Configuration management plays a critical
-role inensuring reproducibility by mitigating the non-deterministic behaviours
+role in ensuring reproducibility by mitigating the non-deterministic behaviours
 introduced by configuration drifts.
 
 #info-box[
@@ -1524,9 +1524,9 @@ if not impossible. Environments may progressively "converge" towards a common
 state, but subtle differences can persist, introducing variability. To
 illustrate this model, we could think of an arbitrary environment that needs to
 be configured in a specific way, reach a particular well known state. For
-example, some specific dependencies has to be installed to run a particular
+example, some specific dependencies have to be installed to run a particular
 service. Tools like Puppet #cite(<puppet>, form: "normal"), Chef
-#cite(<chef>, form: "normal"), Terraform #cite(<terraform>,form: "normal")
+#cite(<chef>, form: "normal"), Terraform #cite(<terraform>, form: "normal")
 and Ansible #cite(<ansible>, form: "normal") might help to achieve this goal.
 
 While convergent management offers flexibility in responding to unforeseen
@@ -1570,7 +1570,7 @@ approaches based on the environment's needs.
 #info-box[
 Immutable environments ((add ref to ch2-environments)) are environments that are designed
 to be unchangeable once they are created. They are often used in containers
-#eg[Docker #cite(<docker>,form:"normal")], where the ability to quickly create
+#eg[Docker #cite(<docker>, form: "normal")], where the ability to quickly create
 and destroy environments is essential. Immutable environments enhance
 reproducibility and reliability, making them an ideal choice for environments
 that require high levels of predictability and stability.
@@ -1623,15 +1623,19 @@ configuration can be applied multiple times without altering the environment
 beyond its intended state. This abstraction simplifies understanding and
 maintenance by allowing the system to determine the necessary actions to achieve
 the desired state. Tools such as Puppet #cite(<puppet>, form: "normal"),
-Kubernetes #cite(<kubernetes>,form: "normal"),
-Terraform #cite(<terraform>,form: "normal") and, under some conditions,
+Kubernetes #cite(<kubernetes>, form: "normal"),
+Terraform #cite(<terraform>, form: "normal") and, under some conditions,
 Docker #cite(<docker>, form: "normal") are used to specify the desired end
 state. These tools typically feature their own specific #gls("DSL") to create
 high-level descriptions of the expected environment's state, as opposed to
 issuing imperative and procedural commands. The declarative approach mitigates
 the risk of configuration drift by prioritising idempotence, maintaining
 explicit dependency graphs, and ensuring a strong awareness of the current state
-of the environment #cite(<HunterGCP>,form:"normal", supplement: [p. 348]).
+of the environment #cite(<HunterGCP>, form: "normal", supplement: [p. 348]).
+
+// This doesn't really convince me, you classify a shell command as
+// "declarative" when they are in shell scripts, which is not much different
+// to me as a Dockerfile
 
 #info-box(kind: "note")[
 In @ch2-table-configuration-mgmt, Docker #cite(<docker>, form: "normal") and
@@ -1871,7 +1875,7 @@ Often, timestamps are used to approximate which version of the source were
 built. Since file timestamps are volatile, the source code needs to be tracked
 more accurately than just a timestamp. Just like for version information, the
 solution would be to extract the date from a dedicated file like a changelog, or
-a specific commit #cite(<nixpkgs-pull-256270> ,form: "normal").
+a specific commit #cite(<nixpkgs-pull-256270>, form: "normal").
 
 To circumvent this issue, `SOURCE_DATE_EPOCH` is a specific environment variable
 convention for pinning timestamps to a specific value that has been introduced
@@ -1913,7 +1917,7 @@ especially when those builds are not identical. This section introduces a tool
 designed specifically for this purpose.
 
 Developed under the umbrella of the @ReproducibleBuildsOrg effort, `diffoscope`
-#cite(<diffoscope>, form:"normal") is a comprehensive, open-source tool that
+#cite(<diffoscope>, form: "normal") is a comprehensive, open-source tool that
 excels in comparing files and directories. Its unique capability to recursively
 unpack archives of various types and transform binary formats into a
 human-readable form makes it an indispensable tool for software comparison. It

src/thesis/imports/preamble.typ

Lines changed: 1 addition & 0 deletions
@@ -5,5 +5,6 @@
 #import "@preview/xarrow:0.3.1": xarrow, xarrowSquiggly, xarrowTwoHead
 #import "@preview/hydra:0.6.1": *
 #import "@preview/cetz:0.3.4"
+#import "@preview/cetz-plot:0.1.1": *
 #import "colors.typ": *
 #import "workarounds.typ": *

0 commit comments
