@@ -264,7 +264,7 @@ reproducibility in #gls("CS").
#emph[Space] and #emph[Time] are terms borrowed from physics. In the context
of reproducibility in #gls("SE"), space refers to different systems, while
time refers to different moments in time
- #cite(<malka-hal-04430009>,form:"normal"). (more about that in
+ #cite(<malka-hal-04430009>, form: "normal"). (more about that in
@def-deterministic-build).
]

@@ -458,7 +458,7 @@ We will explore this concept with Docker images as a primary example. Docker, a
popular containerization platform, uses Dockerfiles (@dockerfile-example).
Basically, a `Dockerfile` is a script with a set of instructions to build
images. These images are then used to run software in a consistent environment.
- However, images on Docker Hub #cite(<dockerhub>, form:"normal") often present
+ However, images on Docker Hub #cite(<dockerhub>, form: "normal") often present
challenges to reproducibility. The reasons vary: some Dockerfiles are not
publicly available but especially because most of them include significant
variability in their build processes, making exact replication of the images
@@ -716,7 +716,7 @@ which it will be evaluated, effectively making, to some extent, this environment
an extra input parameter per se. This computational environment, which
encompasses the hardware #eg[filesystem, memory, #gls("CPU", long: false)],
software #eg[#gls("OS", long: false)] and date #eg[the current date and
- time], may influence the function's behaviour and output. Consequently,
+ time], may influence the function's behaviour and output. Consequently,
functions in #gls("CS") are inherently designed to interact with and adapt to
their environment, thereby making them dynamic and versatile but also
potentially non-deterministic.
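A minimal Python sketch of the idea that the environment acts as an extra input parameter. The function name `greeting` and its `env` and `now` parameters are hypothetical, chosen only for illustration: when they are omitted the result depends on where and when the call happens, and when they are supplied the function becomes deterministic.

```python
import os
import time


def greeting(name, env=None, now=None):
    """Build a string from a name plus two environment-derived values."""
    # Without explicit arguments, fall back to the ambient environment,
    # which makes the result depend on the machine and the moment of the call.
    env = os.environ if env is None else env
    now = time.time() if now is None else now
    lang = env.get("LANG", "C")
    return f"{name}|{lang}|{int(now)}"


# Pinning the environment turns it into an ordinary, controlled parameter.
pinned = greeting("build", env={"LANG": "en_US.UTF-8"}, now=0)
print(pinned)
```

Calling `greeting("build")` with no further arguments would instead read `LANG` and the clock from the host, so two runs on different systems or at different times may disagree.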
@@ -770,7 +770,7 @@ reflects the state of its transitive inputs. Basically, the output represents
all direct and indirect dependencies used in the build process.
"Transitive inputs" refer to not only the direct inputs #eg[source code] but
also to the inputs of those inputs #eg[libraries, frameworks, compilers, data
- resources].
+ resources].

From the point of view of the software build process as shown in
@inputs-outputs-part1, the inputs are all the source code files, configuration
@@ -1040,7 +1040,7 @@ produce the same hash, an occurrence known as a #emph[collision]. The ability to
find collisions undermines the security of the algorithm. There are different
types of algorithms to calculate a checksum
#eg[#gls("MD5", long: false), #gls("SHA1", long: false),
- #gls("SHA2", long: false)]. Older algorithms like #gls("MD5", long: false) have
+ #gls("SHA2", long: false)]. Older algorithms like #gls("MD5", long: false) have
known vulnerabilities that allow collision attacks while more modern algorithms
like SHA-256 (#gls("SHA2", long: false)) are currently considered to be pretty
much impossible to crack.
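A minimal Python sketch of computing checksums with the three algorithm families named above, using the standard `hashlib` module. The helper name `checksums` and the sample input are assumptions made for illustration only.

```python
import hashlib


def checksums(data: bytes) -> dict:
    """Return hex digests of `data` for MD5, SHA-1 and SHA-256."""
    # hashlib.new() takes the algorithm name; these three are always available.
    return {name: hashlib.new(name, data).hexdigest()
            for name in ("md5", "sha1", "sha256")}


for name, digest in checksums(b"hello world\n").items():
    # Each hex character encodes 4 bits, so the digest size is len * 4.
    print(f"{name:>6} ({len(digest) * 4} bits): {digest}")
```

The digest sizes (128, 160 and 256 bits) give a rough sense of why collisions become harder to find with the newer algorithms, although digest length is not the only factor in an algorithm's resistance to attack.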
@@ -1055,7 +1055,7 @@ despite the theoretical potential for identical hashes of different inputs.
#info-box(kind: "info")[
Choosing an appropriate checksum algorithm is paramount due to the rapid
evolution of computational power as described by Moore's Law
- #cite(<4785860>,form:"normal"), which leads to previously secure algorithms
+ #cite(<4785860>, form: "normal"), which leads to previously secure algorithms
becoming vulnerable as computing capabilities expand.

For instance, #gls("MD5") checksums, once deemed secure for storing passwords,
@@ -1155,7 +1155,7 @@ The process of controlling the computational environment $E$ underscores a
fundamental challenge in #gls("SE"): achieving reproducibility through
environment standardisation. The environment includes specific factors such as
hardware and software configurations #eg[#gls("CPU"), #gls("OS"), library
- versions, and runtime conditions] that directly affect a function's behaviour
+ versions, and runtime conditions] that directly affect a function's behaviour
and output. The Monte Carlo simulation algorithm (@montecarlo-pi.c), exemplifies
this challenge: it may be reproducible at build time but can exhibit variance at
run time due to environmental factors.
@@ -1237,7 +1237,7 @@ primarily in their focus, structure, and community support. The choice between
specific needs, whether the focus is on extensive licensing compliance or
streamlined security and risk management within the software supply chain.

- The #gls("CRA") #cite(<CRA>, form:"normal") mandates the incorporation of a
+ The #gls("CRA") #cite(<CRA>, form: "normal") mandates the incorporation of a
#gls("SBOM") in software products, highlighting its important role in bolstering
software security and transparency. This requirement marks a significant
advancement in enhancing the integrity and security of software, ensuring that
@@ -1246,10 +1246,10 @@ lifecycle. While the #gls("CRA") includes multiple measures, most will take
effect three years after its passage, likely in early 2027. Specifically,
regarding #gls("SBOM"), the following applies to products with digital elements
available: #quote[identify and document vulnerabilities and components contained
- in products with digital elements, including by drawing up a software bill of
- materials in a commonly used and machine-readable format covering at the very
- least the top-level dependencies of the products]
- #cite(<CRA>, supplement: "Annex I, Part II (1)", form:"normal").
+ in products with digital elements, including by drawing up a software bill of
+ materials in a commonly used and machine-readable format covering at the very
+ least the top-level dependencies of the products]
+ #cite(<CRA>, supplement: "Annex I, Part II (1)", form: "normal").

==== Supply Chain <ch2-supply-chain>

@@ -1269,7 +1269,7 @@ direct and indirect dependencies, adding complexity to the software supply
chain. The build environments, which encompass all necessary components and
their precise versions for software compilation, become intricate and difficult
to replicate across different systems and over time. This growing complexity,
- "politely called #emph[dependency management]" #cite(<8509170>, form:"normal")
+ "politely called #emph[dependency management]" #cite(<8509170>, form: "normal")
but more colloquially known as #emph[dependency hell], is a phenomenon that
developers have become all too familiar with. While Semantic Versioning
(@package-managers) offers a strategy to mitigate these issues, it alone is
@@ -1381,7 +1381,7 @@ frequently unattainable in practice.

One of the primary impediments in achieving reproducibility lies in the
dependency on hardware architecture. Software compiled for different
- architectures, such as `x86` and `ARM,` inherently produces disparate binaries #cite(<patterson2013>,form:"normal").
+ architectures, such as `x86` and `ARM,` inherently produces disparate binaries #cite(<patterson2013>, form: "normal").
These differences stem from the instruction sets and optimizations that are
specific to each platform, leading to divergent outputs despite using identical
source code. This variance highlights a significant reproducibility challenge,
@@ -1412,7 +1412,7 @@ entirely achievable, we will delve deeper into these challenges by exploring the
impact of non-deterministic compilers and the strategies to mitigate these
challenges using different methods.

- == Deterministic Builds And Environments
+ == Deterministic Builds And Environments <ch2-deterministic-builds-and-environments>

In this section, we will explore the concept of deterministic builds, and the
potential sources of non-determinism in software builds.
@@ -1470,9 +1470,9 @@ and at any point in the past or future.

Reproducibility relies on stable, consistent and well-maintained codebases but
also heavily depends on stable, consistent and well-maintained environments as
- seen in (add ref to ch2-environments). In addition, a critical component is
+ seen in #ref(<ch2-deterministic-builds-and-environments>). In addition, a critical component is
environment configuration management. Configuration management plays a critical
- role inensuring reproducibility by mitigating the non-deterministic behaviours
+ role in ensuring reproducibility by mitigating the non-deterministic behaviours
introduced by configuration drifts.

#info-box[
@@ -1524,9 +1524,9 @@ if not impossible. Environments may progressively "converge" towards a common
state, but subtle differences can persist, introducing variability. To
illustrate this model, we could think of an arbitrary environment that needs to
be configured in a specific way, reach a particular well known state. For
- example, some specific dependencies has to be installed to run a particular
+ example, some specific dependencies have to be installed to run a particular
service. Tools like Puppet #cite(<puppet>, form: "normal"), Chef
- #cite(<chef>, form: "normal"), Terraform #cite(<terraform>,form: "normal")
+ #cite(<chef>, form: "normal"), Terraform #cite(<terraform>, form: "normal")
and Ansible #cite(<ansible>, form: "normal") might help to achieve this goal.

While convergent management offers flexibility in responding to unforeseen
@@ -1570,7 +1570,7 @@ approaches based on the environment's needs.
#info-box[
Immutable environments ((add ref to ch2-environments)) are environments that are designed
to be unchangeable once they are created. They are often used in containers
- #eg[Docker #cite(<docker>,form:"normal")], where the ability to quickly create
+ #eg[Docker #cite(<docker>, form: "normal")], where the ability to quickly create
and destroy environments is essential. Immutable environments enhance
reproducibility and reliability, making them an ideal choice for environments
that require high levels of predictability and stability.
@@ -1623,15 +1623,19 @@ configuration can be applied multiple times without altering the environment
beyond its intended state. This abstraction simplifies understanding and
maintenance by allowing the system to determine the necessary actions to achieve
the desired state. Tools such as Puppet #cite(<puppet>, form: "normal"),
- Kubernetes #cite(<kubernetes>,form: "normal"),
- Terraform #cite(<terraform>,form: "normal") and, under some conditions,
+ Kubernetes #cite(<kubernetes>, form: "normal"),
+ Terraform #cite(<terraform>, form: "normal") and, under some conditions,
Docker #cite(<docker>, form: "normal") are used to specify the desired end
state. These tools typically feature their own specific #gls("DSL") to create
high-level descriptions of the expected environment's state, as opposed to
issuing imperative and procedural commands. The declarative approach mitigates
the risk of configuration drift by prioritising idempotence, maintaining
explicit dependency graphs, and ensuring a strong awareness of the current state
- of the environment #cite(<HunterGCP>,form:"normal", supplement: [p. 348]).
+ of the environment #cite(<HunterGCP>, form: "normal", supplement: [p. 348]).
+
+ // This doesn't really convince me, you classify a shell command as
+ // "declarative" when they are in shell scripts, which is not much different
+ // to me as a Dockerfile

#info-box(kind: "note")[
In @ch2-table-configuration-mgmt, Docker #cite(<docker>, form: "normal") and
@@ -1871,7 +1875,7 @@ Often, timestamps are used to approximate which version of the source were
built. Since file timestamps are volatile, the source code needs to be tracked
more accurately than just a timestamp. Just like for version information, the
solution would be to extract the date from a dedicated file like a changelog, or
- a specific commit #cite(<nixpkgs-pull-256270>, form: "normal").
+ a specific commit #cite(<nixpkgs-pull-256270>, form: "normal").

To circumvent this issue, `SOURCE_DATE_EPOCH` is a specific environment variable
convention for pinning timestamps to a specific value that has been introduced
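A minimal Python sketch of how a build step might honour `SOURCE_DATE_EPOCH`, assuming the variable holds an integer number of seconds since the Unix epoch. The helper name `build_timestamp` and the sample value are hypothetical and serve only as illustration.

```python
import os
import time


def build_timestamp(env=None):
    """Return the timestamp (seconds since the epoch) to embed in an artifact."""
    env = os.environ if env is None else env
    value = env.get("SOURCE_DATE_EPOCH")
    if value is not None:
        # Pinned: every rebuild embeds the same value.
        return int(value)
    # Fallback: the wall clock, which differs from one build to the next.
    return int(time.time())


stamp = build_timestamp({"SOURCE_DATE_EPOCH": "1700000000"})
# Format in UTC so the rendered date does not depend on the local time zone.
print(time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime(stamp)))
```

Formatting in UTC matters here: a pinned timestamp rendered in the local time zone would reintroduce a dependency on the build machine's configuration.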
@@ -1913,7 +1917,7 @@ especially when those builds are not identical. This section introduces a tool
designed specifically for this purpose.

Developed under the umbrella of the @ReproducibleBuildsOrg effort, `diffoscope`
- #cite(<diffoscope>, form:"normal") is a comprehensive, open-source tool that
+ #cite(<diffoscope>, form: "normal") is a comprehensive, open-source tool that
excels in comparing files and directories. Its unique capability to recursively
unpack archives of various types and transform binary formats into a
human-readable form makes it an indispensable tool for software comparison. It