Commit 21def87

Merge pull request #14 from drupol/add-environments-section
chapter 2: add new section on Environments
2 parents f406524 + fe1b318 commit 21def87

3 files changed

+151
-0
lines changed

Lines changed: 11 additions & 0 deletions
@@ -0,0 +1,11 @@
1+
FROM buildpack-deps:bookworm
2+
# ...
3+
RUN set -eux; \
4+
apt-get update; \
5+
apt-get install -y --no-install-recommends \
6+
libbluetooth-dev \
7+
tk-dev \
8+
uuid-dev \
9+
; \
10+
rm -rf /var/lib/apt/lists/*
11+
# ...

src/thesis/2-reproducibility.typ

Lines changed: 110 additions & 0 deletions
@@ -1462,6 +1462,116 @@ and at any point in the past or future.
1462 1462
environments or machines.
1463 1463
]
1464 1464

1465+
=== Computational Environments <ch2-environments>
1466+
1467+
Environments where a build or computational process occurs can be broadly
1468+
categorised into two types: hardware and software environments
1469+
#cite(<strangfeld_2024>,form:"normal", supplement: "p. 8, section 2.1"). While
1470+
software environments can be managed to a high degree of consistency, achieving
1471+
reproducibility across different hardware, particularly different #gls("CPU")
1472+
architectures #eg[`x86`, `ARM`], is essentially impossible. Each architecture
1473+
handles instruction execution, memory management, and floating-point
1474+
calculations in its own way. Even small variations in these processes can lead to
1475+
differences in output. Consequently, even with identical software, builds on
1476+
different #gls("CPU") architectures will produce different results.
1477+
When something is said to be reproducible, it typically means reproducible
1478+
within the same #gls("CPU") architecture. Therefore, this section will focus
1479+
exclusively on the reproducibility challenges within software environments.
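
To illustrate why floating-point handling alone can change a result, consider
the following minimal sketch. It assumes IEEE 754 double-precision floats, as
used by CPython, and shows that floating-point addition is not associative. A
compiler or #gls("CPU") that groups the same operations differently can
therefore round differently.

```python
# Floating-point addition is not associative: regrouping the same three
# operands changes the intermediate rounding, and therefore the result.
left = (0.1 + 0.2) + 0.3
right = 0.1 + (0.2 + 0.3)

print(left == right)  # False
print(left, right)    # 0.6000000000000001 0.6
```

The difference is tiny, yet it is enough to make two outputs differ at the bit
level.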
1480+
1481+
A software environment is composed of the #gls("OS"), along with the set of
1482+
tools, libraries, and dependencies required to build or run a specific
1483+
application. Any change in these components can influence the outcome of a
1484+
software build or execution. For example, a minor update to a library could
1485+
potentially alter the behaviour of the software, producing different outcomes
1486+
across different executions or, more importantly, affect its security
1487+
level.
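
One way to make this sensitivity concrete is to treat the environment as a
mapping from component names to versions and to derive a digest from it. The
sketch below is a hypothetical illustration: the function
`environment_fingerprint` and the component names and versions are assumptions
and do not belong to any tool discussed here.

```python
import hashlib
import json


def environment_fingerprint(components: dict[str, str]) -> str:
    """Return a SHA-256 digest of a {component name: version} mapping."""
    # Sort the keys so the digest depends on the content, not on insertion order.
    canonical = json.dumps(components, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Hypothetical component names and versions, for illustration only.
before = {"os": "debian-12", "libfoo": "1.4.2", "compiler": "13.2.0"}
after = dict(before, libfoo="1.4.3")  # a single minor library update

print(environment_fingerprint(before) == environment_fingerprint(dict(before)))  # True
print(environment_fingerprint(before) == environment_fingerprint(after))         # False
```

A single changed version yields a different fingerprint, which mirrors the idea
that the environment as a whole is no longer the same.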
1488+
1489+
To enhance reproducibility, it is critical to ensure that the software
1490+
environment remains stable and unaltered during both the build and execution
1491+
phases. Unfortunately, conventional #glspl("OS"), such as Linux distributions,
1492+
Microsoft Windows, and macOS, are #emph[mutable] by default. This mutability is
1493+
primarily facilitated through package managers, which enable users to easily
1494+
modify their environments by installing or upgrading software packages. As a
1495+
result, uncontrolled changes to dependencies may lead to inconsistencies in
1496+
software behaviour or have an impact on the security level, undermining
1497+
reproducibility.
1498+
1499+
To mitigate these issues, #emph[immutable] environments have gained popularity.
1500+
Tools such as Docker #cite(<docker>,form:"normal") provide mechanisms to
1501+
encapsulate software and its dependencies in containers, thus creating
1502+
environments that remain unchanged after creation. Once a container is built, it
1503+
can be shared and executed across different systems with the guarantee that it
1504+
will function identically, given the same hardware environment. This characteristic makes
1505+
containers highly suitable for distributing software.
1506+
1507+
Despite its advantages, immutability does not guarantee reproducibility.
1508+
For instance, container images hosted on platforms like Docker Hub
1509+
#cite(<dockerhub>,form:"normal"), including popular language interpreters
1510+
#eg[Python, NodeJS, PHP], may not be reproducible due to non-deterministic
1511+
steps during image creation (at build-time). A specific example can be found
1512+
in #ref(<python-dockerfile>), which runs `apt-get update` at line 4 as part of
1513+
the image build process. Since `apt-get update` fetches the latest version of
1514+
the package index whenever the image is built, it is generally impossible to rebuild
1515+
the same image later, compromising Docker's build-time reproducibility.
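
The effect can be sketched with a toy model of a package index that changes over
time. Everything in the example is hypothetical (the package name, the versions,
the publication dates, and the `resolve_latest` helper); it only mimics the
principle of resolving against the latest index and does not reproduce how
`apt-get` works internally.

```python
from datetime import date

# Hypothetical index history: (publication date, version), oldest first.
INDEX_HISTORY = {
    "libexample-dev": [
        (date(2024, 1, 10), "1.0-1"),
        (date(2024, 6, 2), "1.0-2"),
    ],
}


def resolve_latest(package: str, index_day: date) -> str:
    """Return the newest version of `package` published on or before `index_day`."""
    published = [version for day, version in INDEX_HISTORY[package] if day <= index_day]
    if not published:
        raise LookupError(f"{package} was not yet published on {index_day}")
    return published[-1]


# Same build instructions, executed on two different days.
first_build = resolve_latest("libexample-dev", date(2024, 3, 1))
rebuild = resolve_latest("libexample-dev", date(2024, 9, 1))
print(first_build, rebuild)  # 1.0-1 1.0-2

# Resolving both builds against an index frozen at one date gives one answer.
snapshot = date(2024, 3, 1)
pinned_first = resolve_latest("libexample-dev", snapshot)
pinned_rebuild = resolve_latest("libexample-dev", snapshot)
print(pinned_first == pinned_rebuild)  # True
```

The build instructions are identical in both cases; only the state of the index
at build time differs.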
1516+
1517+
#figure(
1518+
sourcefile(
1519+
lang: "dockerfile",
1520+
read("../../resources/sourcecode/python.dockerfile"),
1521+
),
1522+
caption: [
1523+
An excerpt of the Python Dockerfile
1524+
#cite(<python-dockerfile-repository>,form:"normal") used to build the
1525+
#emph[official] Python images.
1526+
],
1527+
) <python-dockerfile>
1528+
1529+
Docker images, once built, are immutable. While Docker does not guarantee
1530+
build-time reproducibility, it has the potential to ensure run-time
1531+
reproducibility, reflecting Docker's philosophy of
1532+
#emph["build once, use everywhere"]. This distinction between build-time
1533+
reproducibility (@def-reproducibility-build-time) and run-time reproducibility
1534+
(@def-reproducibility-run-time) is key. Docker does not ensure that an image
1535+
will always be built consistently, often because the base image (declared in
1536+
the `FROM` instruction of a `Dockerfile`) is referenced by a mutable tag, as seen in
1537+
@python-dockerfile. Although building a reproducible image with Docker is
1538+
technically possible, it would require additional effort, external tools, and a
1539+
more complex setup. Therefore, we assume that build-time reproducibility is not
1540+
guaranteed, but the immutability of the environment significantly enhances the
1541+
potential for reproducibility at run-time.
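
Under the assumption that build-time reproducibility is checked as bit-for-bit
equality of build outputs, the check itself is simple to express. The following
sketch is illustrative only: `digest` and `is_reproducible` are hypothetical
helpers, and the byte strings stand in for the contents of artifacts produced by
independent builds.

```python
import hashlib


def digest(artifact: bytes) -> str:
    """Return the SHA-256 hex digest of a build artifact."""
    return hashlib.sha256(artifact).hexdigest()


def is_reproducible(artifacts: list[bytes]) -> bool:
    """True when every artifact is bit-for-bit identical to the others."""
    if not artifacts:
        raise ValueError("at least one artifact is required")
    return len({digest(artifact) for artifact in artifacts}) == 1


# Stand-ins for the outputs of two independent builds (hypothetical content).
same = [b"app v1 with libfoo 1.4.2", b"app v1 with libfoo 1.4.2"]
drifted = [b"app v1 with libfoo 1.4.2", b"app v1 with libfoo 1.4.3"]

print(is_reproducible(same))     # True
print(is_reproducible(drifted))  # False
```

An immutable image sidesteps this check at run-time: every user receives the
same artifact, so no second build takes place.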
1542+
1543+
#info-box(kind: "important")[
1544+
Docker is a platform for building, shipping, and running applications in
1545+
containers, with Docker Hub #cite(<dockerhub>,form:"normal") providing a large
1546+
repository of container images, which has significantly contributed to
1547+
Docker's popularity. Among these are the #emph[Docker "official" images]
1548+
#cite(<dockerofficialimages>,form:"normal"), which are curated and reviewed by
1549+
the Docker community. These images offer standard environments for popular
1550+
software and adhere to some quality standards.
1551+
1552+
However, the term "official" can be misleading. One might assume that these
1553+
images are maintained by the original software's developers, but this is not
1554+
always the case. For example, the PHP Docker image
1555+
#cite(<dockerhubphpimage>,form:"normal") is not maintained by the core PHP
1556+
development team. This means updates or fixes may not be as prompt or
1557+
specific as if the software’s developers maintained the image.
1558+
1559+
While Docker vets these images for quality, responsibility for the contents
1560+
rests with the maintainers. Users should be aware that official images are not
1561+
immune to security risks or outdated software, and reviewing each image's
1562+
documentation for known issues is advisable.
1563+
1564+
In summary, Docker "official" images are trusted but may not be maintained by
1565+
the original software’s maintainers. Developers must use them with caution and
1566+
full awareness, particularly in production environments, and ensure that the
1567+
images meet their security and functionality requirements.
1568+
]
1569+
1570+
Package managers are a critical aspect of the reproducibility puzzle since they
1571+
can manage the state of a computational environment. Without proper control over
1572+
how software and its dependencies are resolved and installed, achieving
1573+
consistent and reproducible builds becomes difficult.
1574+
1465 1575
=== Sources Of Non-Determinism
1466 1576

1467 1577
In this section we will explore the sources of non-determinism in software

src/thesis/literature.bib

Lines changed: 30 additions & 0 deletions
@@ -1067,3 +1067,33 @@ @article{4785860
1067 1067
keywords = {Integrated circuits;Computers;Silicon;Films;Heating;Microwave amplifiers;Data mining},
1068 1068
doi = {10.1109/N-SSC.2006.4785860}
1069 1069
}
1070+
1071+
@misc{python-dockerfile-repository,
1072+
title = {Python 3.12 Dockerfile},
1073+
author = {docker-library project},
1074+
year = 2024,
1075+
url = {https://github.com/docker-library/python/blame/31bbb37b797bd5521d6622c6d54052d6d0ede585/3.12/bookworm/Dockerfile}
1076+
}
1077+
1078+
@misc{dockerofficialimages,
1079+
title = {What are official images},
1080+
author = {{Docker, Inc.}},
1081+
year = 2024,
1082+
url = {https://github.com/docker-library/official-images/blob/6b4803e65a2c56f15b91f8a11bd90f0bcb756c1c/README.md#what-are-official-images},
1083+
}
1084+
1085+
@misc{dockerhubphpimage,
1086+
title = {Docker PHP images},
1087+
author = {{Docker, Inc.}},
1088+
year = 2013,
1089+
url = {https://hub.docker.com/_/php/}
1090+
}
1091+
1092+
@article{strangfeld_2024,
1093+
author = {Strangfeld, Marvin},
1094+
title = {{Reproducibility of Computational Environments for Software Development}},
1095+
school = {RWTH Aachen University},
1096+
year = 2024,
1097+
month = oct,
1098+
doi = {10.5281/zenodo.13843189},
1099+
}
