Commit dd37c1b

chapter 2: add new section on Environments
1 parent 27a9884 commit dd37c1b

3 files changed

+132
-0
lines changed

Lines changed: 11 additions & 0 deletions
@@ -0,0 +1,11 @@
1+
FROM buildpack-deps:bookworm
2+
# ...
3+
RUN set -eux; \
4+
apt-get update; \
5+
apt-get install -y --no-install-recommends \
6+
libbluetooth-dev \
7+
tk-dev \
8+
uuid-dev \
9+
; \
10+
rm -rf /var/lib/apt/lists/*
11+
# ...

src/thesis/2-reproducibility.typ

Lines changed: 107 additions & 0 deletions
@@ -1450,6 +1450,113 @@ and at any point in the past or future.
14501450
environments or machines.
14511451
]
14521452

1453+
=== Environments <ch2-environments>
1454+
1455+
Environments where a build or computational process occurs can be broadly
1456+
categorised into two types: hardware and software environments. While software
1457+
environments can be managed to a high degree of consistency, achieving
1458+
reproducibility across different hardware, particularly different #gls("CPU")
1459+
architectures #eg[`x86`, `ARM`], is essentially impossible. Tasks like
1460+
instruction execution, memory management, and floating-point calculations are
1461+
handled in distinct ways. Even small variations in these processes can lead to
1462+
differences in output. Consequently, even with identical software, builds on
1463+
different types of #gls("CPU") architectures will produce different results.
1464+
When something is said to be reproducible, it typically means reproducible
1465+
within the same #gls("CPU") architecture. Therefore, this section will focus
1466+
exclusively on the reproducibility challenges within software environments.
1467+
1468+
A software environment is composed of the #gls("OS"), along with the set of
1469+
tools, libraries, and dependencies required to build or run a specific
1470+
application. Any change in these components can influence the outcome of a
1471+
software build or execution. For example, a minor update to a library could
1472+
potentially alter the behaviour of the software, producing different outcomes
1473+
across different executions or, more importantly, have an impact on the security
1474+
level.
1475+
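The sensitivity of a software environment to small changes can be illustrated
with a minimal sketch. It assumes, purely for illustration, that an environment
can be described by an #gls("OS") identifier and a map of package names to
versions. The function name, the package `libfoo`, and all version strings are
hypothetical.

```python
import hashlib
import json


def environment_fingerprint(os_release: str, packages: dict[str, str]) -> str:
    """Return a SHA-256 digest summarising an environment description."""
    # Sort keys so the digest does not depend on dictionary insertion order.
    canonical = json.dumps(
        {"os": os_release, "packages": packages},
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Hypothetical package sets: only the patch version of one library differs.
before = environment_fingerprint("debian-12", {"libfoo": "1.4.2", "tk-dev": "8.6.13"})
after = environment_fingerprint("debian-12", {"libfoo": "1.4.3", "tk-dev": "8.6.13"})

print(before == after)  # False: a minor update yields a different environment
```

Such a fingerprint only captures what is listed in the description. It says
nothing about files that changed outside the package manager's control.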
1476+
To enhance reproducibility, it is critical to ensure that the software
1477+
environment remains stable and unaltered during both the build and execution
1478+
phases. Unfortunately, conventional #glspl("OS"), such as Linux distributions,
1479+
Microsoft Windows, and macOS, are *mutable* by default. This mutability is
1480+
primarily facilitated through package managers, which enable users to easily
1481+
modify their environments by installing or upgrading software packages. As a
1482+
result, uncontrolled changes to dependencies may also lead to inconsistencies in
1483+
software behaviour, or have an impact on the security level, undermining
1484+
reproducibility.
1485+
1486+
To mitigate these issues, *immutable* environments have gained popularity.
1487+
Tools such as Docker provide mechanisms to encapsulate software and its
1488+
dependencies in containers, thus creating environments that remain unchanged
1489+
after creation. Once a container is built, it can be shared and executed across
1490+
different systems with the guarantee that it will function identically, given
1491+
the same environment. This characteristic makes containers highly suitable for
1492+
distributing software.
1493+
1494+
Despite its advantages, immutability does not guarantee reproducibility by
1495+
default. For instance, container images hosted on platforms like Docker Hub
1496+
#cite(<dockerhub>,form:"normal"), including popular language interpreters
1497+
#eg[Python, Node.js, PHP], may not be reproducible due to non-deterministic
1498+
steps during the image creation. A specific example can be found in
1499+
#ref(<python-dockerfile>), which runs `apt-get update` at line 4 as part of the
1500+
image build process. Since `apt-get update` fetches the latest package lists
1501+
at build time, the same image cannot be reliably rebuilt later, compromising
1502+
Docker's build-time reproducibility.
1503+
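The effect of resolving packages against a moving index can be sketched as
follows. This is a toy model, not how `apt-get` is implemented: the two index
snapshots, the version strings, and both function names are hypothetical, and
each version list is assumed to be ordered from oldest to newest.

```python
# Hypothetical snapshots of a package index taken at two different build times.
INDEX_JANUARY = {"uuid-dev": ["2.38.1-4", "2.38.1-5"]}
INDEX_MARCH = {"uuid-dev": ["2.38.1-4", "2.38.1-5", "2.38.1-5+deb12u1"]}


def resolve_latest(index: dict[str, list[str]], name: str) -> str:
    """Mimic an unpinned install: pick whatever the index lists last."""
    return index[name][-1]


def resolve_pinned(index: dict[str, list[str]], name: str, version: str) -> str:
    """Mimic a pinned install: succeed only if the exact version is available."""
    if version not in index[name]:
        raise LookupError(f"{name}={version} is not available in this index")
    return version


# Unpinned resolution depends on when the build runs.
unpinned_jan = resolve_latest(INDEX_JANUARY, "uuid-dev")
unpinned_mar = resolve_latest(INDEX_MARCH, "uuid-dev")
print(unpinned_jan == unpinned_mar)  # False

# Pinned resolution gives the same answer for both snapshots.
pinned_jan = resolve_pinned(INDEX_JANUARY, "uuid-dev", "2.38.1-5")
pinned_mar = resolve_pinned(INDEX_MARCH, "uuid-dev", "2.38.1-5")
print(pinned_jan == pinned_mar)  # True
```

Pinning alone is not sufficient in this model either: if the index stops
listing the pinned version, the build fails instead of silently drifting.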
1504+
#figure(
1505+
sourcefile(
1506+
lang: "dockerfile",
1507+
read("../../resources/sourcecode/python.dockerfile"),
1508+
),
1509+
caption: [
1510+
An excerpt of the Python Dockerfile
1511+
#cite(<python-dockerfile-repository>,form:"normal") used to build the
1512+
#emph[official] Python images.
1513+
],
1514+
) <python-dockerfile>
1515+
1516+
Docker images, once built, are immutable. While Docker does not guarantee
1517+
build-time reproducibility, it has the potential to ensure run-time
1518+
reproducibility, reflecting Docker's philosophy of
1519+
#emph["build once, use everywhere"]. This distinction between build-time
1520+
reproducibility (@def-reproducibility-build-time) and run-time reproducibility
1521+
(@def-reproducibility-run-time) is key. Docker does not ensure that an image
1522+
will always be built consistently, often due to the base image used (as
1523+
specified in the `FROM` instruction of a `Dockerfile`), as seen in
1524+
@python-dockerfile. Although building a reproducible image with Docker is
1525+
technically possible, it would require additional effort, external tools, and a
1526+
more complex setup. Therefore, we assume that build-time reproducibility is not
1527+
guaranteed, but the immutability of the environment significantly enhances the
1528+
potential for reproducibility at run-time.
1529+
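One way to test build-time reproducibility is to run the same build several
times and compare the digests of the resulting artefacts. The sketch below
assumes that reproducible means bit-for-bit identical output. Both build
functions are hypothetical stand-ins for a real build, and the clock is
simulated with a counter so that the example is deterministic.

```python
import hashlib
import itertools
from typing import Callable

# Simulated clock: each call to next() returns a later "build time".
_fake_clock = itertools.count(start=1_700_000_000)


def digest(artifact: bytes) -> str:
    """Return the SHA-256 digest of a build artefact."""
    return hashlib.sha256(artifact).hexdigest()


def is_reproducible(build: Callable[[], bytes], runs: int = 3) -> bool:
    """Run the build several times and check that all outputs are identical."""
    digests = {digest(build()) for _ in range(runs)}
    return len(digests) == 1


def deterministic_build() -> bytes:
    # The output depends only on fixed inputs.
    return b"app-1.0\n"


def timestamped_build() -> bytes:
    # The output embeds a (simulated) build time, a common source of
    # non-determinism.
    return f"app-1.0 built at {next(_fake_clock)}\n".encode("utf-8")


print(is_reproducible(deterministic_build))  # True
print(is_reproducible(timestamped_build))    # False
```

A positive result from such a check is evidence rather than proof: a build
may still differ on another machine or at a later date.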
1530+
#info-box[
1531+
Docker is a platform for building, shipping, and running applications in
1532+
containers, with Docker Hub #cite(<dockerhub>,form:"normal") providing a large
1533+
repository of container images, which has significantly contributed to
1534+
Docker's popularity. Among these are "official" Docker images
1535+
#cite(<dockerofficialimages>,form:"normal"), which are curated and reviewed by
1536+
Docker Inc. These images offer standard environments for popular software and
1537+
adhere to some quality standards.
1538+
1539+
However, the term "official" can be misleading. One might assume that these
1540+
images are maintained by the original software's developers, but this is not
1541+
always the case. For example, the PHP Docker image is not maintained by the
1542+
core PHP development team. This means updates or fixes may not be as prompt or
1543+
specific as if the software’s developers maintained the image.
1544+
1545+
While Docker vets these images for quality, responsibility for the contents
1546+
rests with the maintainers. Users should be aware that official images are not
1547+
immune to security risks or outdated software, and reviewing the documentation
1548+
for issues is advisable.
1549+
1550+
In summary, "official" Docker images are trusted but may not be maintained by
1551+
the software’s creators. Developers should use them with care, especially in
1552+
production environments, and verify that the images meet their security and
1553+
functionality needs.
1554+
]
1555+
1556+
Package managers are a critical aspect of the reproducibility puzzle. Without
1557+
proper control over how dependencies are resolved and installed, achieving
1558+
consistent and reproducible builds becomes difficult.
1559+
14531560
=== Sources Of Non-Determinism
14541561

14551562
In this section we will explore the sources of non-determinism in software

src/thesis/literature.bib

Lines changed: 14 additions & 0 deletions
@@ -1048,3 +1048,17 @@ @article{8509170
10481048
keywords = {Microsoft Windows;History;Software development;Software maintenance;Software engineering},
10491049
doi = {10.1109/MAHC.2018.2877913}
10501050
}
1051+
1052+
@misc{python-dockerfile-repository,
1053+
title = {Python 3.12 Dockerfile},
1054+
author = {docker-library project},
1055+
year = 2024,
1056+
url = {https://github.com/docker-library/python/blame/31bbb37b797bd5521d6622c6d54052d6d0ede585/3.12/bookworm/Dockerfile}
1057+
}
1058+
1059+
@misc{dockerofficialimages,
1060+
title = {What are official images},
1061+
author = {Docker Inc.},
1062+
year = 2024,
1063+
url = {https://github.com/docker-library/official-images/blob/6b4803e65a2c56f15b91f8a11bd90f0bcb756c1c/README.md#what-are-official-images},
1064+
}
