Commit df73054

remove environment section
1 parent 09e21c3 commit df73054

2 files changed (+5, -124 lines)

resources/sourcecode/python.dockerfile

Lines changed: 0 additions & 11 deletions
This file was deleted.

src/thesis/2-reproducibility.typ

Lines changed: 5 additions & 113 deletions
@@ -1454,121 +1454,13 @@ and at any point in the past or future.
 variety of environments or machines.
 ]

-=== Environments <ch2-environments>
-
-Environments where a build or computational process occurs can be broadly
-categorised into two types: hardware and software environments. While software
-environments can be managed to a high degree of consistency, achieving
-reproducibility across different hardware, particularly different #gls("CPU")
-architectures #eg[`x86`, `ARM`], is essentially impossible. Tasks like
-instruction execution, memory management, and floating-point calculations are
-handled in distinct ways. Even small variations in these processes can lead to
-differences in output. Consequently, even with identical software, builds on
-different types of #gls("CPU") architectures will produce different results.
-When something is said to be reproducible, it typically means reproducible
-within the same #gls("CPU") architecture. Therefore, this section will focus
-exclusively on the reproducibility challenges within software environments.
-
-A software environment is composed of the #gls("OS"), along with the set of
-tools, libraries, and dependencies required to build or run a specific
-application. Any change in these components can influence the outcome of a
-software build or execution. For example, a minor update to a library could
-potentially alter the behaviour of the software, producing different outcomes
-across different executions or more importantly, have an impact on the security
-level.
-
-To enhance reproducibility, it is critical to ensure that the software
-environment remains stable and unaltered during both the build and execution
-phases. Unfortunately, conventional #glspl("OS") such as Linux distributions,
-Microsoft Windows, and macOS, are #emph[mutable] by default. This mutability is
-primarily facilitated through package managers, which enable users to easily
-modify their environments by installing or upgrading software packages. As a
-result, uncontrolled changes to dependencies may also lead to inconsistencies in
-software behaviour, or have a impact on the security level, undermining
-reproducibility.
-
-To mitigate these issues, #emph[immutable] environments have gained popularity.
-Tools such as Docker #cite(<docker>,form:"normal") provide mechanisms to
-encapsulate software and its dependencies in containers, thus creating
-environments that remain unchanged after creation. Once a container is built, it
-can be shared and executed across different systems with the guarantee that it
-will function identically, given the same environment. This characteristic makes
-containers highly suitable for distributing software.
-
-Despite the advantages of immutability, it does not guarantee reproducibility by
-default. For instance, container images hosted on platforms like Docker Hub
-#cite(<dockerhub>,form:"normal"), including popular language interpreters
-#eg[Python, Node, PHP], may not be reproducible due to non-deterministic
-steps during the image creation. A specific example can be found in
-#ref(<python-dockerfile>), which runs `apt-get update` at line 4 as part of the
-image build process. Since `apt-get` pulls the latest version of package lists
-at build-time, it is impossible to reproduce the same image later, compromising
-Docker's build-time reproducibility.
-
-#figure(
-sourcefile(
-lang: "dockerfile",
-read("../../resources/sourcecode/python.dockerfile"),
-),
-caption: [
-An excerpt of the Python's Dockerfile
-#cite(<python-dockerfile-repository>,form:"normal") used to build the
-#emph[official] Python images.
-],
-) <python-dockerfile>
-
-Docker images, once built, are immutable. While Docker does not guarantee
-build-time reproducibility, it has the potential to ensure run-time
-reproducibility, reflecting Docker's philosophy of
-#emph["build once, use everywhere"]. This distinction between build-time
-reproducibility (@def-reproducibility-build-time) and run-time reproducibility
-(@def-reproducibility-run-time) is key. Docker does not ensure that an image
-will always be built consistently, often due to the base image used (as
-specified in the `FROM` directive of a `Dockerfile`), as seen in
-@python-dockerfile. Although building a reproducible image with Docker is
-technically possible, it would require additional effort, external tools, and a
-more complex setup. Therefore, we assume that build-time reproducibility is not
-guaranteed, but the immutability of the environment significantly enhances the
-potential for reproducibility at run-time.
-
-#info-box[
-Docker #cite(<docker>,form:"normal") is a platform for building, shipping, and
-running applications in containers, with Docker Hub
-#cite(<dockerhub>,form:"normal") providing a large repository of container
-images, which has significantly contributed to Docker's popularity. Among
-these are "official" Docker images
-#cite(<dockerofficialimages>,form:"normal"), which are curated and reviewed by
-Docker Inc. These images offer standard environments for popular software and
-adhere to some quality standards.
-
-However, the term "official" can be misleading. One might suggest that these
-images are maintained by the original software's developers, but it's not
-always the case. For example, the PHP Docker image is not maintained by the
-core PHP development team. This means updates or fixes may not be as prompt or
-specific as if the software’s developers maintained the image.
-
-While Docker vets these images for quality, responsibility for the contents
-rests with the maintainers. Users should be aware that official images are not
-immune to security risks or outdated software, and reviewing the documentation
-for issues is advisable.
-
-In summary, "official" Docker images are trusted but may not be maintained by
-the software’s creators. Developers should use them with care, especially in
-production environments, and verify that the images meet their security and
-functionality needs.
-]
-
-Package managers are a critical aspect of the reproducibility puzzle. Without
-proper control over how dependencies are resolved and installed, achieving
-consistent and reproducible builds becomes difficult.
-
 ==== Configuration Management

 Reproducibility relies on stable, consistent and well-maintained codebases but
 also heavily depends on stable, consistent and well-maintained environments as
-seen in @ch2-environments. In addition, a critical component is environment
-configuration management. Configuration management plays a critical role in
-ensuring reproducibility by mitigating the non-deterministic behaviours
+seen in (add ref to ch2-environments). In addition, a critical component is
+environment configuration management. Configuration management plays a critical
+role inensuring reproducibility by mitigating the non-deterministic behaviours
 introduced by configuration drifts.

 #info-box[
@@ -1656,15 +1548,15 @@ goal of this model, providing the highest level of determinism and reliability
 in system behaviours.

 Congruent management, particularly through the adoption of immutable
-environment (@ch2-environments), ensures that environment remain in a
+environment ((add ref to ch2-environments)), ensures that environment remain in a
 well-defined state, thus maximising reproducibility. However, this approach can
 lack the flexibility required for dynamic environments, where each minor
 adjustments may necessitate rebuilding the entire system. This limitation
 highlights the importance of carefully choosing between convergent and congruent
 approaches based on the environment's needs.

 #info-box[
-Immutable environments (@ch2-environments) are environments that are designed
+Immutable environments ((add ref to ch2-environments)) are environments that are designed
 to be unchangeable once they are created. They are often used in containers
 #eg[Docker #cite(<docker>,form:"normal")], where the ability to quickly create
 and destroy environments is essential. Immutable environments enhance
