@@ -1472,6 +1472,116 @@ and at any point in the past or future.
environments or machines.
]

+ === Computational Environments <ch2-environments>
+
+ Environments where a build or computational process occurs can be broadly
+ categorised into two types: hardware and software environments
+ #cite(<strangfeld_2024>, form: "normal", supplement: "p. 8, section 2.1"). While
+ software environments can be managed to a high degree of consistency, achieving
+ reproducibility across different hardware, particularly different #gls("CPU")
+ architectures #eg[`x86`, `ARM`], is essentially impossible. Different
+ architectures handle tasks like instruction execution, memory management, and
+ floating-point calculations in distinct ways. Even small variations in these
+ processes can lead to differences in output. Consequently, even with identical
+ software, builds on different #gls("CPU") architectures will produce different
+ results. When something is said to be reproducible, it typically means
+ reproducible within the same #gls("CPU") architecture. Therefore, this section
+ will focus exclusively on the reproducibility challenges within software
+ environments.
+
+ A software environment is composed of the #gls("OS"), along with the set of
+ tools, libraries, and dependencies required to build or run a specific
+ application. Any change in these components can influence the outcome of a
+ software build or execution. For example, a minor update to a library could
+ alter the behaviour of the software, producing different outcomes across
+ executions or, more importantly, affecting its security level.
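
One way to make "any change in these components" observable is to reduce a description of the environment to a single digest. The following is only a minimal sketch under stated assumptions: the function `environment_fingerprint` and the component names and versions are hypothetical and chosen for illustration, not taken from any real system.

```python
import hashlib
import json


def environment_fingerprint(components: dict[str, str]) -> str:
    """Return a SHA-256 digest of a canonical listing of component versions."""
    # Sort keys so that the digest does not depend on insertion order.
    canonical = json.dumps(components, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Hypothetical environment description; names and versions are illustrative.
baseline = {"os": "debian-12", "python": "3.12.4", "openssl": "3.0.13"}
updated = {**baseline, "openssl": "3.0.14"}  # a minor library update

# An identical description yields an identical fingerprint.
same_environment = (
    environment_fingerprint(baseline) == environment_fingerprint(dict(baseline))
)
# A single changed component yields a different fingerprint.
changed_environment = (
    environment_fingerprint(baseline) != environment_fingerprint(updated)
)
```

Comparing such fingerprints before a build and before an execution would reveal that the environment has drifted, although it says nothing about whether the drift actually changes the behaviour of the software.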
+
+ To enhance reproducibility, it is critical to ensure that the software
+ environment remains stable and unaltered during both the build and execution
+ phases. Unfortunately, conventional #glspl("OS") such as Linux distributions,
+ Microsoft Windows, and macOS are #emph[mutable] by default. This mutability is
+ primarily facilitated through package managers, which enable users to easily
+ modify their environments by installing or upgrading software packages. As a
+ result, uncontrolled changes to dependencies may lead to inconsistencies in
+ software behaviour or affect the security level, undermining reproducibility.
+
+ To mitigate these issues, #emph[immutable] environments have gained popularity.
+ Tools such as Docker #cite(<docker>, form: "normal") provide mechanisms to
+ encapsulate software and its dependencies in containers, thus creating
+ environments that remain unchanged after creation. Once a container is built, it
+ can be shared and executed across different systems with the guarantee that it
+ will function identically, given the same environment. This characteristic makes
+ containers highly suitable for distributing software.
+
+ Despite its advantages, immutability does not guarantee reproducibility. For
+ instance, container images hosted on platforms like Docker Hub
+ #cite(<dockerhub>, form: "normal"), including those of popular language
+ interpreters #eg[Python, NodeJS, PHP], may not be reproducible due to
+ non-deterministic steps during image creation (at build time). A specific
+ example can be found in #ref(<python-dockerfile>), which runs `apt-get update`
+ at line 4 as part of the image build process. Since `apt-get update` fetches the
+ latest version of the package index at build time, it is impossible to rebuild
+ the same image later, which compromises Docker's build-time reproducibility.
+
+ #figure(
+   sourcefile(
+     lang: "dockerfile",
+     read("../../resources/sourcecode/python.dockerfile"),
+   ),
+   caption: [
+     An excerpt of the Python Dockerfile
+     #cite(<python-dockerfile-repository>, form: "normal") used to build the
+     #emph[official] Python images.
+   ],
+ ) <python-dockerfile>
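
The effect of an unpinned package index on build-time reproducibility can be sketched with a small simulation. This is a hedged illustration, not how Docker or `apt-get` work internally: the index snapshots, the package names and versions, and the function `build_image` are all hypothetical.

```python
import hashlib

# Hypothetical snapshots of a package index taken at two different dates.
INDEX_MONDAY = {"libssl": [(3, 0, 13)], "curl": [(8, 5, 0)]}
INDEX_FRIDAY = {"libssl": [(3, 0, 13), (3, 0, 14)], "curl": [(8, 5, 0)]}


def build_image(index, packages, pins=None):
    """Simulate an image build and return a digest of the installed versions."""
    pins = pins or {}
    installed = []
    for name in sorted(packages):
        # Without a pin, take the newest version the index currently offers.
        version = pins.get(name, max(index[name]))
        installed.append(f"{name}={'.'.join(map(str, version))}")
    return hashlib.sha256("\n".join(installed).encode("utf-8")).hexdigest()


packages = ["curl", "libssl"]

# Unpinned builds depend on the state of the index at build time.
unpinned_differs = (
    build_image(INDEX_MONDAY, packages) != build_image(INDEX_FRIDAY, packages)
)

# Pinning every version makes the result independent of the index state.
pins = {"libssl": (3, 0, 13), "curl": (8, 5, 0)}
pinned_matches = (
    build_image(INDEX_MONDAY, packages, pins)
    == build_image(INDEX_FRIDAY, packages, pins)
)
```

In this simulation the same build instructions yield different digests on different days unless every version is pinned, which mirrors why a build step that refreshes the package index cannot be expected to reproduce an earlier image.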
+
+ Docker images, once built, are immutable. While Docker does not guarantee
+ build-time reproducibility, it has the potential to ensure run-time
+ reproducibility, reflecting Docker's philosophy of
+ #emph["build once, use everywhere"]. This distinction between build-time
+ reproducibility (@def-reproducibility-build-time) and run-time reproducibility
+ (@def-reproducibility-run-time) is key. Docker does not ensure that an image
+ will always be built consistently, often due to the base image used (as
+ declared in the `FROM` directive of a `Dockerfile`), as seen in
+ @python-dockerfile. Although building a reproducible image with Docker is
+ technically possible, it would require additional effort, external tools, and a
+ more complex setup. Therefore, we assume that build-time reproducibility is not
+ guaranteed, but the immutability of the environment significantly enhances the
+ potential for reproducibility at run-time.
+
+ #info-box(kind: "important")[
+   Docker is a platform for building, shipping, and running applications in
+   containers, with Docker Hub #cite(<dockerhub>, form: "normal") providing a large
+   repository of container images, which has significantly contributed to
+   Docker's popularity. Among these are the #emph[Docker "official" images]
+   #cite(<dockerofficialimages>, form: "normal"), which are curated and reviewed by
+   the Docker community. These images offer standard environments for popular
+   software and adhere to some quality standards.
+
+   However, the term "official" can be misleading. One might assume that these
+   images are maintained by the original software's developers, but this is not
+   always the case. For example, the PHP Docker image
+   #cite(<dockerhubphpimage>, form: "normal") is not maintained by the core PHP
+   development team. This means updates or fixes may not be as prompt or
+   specific as if the software’s developers maintained the image.
+
+   While Docker vets these images for quality, responsibility for the contents
+   rests with the maintainers. Users should be aware that official images are not
+   immune to security risks or outdated software, and reviewing the documentation
+   for issues is advisable.
+
+   In summary, Docker "official" images are trusted but may not be maintained by
+   the original software’s maintainers. Developers must use them with caution and
+   full awareness, particularly in production environments, and ensure that the
+   images meet their security and functionality requirements.
+ ]
+
+ Package managers are a critical aspect of the reproducibility puzzle since they
+ can manage the state of a computational environment. Without proper control over
+ how software and its dependencies are resolved and installed, achieving
+ consistent and reproducible builds becomes difficult.
+
=== Sources Of Non-Determinism

In this section we will explore the sources of non-determinism in software