@@ -1462,6 +1462,116 @@ and at any point in the past or future.
environments or machines.
]

+ === Computational Environments <ch2-environments>
+
+ Environments where a build or computational process occurs can be broadly
+ categorised into two types: hardware and software environments
+ #cite(<strangfeld_2024>, form: "normal", supplement: "p. 8, section 2.1"). While
+ software environments can be managed to a high degree of consistency, achieving
+ reproducibility across different hardware, particularly different #gls("CPU")
+ architectures #eg[`x86`, `ARM`], is essentially impossible. Each architecture
+ handles instruction execution, memory management, and floating-point
+ calculations in its own way, and even small variations in these processes can
+ lead to differences in output. Consequently, even with identical software,
+ builds on different #gls("CPU") architectures will produce different results.
+ When something is said to be reproducible, it typically means reproducible
+ within the same #gls("CPU") architecture. Therefore, this section focuses
+ exclusively on the reproducibility challenges within software environments.
+
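As a minimal illustration of how floating-point evaluation can affect an output,
the Python sketch below relies only on the fact that floating-point addition is
not associative. It assumes IEEE 754 double-precision floats and does not model
any particular architecture; it merely shows that a different evaluation order,
which a different compiler or instruction set may choose, can change a result in
its last bits.

```python
# Each floating-point addition rounds its result, so the grouping of the
# operands can change the final value (assuming IEEE 754 doubles).
left_grouping = (0.1 + 0.2) + 0.3
right_grouping = 0.1 + (0.2 + 0.3)

# The two groupings are mathematically equal but differ by a rounding error.
results_differ = left_grouping != right_grouping
difference = abs(left_grouping - right_grouping)
```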
+ A software environment is composed of the #gls("OS"), along with the set of
+ tools, libraries, and dependencies required to build or run a specific
+ application. Any change in these components can influence the outcome of a
+ software build or execution. For example, a minor update to a library could
+ alter the behaviour of the software, producing different outcomes across
+ executions or, more importantly, affecting its security.
+
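One way to make such changes visible is to reduce a software environment to a
single fingerprint. The sketch below is an assumption-laden illustration, not a
method taken from the cited work: the function `environment_fingerprint` and the
package names and versions are hypothetical, and a real environment would
include far more than a list of names and versions.

```python
import hashlib


def environment_fingerprint(packages):
    """Return a SHA-256 hex digest of a {name: version} mapping.

    Entries are sorted so that the digest does not depend on insertion order.
    """
    lines = [f"{name}=={version}" for name, version in sorted(packages.items())]
    return hashlib.sha256("\n".join(lines).encode("utf-8")).hexdigest()


# Hypothetical package sets; names and versions are illustrative only.
before = {"libexample": "1.4.2", "toolchain": "12.1.0"}
after_update = {"libexample": "1.4.3", "toolchain": "12.1.0"}
reordered = {"toolchain": "12.1.0", "libexample": "1.4.2"}
```

Under these assumptions, a minor update to `libexample` yields a different
fingerprint, while listing the same packages in a different order does not.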
+ To enhance reproducibility, it is critical to ensure that the software
+ environment remains stable and unaltered during both the build and execution
+ phases. Unfortunately, conventional #glspl("OS") such as Linux distributions,
+ Microsoft Windows, and macOS are #emph[mutable] by default. This mutability is
+ primarily facilitated through package managers, which enable users to easily
+ modify their environments by installing or upgrading software packages. As a
+ result, uncontrolled changes to dependencies may lead to inconsistencies in
+ software behaviour or have an impact on security, undermining reproducibility.
+
+ To mitigate these issues, #emph[immutable] environments have gained popularity.
+ Tools such as Docker #cite(<docker>, form: "normal") provide mechanisms to
+ encapsulate software and its dependencies in containers, thus creating
+ environments that remain unchanged after creation. Once a container is built, it
+ can be shared and executed across different systems with the guarantee that it
+ will function identically, given the same environment. This characteristic makes
+ containers highly suitable for distributing software.
+
+ Despite its advantages, immutability does not guarantee reproducibility. For
+ instance, container images hosted on platforms like Docker Hub
+ #cite(<dockerhub>, form: "normal"), including those of popular language
+ interpreters #eg[Python, Node.js, PHP], may not be reproducible due to
+ non-deterministic steps during image creation (at build-time). A specific
+ example can be found in #ref(<python-dockerfile>), which runs `apt-get update`
+ at line 4 as part of the image build process. Because `apt-get update` fetches
+ the latest version of the package index at build time, it is impossible to
+ rebuild the same image later, which compromises Docker's build-time
+ reproducibility.
+
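The effect of refreshing a package index at build time can be sketched with a
small simulation. Everything below is hypothetical: the `resolve` function, the
package name, and the two index snapshots are illustrative and do not reproduce
how `apt-get` resolves packages internally. The sketch only shows that resolving
"the newest available version" against a changing index gives different
results over time, whereas resolving against pinned versions does not.

```python
def resolve(index, requested, pins=None):
    """Map each requested package name to a version tuple.

    Without a pin, the newest version listed in the index is chosen.
    With a pin, that exact version is required to be present.
    """
    pins = pins or {}
    resolved = {}
    for name in requested:
        available = index[name]
        if name in pins:
            if pins[name] not in available:
                # An index may stop offering an older version.
                raise LookupError(f"{name} {pins[name]} is no longer available")
            resolved[name] = pins[name]
        else:
            resolved[name] = max(available)
    return resolved


# Two hypothetical snapshots of the same index, taken at different times.
index_earlier = {"libexample": [(1, 0, 0), (1, 0, 1)]}
index_later = {"libexample": [(1, 0, 0), (1, 0, 1), (1, 0, 2)]}
pins = {"libexample": (1, 0, 1)}

unpinned_earlier = resolve(index_earlier, ["libexample"])
unpinned_later = resolve(index_later, ["libexample"])
pinned_earlier = resolve(index_earlier, ["libexample"], pins)
pinned_later = resolve(index_later, ["libexample"], pins)
```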
+ #figure(
+   sourcefile(
+     lang: "dockerfile",
+     read("../../resources/sourcecode/python.dockerfile"),
+   ),
+   caption: [
+     An excerpt of the Python Dockerfile
+     #cite(<python-dockerfile-repository>, form: "normal") used to build the
+     #emph[official] Python images.
+   ],
+ ) <python-dockerfile>
+
+ Docker images, once built, are immutable. While Docker does not guarantee
+ build-time reproducibility, it has the potential to ensure run-time
+ reproducibility, reflecting Docker's philosophy of
+ #emph["build once, use everywhere"]. This distinction between build-time
+ reproducibility (@def-reproducibility-build-time) and run-time reproducibility
+ (@def-reproducibility-run-time) is key. Docker does not ensure that an image
+ will always be built consistently, often because of the base image used (as
+ declared in the `FROM` directive of a `Dockerfile`), as seen in
+ @python-dockerfile. Although building a reproducible image with Docker is
+ technically possible, it requires additional effort, external tools, and a
+ more complex setup. Therefore, we assume that build-time reproducibility is not
+ guaranteed, but that the immutability of the environment significantly enhances
+ the potential for reproducibility at run-time.
+
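Part of the additional effort mentioned above is referring to base images by
content digest rather than by a mutable tag. The following sketch is one
possible check, not a complete linter: the function `unpinned_base_images`, the
image names, and the all-zero digest are hypothetical placeholders. It lists the
`FROM` references in a `Dockerfile` that do not carry an `@sha256:` digest.

```python
import re

# FROM [--platform=<platform>] <image> [AS <name>]
_FROM = re.compile(
    r"^\s*FROM\s+(?:--platform=\S+\s+)?(\S+)(?:\s+AS\s+(\S+))?",
    re.IGNORECASE,
)


def unpinned_base_images(dockerfile_text):
    """Return the FROM references that are not pinned by an @sha256: digest."""
    stages = set()  # names of earlier build stages, which are not base images
    unpinned = []
    for line in dockerfile_text.splitlines():
        match = _FROM.match(line)
        if not match:
            continue
        reference, alias = match.group(1), match.group(2)
        is_stage = reference.lower() in stages
        is_scratch = reference.lower() == "scratch"
        if not is_stage and not is_scratch and "@sha256:" not in reference:
            unpinned.append(reference)
        if alias:
            stages.add(alias.lower())
    return unpinned


# Hypothetical Dockerfile; the digest is a placeholder, not a real image.
SAMPLE = "\n".join([
    "FROM example/base:1.0 AS build",
    "RUN apt-get update",
    "FROM build AS test",
    "FROM example/app@sha256:" + "0" * 64,
    "FROM scratch",
])
```

A reference built from a build argument, such as `FROM ${BASE}`, is reported as
unpinned by this sketch because its value cannot be known from the text alone.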
+ #info-box(kind: "important")[
+ Docker is a platform for building, shipping, and running applications in
+ containers, with Docker Hub #cite(<dockerhub>, form: "normal") providing a large
+ repository of container images, which has significantly contributed to
+ Docker's popularity. Among these are the #emph[Docker "official" images]
+ #cite(<dockerofficialimages>, form: "normal"), which are curated and reviewed by
+ the Docker community. These images offer standard environments for popular
+ software and adhere to some quality standards.
+
+ However, the term "official" can be misleading. One might assume that these
+ images are maintained by the original software's developers, but this is not
+ always the case. For example, the PHP Docker image
+ #cite(<dockerhubphpimage>, form: "normal") is not maintained by the core PHP
+ development team. This means updates or fixes may not be as prompt or as
+ specific as they would be if the software’s developers maintained the image.
+
+ While Docker vets these images for quality, responsibility for the contents
+ rests with the maintainers. Users should be aware that official images are not
+ immune to security risks or outdated software, and they are advised to review
+ the documentation of an image for known issues.
+
+ In summary, Docker "official" images are trusted but may not be maintained by
+ the original software’s maintainers. Developers must use them with caution and
+ full awareness, particularly in production environments, and ensure that the
+ images meet their security and functionality requirements.
+ ]
+
+ Package managers are a critical piece of the reproducibility puzzle since they
+ manage the state of a computational environment. Without proper control over
+ how software and its dependencies are resolved and installed, achieving
+ consistent and reproducible builds becomes difficult.
+
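One form of such control is to record, for every dependency, the exact version
and a cryptographic hash of its contents, and to verify them at installation
time. The sketch below assumes a hypothetical lock structure and a hypothetical
`verify_artifact` helper; it is not the format of any specific package manager.

```python
import hashlib


def verify_artifact(name, data, lock):
    """Return True if `data` matches the SHA-256 recorded for `name` in `lock`."""
    expected = lock[name]["sha256"]
    return hashlib.sha256(data).hexdigest() == expected


# Hypothetical archive contents and lock entry, for illustration only.
artifact = b"contents of a hypothetical package archive"
lock = {
    "libexample": {
        "version": "1.4.2",
        "sha256": hashlib.sha256(artifact).hexdigest(),
    }
}

accepted = verify_artifact("libexample", artifact, lock)
rejected = verify_artifact("libexample", artifact + b" (modified)", lock)
```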
=== Sources Of Non-Determinism

In this section we will explore the sources of non-determinism in software