@@ -1449,6 +1449,116 @@ and at any point in the past or future.
environments or machines.
]

+ === Computational Environments <ch2-environments>
+
+ Environments where a build or computational process occurs can be broadly
+ categorised into two types: hardware and software environments
+ #cite(<strangfeld_2024>, form: "normal", supplement: "p. 8, section 2.1"). While
+ software environments can be managed to a high degree of consistency, achieving
+ reproducibility across different hardware, particularly different #gls("CPU")
+ architectures #eg[`x86`, `ARM`], is essentially impossible. Each architecture
+ handles instruction execution, memory management, and floating-point
+ calculations in its own way, and even small variations in these processes can
+ lead to differences in output. Consequently, even with identical software,
+ builds on different #gls("CPU") architectures will produce different results.
+ When something is said to be reproducible, it typically means reproducible
+ within the same #gls("CPU") architecture. Therefore, this section focuses
+ exclusively on the reproducibility challenges within software environments.
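
To make the remark about floating-point calculations concrete, the following
minimal sketch shows that merely regrouping the same three additions changes the
rounded result. It does not compare two real architectures; it only illustrates,
as an assumption about how such differences typically arise, why a toolchain or
CPU that orders or fuses operations differently can produce different output
from identical source code.

```python
# Floating-point addition is not associative: regrouping the same three
# operands changes the rounded result.
left_to_right = (0.1 + 0.2) + 0.3
right_to_left = 0.1 + (0.2 + 0.3)

print(left_to_right)                   # 0.6000000000000001
print(right_to_left)                   # 0.6
print(left_to_right == right_to_left)  # False
```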
+
+ A software environment is composed of the #gls("OS"), along with the set of
+ tools, libraries, and dependencies required to build or run a specific
+ application. Any change in these components can influence the outcome of a
+ software build or execution. For example, a minor update to a library could
+ alter the behaviour of the software, producing different outcomes across
+ executions or, more importantly, affecting its security.
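
One way to notice such changes is to reduce the environment to a single
fingerprint and compare it between runs. The sketch below is an illustration
only: the helper names `collect_components` and `fingerprint` are hypothetical,
and the set of components it records (OS name, OS release, Python version,
installed Python distributions) is an assumption that covers far less than a
complete software environment.

```python
import hashlib
import platform
from importlib import metadata


def collect_components() -> dict[str, str]:
    """Gather a rough, partial description of the current environment."""
    components = {
        "os": platform.system(),
        "os-release": platform.release(),
        "python": platform.python_version(),
    }
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:  # skip distributions with broken metadata
            components[f"pkg:{name.lower()}"] = str(dist.version)
    return components


def fingerprint(components: dict[str, str]) -> str:
    """Hash the components in sorted order so key order does not matter."""
    digest = hashlib.sha256()
    for key in sorted(components):
        digest.update(f"{key}={components[key]}\n".encode("utf-8"))
    return digest.hexdigest()
```

Calling `fingerprint(collect_components())` before and after a library upgrade
would yield two different values, signalling that the environment is no longer
the one in which earlier results were produced.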
+
+ To enhance reproducibility, it is critical to ensure that the software
+ environment remains stable and unaltered during both the build and execution
+ phases. Unfortunately, conventional #glspl("OS") such as Linux distributions,
+ Microsoft Windows, and macOS are #emph[mutable] by default. This mutability is
+ primarily facilitated through package managers, which enable users to easily
+ modify their environments by installing or upgrading software packages. As a
+ result, uncontrolled changes to dependencies may lead to inconsistencies in
+ software behaviour or affect security, undermining reproducibility.
+
+ To mitigate these issues, #emph[immutable] environments have gained popularity.
+ Tools such as Docker #cite(<docker>, form: "normal") provide mechanisms to
+ encapsulate software and its dependencies in containers, thus creating
+ environments that remain unchanged after creation. Once a container image is
+ built, it can be shared and executed across different systems with the guarantee
+ that it will behave identically, provided the host environment is the same. This
+ characteristic makes containers highly suitable for distributing software.
+
+ Despite its advantages, immutability does not guarantee reproducibility.
+ For instance, container images hosted on platforms like Docker Hub
+ #cite(<dockerhub>, form: "normal"), including popular language interpreters
+ #eg[Python, Node.js, PHP], may not be reproducible due to non-deterministic
+ steps during image creation (at build-time). A specific example can be found
+ in #ref(<python-dockerfile>), which runs `apt-get update` at line 4 as part of
+ the image build process. Since `apt-get update` fetches the latest version of
+ the package index at build time, the same image cannot be reliably rebuilt
+ later, compromising Docker's build-time reproducibility.
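
The effect can be sketched without Docker by building twice and comparing
digests of the outputs. In the sketch below, `pinned_build` and `floating_build`
are hypothetical stand-ins for image builds, and the counter that plays the role
of a moving package index is an assumption made purely for illustration.

```python
import hashlib
import itertools
from typing import Callable


def digests(build: Callable[[], bytes], runs: int = 2) -> set[str]:
    """Run the build several times and collect the distinct output digests."""
    return {hashlib.sha256(build()).hexdigest() for _ in range(runs)}


def is_reproducible(build: Callable[[], bytes], runs: int = 2) -> bool:
    """Treat a build as reproducible if every run yields the same bytes."""
    return len(digests(build, runs)) == 1


def pinned_build() -> bytes:
    # All inputs are fixed, so the output never changes.
    return b"libfoo=1.2.3\n"


_index_revision = itertools.count(1)


def floating_build() -> bytes:
    # Stand-in for `apt-get update`: the "package index" moves on between
    # builds, so the output embeds whichever revision is current.
    return f"libfoo=1.2.{next(_index_revision)}\n".encode()


print(is_reproducible(pinned_build))    # True
print(is_reproducible(floating_build))  # False
```

The same comparison applies to real builds by hashing the produced artefact
instead of the byte strings used here.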
+
+ #figure(
+   sourcefile(
+     lang: "dockerfile",
+     read("../../resources/sourcecode/python.dockerfile"),
+   ),
+   caption: [
+     An excerpt of the Dockerfile
+     #cite(<python-dockerfile-repository>, form: "normal") used to build the
+     #emph[official] Python images.
+   ],
+ ) <python-dockerfile>
+
+ Docker images, once built, are immutable. While Docker does not guarantee
+ build-time reproducibility, it has the potential to ensure run-time
+ reproducibility, reflecting Docker's philosophy of
+ #emph["build once, use everywhere"]. This distinction between build-time
+ reproducibility (@def-reproducibility-build-time) and run-time reproducibility
+ (@def-reproducibility-run-time) is key. Docker does not ensure that an image
+ will always be built consistently, often because of the base image used (as
+ declared in the `FROM` directive of a `Dockerfile`), as seen in
+ @python-dockerfile. Although building a reproducible image with Docker is
+ technically possible, it requires additional effort, external tools, and a
+ more complex setup. Therefore, we assume that build-time reproducibility is not
+ guaranteed, but that the immutability of the environment significantly enhances
+ the potential for reproducibility at run-time.
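
The two build-time inputs discussed so far, a base image referenced by a mutable
tag and a package index refreshed during the build, can be spotted with a simple
scan. The following sketch is a heuristic only: the function name
`find_floating_inputs` is hypothetical, the regular expressions do not replace a
real Dockerfile parser, and the example Dockerfile is invented for illustration
rather than taken from the official Python image.

```python
import re

# Heuristics only; a real parser would also handle ARG substitution and
# references to earlier stages in multi-stage builds.
UPDATE_RE = re.compile(r"\bapt-get\s+(?:-\S+\s+)*update\b")
FROM_RE = re.compile(r"^\s*FROM\s+(?:--\S+\s+)*(\S+)", re.IGNORECASE)


def find_floating_inputs(dockerfile: str) -> list[tuple[int, str]]:
    """Return (line number, reason) pairs for inputs that can change over time."""
    findings = []
    for number, line in enumerate(dockerfile.splitlines(), start=1):
        match = FROM_RE.match(line)
        if match:
            image = match.group(1)
            if image != "scratch" and "@sha256:" not in image:
                findings.append(
                    (number, f"base image '{image}' is not pinned by digest")
                )
        if UPDATE_RE.search(line):
            findings.append(
                (number, "'apt-get update' fetches the current package index")
            )
    return findings


# Invented example, not the official Python Dockerfile.
EXAMPLE = """\
FROM debian:bookworm-slim
ENV LANG=C.UTF-8
RUN set -eux; \\
    apt-get update; \\
    apt-get install -y --no-install-recommends ca-certificates
"""

for number, reason in find_floating_inputs(EXAMPLE):
    print(f"line {number}: {reason}")
```

Pinning the base image by digest removes the first finding; the second one
additionally requires a fixed package source, which is part of the extra effort
mentioned above.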
+
+ #info-box(kind: "important")[
+   Docker is a platform for building, shipping, and running applications in
+   containers, with Docker Hub #cite(<dockerhub>, form: "normal") providing a large
+   repository of container images, which has significantly contributed to
+   Docker's popularity. Among these are the #emph[Docker "official" images]
+   #cite(<dockerofficialimages>, form: "normal"), which are curated and reviewed by
+   the Docker community. These images offer standard environments for popular
+   software and adhere to some quality standards.
1538
+
1539
+ However, the term "official" can be misleading. One might suggest that these
1540
+ images are maintained by the original software's developers, but it's not
1541
+ always the case. For example, the PHP Docker image
1542
+ # cite (<dockerhubphpimage> ,form :" normal" ) is not maintained by the core PHP
1543
+ development team. This means updates or fixes may not be as prompt or
1544
+ specific as if the software’s developers maintained the image.
+
+   While Docker vets these images for quality, responsibility for the contents
+   rests with the maintainers. Users should be aware that official images are not
+   immune to security risks or outdated software, and should review each image's
+   documentation for known issues.
+
+   In summary, Docker "official" images are trusted but may not be maintained by
+   the original software’s developers. Users must adopt them with caution and
+   full awareness, particularly in production environments, and ensure that the
+   images meet their security and functionality requirements.
+ ]
+
+ Package managers are a critical piece of the reproducibility puzzle since they
+ manage the state of a computational environment. Without proper control over
+ how software and its dependencies are resolved and installed, achieving
+ consistent and reproducible builds becomes difficult.
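
The role of dependency resolution can be illustrated with a toy resolver. In
this sketch, the package `libfoo`, the two index snapshots, and the `resolve`
function are all hypothetical; no real package manager is modelled. It only
contrasts resolving to the newest available version with resolving against a
recorded lock.

```python
from typing import Optional

# Hypothetical package index observed at two points in time.
INDEX_MONDAY = {"libfoo": ["1.0.0", "1.1.0"]}
INDEX_FRIDAY = {"libfoo": ["1.0.0", "1.1.0", "1.2.0"]}


def version_key(version: str) -> tuple[int, ...]:
    """Compare versions numerically, component by component."""
    return tuple(int(part) for part in version.split("."))


def resolve(
    requested: list[str],
    index: dict[str, list[str]],
    lock: Optional[dict[str, str]] = None,
) -> dict[str, str]:
    """Pick the locked version if there is one, else the newest in the index."""
    resolved = {}
    for name in requested:
        if lock and name in lock:
            resolved[name] = lock[name]
        else:
            resolved[name] = max(index[name], key=version_key)
    return resolved


# Without a lock, the result depends on when the resolution happens.
print(resolve(["libfoo"], INDEX_MONDAY))  # {'libfoo': '1.1.0'}
print(resolve(["libfoo"], INDEX_FRIDAY))  # {'libfoo': '1.2.0'}

# With a lock recorded on Monday, Friday's resolution is unchanged.
lock = resolve(["libfoo"], INDEX_MONDAY)
print(resolve(["libfoo"], INDEX_FRIDAY, lock))  # {'libfoo': '1.1.0'}
```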
1561
+
1452
1562
=== Sources Of Non-Determinism
1453
1563
1454
1564
In this section we will explore the sources of non-determinism in software