@@ -1450,6 +1450,113 @@ and at any point in the past or future.
environments or machines.
]
+ === Environments <ch2-environments>
+
+ Environments where a build or computational process occurs can be broadly
+ categorised into two types: hardware and software environments. While software
+ environments can be managed to a high degree of consistency, achieving
+ reproducibility across different hardware, particularly different #gls("CPU")
+ architectures #eg[`x86`, `ARM`], is essentially impossible. Each architecture
+ handles instruction execution, memory management, and floating-point
+ calculations in its own way, and even small variations in these processes can
+ lead to differences in output. Consequently, even with identical software,
+ builds on different #gls("CPU") architectures will produce different results.
+ When something is said to be reproducible, it typically means reproducible
+ within the same #gls("CPU") architecture. Therefore, this section focuses
+ exclusively on the reproducibility challenges within software environments.
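The sensitivity to small variations can be illustrated with floating-point addition, which is not associative: evaluating the same sum in a different order can change the last bits of the result. The sketch below is only an illustration in Python; the two function names are hypothetical, and the pairwise variant merely stands in for the kind of reordering that a different compiler or instruction set might apply.

```python
import math


def sum_left_to_right(values):
    """Accumulate in list order, as a simple scalar loop would."""
    total = 0.0
    for v in values:
        total += v
    return total


def sum_pairwise(values):
    """Accumulate by halves, standing in for a reordered reduction."""
    if not values:
        return 0.0
    if len(values) == 1:
        return values[0]
    mid = len(values) // 2
    return sum_pairwise(values[:mid]) + sum_pairwise(values[mid:])


values = [0.1, 0.2, 0.3]
a = sum_left_to_right(values)  # ((0.0 + 0.1) + 0.2) + 0.3
b = sum_pairwise(values)       # 0.1 + (0.2 + 0.3)

print(a == b)              # the two orders disagree bit for bit
print(math.isclose(a, b))  # yet they agree within rounding tolerance
```

Both results are numerically acceptable, but a bit-for-bit comparison of the outputs fails, which is exactly the kind of difference that matters for reproducibility.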
+
+ A software environment is composed of the #gls("OS"), along with the set of
+ tools, libraries, and dependencies required to build or run a specific
+ application. Any change in these components can influence the outcome of a
+ software build or execution. For example, a minor update to a library could
+ alter the behaviour of the software, producing different outcomes across
+ executions or, more importantly, affecting its security.
+
+ To enhance reproducibility, it is critical to ensure that the software
+ environment remains stable and unaltered during both the build and execution
+ phases. Unfortunately, conventional #glspl("OS") such as Linux distributions,
+ Microsoft Windows, and macOS are *mutable* by default. This mutability is
+ primarily facilitated through package managers, which enable users to easily
+ modify their environments by installing or upgrading software packages. As a
+ result, uncontrolled changes to dependencies may lead to inconsistencies in
+ software behaviour or have an impact on security, undermining
+ reproducibility.
+
+ To mitigate these issues, *immutable* environments have gained popularity.
+ Tools such as Docker provide mechanisms to encapsulate software and its
+ dependencies in containers, thus creating environments that remain unchanged
+ after creation. Once a container image is built, it can be shared and executed
+ across different systems with the expectation that it will behave identically,
+ given the same environment. This characteristic makes containers highly
+ suitable for distributing software.
+
+ Despite its advantages, immutability does not guarantee reproducibility by
+ default. For instance, container images hosted on platforms like Docker Hub
+ #cite(<dockerhub>, form: "normal"), including popular language interpreters
+ #eg[Python, Node.js, PHP], may not be reproducible due to non-deterministic
+ steps during image creation. A specific example can be found in
+ #ref(<python-dockerfile>), which runs `apt-get update` at line 4 as part of the
+ image build process. Since `apt-get update` fetches the latest package lists
+ at build time, a later rebuild is unlikely to produce the same image,
+ compromising Docker's build-time reproducibility.
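A simple way to reason about build-time reproducibility is to run the same build several times and compare cryptographic digests of the outputs. The following Python sketch assumes the build can be modelled as a function returning bytes; `is_reproducible`, `pinned_build`, and `unpinned_build` are hypothetical names, the package name is made up, and the counter only simulates an upstream package index that changes between builds.

```python
import hashlib
import itertools


def digest(artifact: bytes) -> str:
    """Return the SHA-256 hex digest of a build artifact."""
    return hashlib.sha256(artifact).hexdigest()


def is_reproducible(build, runs: int = 3) -> bool:
    """Return True if every run of `build` yields byte-identical output."""
    digests = {digest(build()) for _ in range(runs)}
    return len(digests) == 1


def pinned_build() -> bytes:
    # Every input is fixed (hypothetical package and version),
    # so the output never changes.
    return b"example-lib=1.0.0\n"


# Simulated upstream index: its revision advances on every fetch,
# much like package lists that change between two builds.
_upstream_revision = itertools.count(1)


def unpinned_build() -> bytes:
    # The output depends on the state of the index at build time.
    return f"example-lib@index-revision-{next(_upstream_revision)}\n".encode()


print(is_reproducible(pinned_build))
print(is_reproducible(unpinned_build))
```

The same comparison applies to real images, with the caveat that a real check would hash the actual build outputs rather than in-memory stand-ins.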
+
+ #figure(
+   sourcefile(
+     lang: "dockerfile",
+     read("../../resources/sourcecode/python.dockerfile"),
+   ),
+   caption: [
+     An excerpt of the Python Dockerfile
+     #cite(<python-dockerfile-repository>, form: "normal") used to build the
+     #emph[official] Python images.
+   ],
+ ) <python-dockerfile>
+
+ Docker images, once built, are immutable. While Docker does not guarantee
+ build-time reproducibility, it has the potential to ensure run-time
+ reproducibility, reflecting Docker's philosophy of
+ #emph["build once, use everywhere"]. This distinction between build-time
+ reproducibility (@def-reproducibility-build-time) and run-time reproducibility
+ (@def-reproducibility-run-time) is key. Docker does not ensure that an image
+ will always be built consistently, often because of the base image used (as
+ specified in the `FROM` directive of a `Dockerfile`), as seen in
+ @python-dockerfile. Although building a reproducible image with Docker is
+ technically possible, it requires additional effort, external tools, and a
+ more complex setup. Therefore, we assume that build-time reproducibility is not
+ guaranteed, but that the immutability of the environment significantly enhances
+ the potential for reproducibility at run-time.
+
+ #info-box[
+   Docker is a platform for building, shipping, and running applications in
+   containers, with Docker Hub #cite(<dockerhub>, form: "normal") providing a
+   large repository of container images, which has significantly contributed to
+   Docker's popularity. Among these are "official" Docker images
+   #cite(<dockerofficialimages>, form: "normal"), which are curated and reviewed
+   by Docker Inc. These images offer standard environments for popular software
+   and adhere to some quality standards.
+
+   However, the term "official" can be misleading. One might assume that these
+   images are maintained by the original software's developers, but this is not
+   always the case. For example, the PHP Docker image is not maintained by the
+   core PHP development team. This means updates or fixes may not be as prompt
+   or specific as if the software’s developers maintained the image.
+
+   While Docker vets these images for quality, responsibility for the contents
+   rests with the maintainers. Users should be aware that official images are
+   not immune to security risks or outdated software, and reviewing the
+   documentation for known issues is advisable.
+
+   In summary, "official" Docker images are trusted but may not be maintained by
+   the software’s creators. Developers should use them with care, especially in
+   production environments, and verify that the images meet their security and
+   functionality needs.
+ ]
+
+ Package managers are a critical aspect of the reproducibility puzzle. Without
+ proper control over how dependencies are resolved and installed, achieving
+ consistent and reproducible builds becomes difficult.
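As a rough illustration of what control over dependency resolution can mean in practice, the Python sketch below compares a set of pinned versions against what is actually installed and reports any drift. The function name `find_drift`, the package names, and the version numbers are all hypothetical; real package managers expose this information in their own formats.

```python
def find_drift(pinned: dict, installed: dict) -> dict:
    """Map each pinned package whose installed version differs from its pin,
    or which is missing, to a (pinned, installed) pair."""
    drift = {}
    for name, wanted in pinned.items():
        actual = installed.get(name)  # None when the package is absent
        if actual != wanted:
            drift[name] = (wanted, actual)
    return drift


# Hypothetical data: one package was silently upgraded after pinning.
pinned = {"example-lib": "1.4.2", "example-tool": "0.9.0"}
installed = {"example-lib": "1.4.3", "example-tool": "0.9.0"}

print(find_drift(pinned, installed))
```

An empty result means the environment still matches its pins. This sketch ignores packages that are installed but not pinned, which a stricter check would also flag.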
1559
+
=== Sources Of Non-Determinism
In this section we will explore the sources of non-determinism in software