as `-u`, and the `LC_ALL` environment variable to the `date` command. This
approach ensures that the output we receive is predictable and consistent,
regardless of the underlying system configuration.

==== Environments and Configuration Management

In the context of #gls("SE"), reproducibility relies not only on stable
codebases but also on consistent and well-maintained environments.
Configuration management plays a critical role in ensuring reproducibility by
mitigating the non-deterministic behaviours introduced by configuration drift.

#info-box[
Configuration drift occurs when changes to an environment accumulate over
time, leading to variations that deviate from the desired or initial
configuration state, thus introducing non-determinism.
]

This section examines key configuration management models, their impact on
reproducibility, and the tools that enforce these principles in modern
software environments.

Another source of non-determinism arises from inconsistent environment
configurations. The way environments are managed directly affects their
behaviour and, by extension, reproducibility. Configuration management
therefore mitigates non-determinism by ensuring that systems, software
installations and software builds remain consistent across different
environments.

@Traugott2002 classify environment configuration management into three
categories, each of which has a distinct impact on the level of determinism
achieved:

#figure(include "../../resources/typst/configuration-management.typ")

===== Divergent Configuration Management

In this model (@divergent-config-management), environments are typically
managed by hand by one or more individuals, which inevitably leads to
#emph[configuration drift]: the configurations of different systems deviate
over time. Drift is unavoidable when system modifications are performed
without centralised control. It leads to unpredictable, non-deterministic
behaviour and makes reproducibility almost impossible in complex
infrastructures. Reducing reliance on manual adjustments is therefore
essential to achieving higher levels of system predictability and
reproducibility. A common example of this model is a set of newly installed
operating systems that initially share a uniform configuration. Over time, as
users customise their environments to suit individual preferences, each
system's state diverges from its original, well-defined configuration.
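
To make drift concrete, the following minimal Python sketch models each environment as a flat mapping of settings and reports where it has drifted from a recorded baseline. The function `detect_drift`, the `Config` type and the example settings are hypothetical and serve only as an illustration. Real environments have much richer state, such as files, packages and running services.

```python
from typing import Dict, Optional, Tuple

# Hypothetical model: an environment's configuration as a flat key/value map.
Config = Dict[str, str]
Drift = Dict[str, Tuple[Optional[str], Optional[str]]]


def detect_drift(baseline: Config, current: Config) -> Drift:
    """Report every setting whose value differs from the baseline.

    Each entry maps a key to (baseline_value, current_value); None marks a key
    that is absent on that side, i.e. added or removed since the baseline.
    """
    drift: Drift = {}
    for key in sorted(baseline.keys() | current.keys()):
        before, after = baseline.get(key), current.get(key)
        if before != after:
            drift[key] = (before, after)
    return drift


# Two machines start from the same freshly installed configuration...
baseline: Config = {"python": "3.11", "locale": "en_US.UTF-8", "timezone": "UTC"}
machine_a = dict(baseline)
machine_b = dict(baseline)

# ...and are then customised by hand, independently of each other.
machine_a["timezone"] = "Europe/Brussels"
machine_b["python"] = "3.12"
machine_b["editor"] = "vim"

print(detect_drift(baseline, machine_a))  # {'timezone': ('UTC', 'Europe/Brussels')}
print(detect_drift(baseline, machine_b))  # {'editor': (None, 'vim'), 'python': ('3.11', '3.12')}
```

Detecting drift only tells us that the systems have diverged. Nothing in the divergent model brings them back to the baseline.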

===== Convergent Configuration Management

Once configuration drift is identified as an issue, the focus shifts towards
convergence: bringing systems back to a known and consistent state, as
illustrated in @convergent-config-management. While efforts are made to
standardise configurations, achieving exact uniformity is extremely
challenging, if not impossible. Systems may progressively "converge" towards a
common configuration, but subtle differences can persist, introducing
variability. The goal in this model is to minimise these variations as much as
possible, though complete uniformity is rarely attained. To illustrate this
model, consider an arbitrary environment that must be configured to reach a
particular, well-known state, for example one in which a specific set of
dependencies is installed. Tools such as Puppet #cite(<puppet>, form: "normal"),
Kubernetes #cite(<kubernetes>, form: "normal"),
Terraform #cite(<terraform>, form: "normal") and
Ansible #cite(<ansible>, form: "normal") are commonly used to drive environments
towards such a state. While convergent management offers flexibility in
responding to unforeseen changes in the environment, it is prone to feedback
loops that may cause unexpected behaviour. Such feedback loops make it
difficult to achieve complete reproducibility, as the system's progression
towards the desired state is not guaranteed to follow a deterministic path.
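
The partial nature of convergence can be sketched as an agent that compares only the settings it manages against a desired state and repairs them on each pass. This loosely follows the spirit of agent-based tools such as Puppet. The Python sketch below is hypothetical and does not reflect any tool's actual implementation. The names `DESIRED` and `converge` are invented for illustration. Managed keys converge, while anything outside the managed set is left untouched, which is one way subtle differences between systems persist.

```python
from typing import Dict, List

Config = Dict[str, str]

# Hypothetical desired state: only these keys are managed by the agent.
DESIRED: Config = {"nginx": "1.24", "ntp": "enabled"}


def converge(state: Config, desired: Config) -> List[str]:
    """Run one corrective pass in place and return the keys that were repaired."""
    repaired: List[str] = []
    for key, value in desired.items():
        if state.get(key) != value:
            state[key] = value
            repaired.append(key)
    return repaired


# Two hosts that drifted in different ways, including on an unmanaged key.
host_a: Config = {"nginx": "1.22", "ntp": "enabled", "swap": "off"}
host_b: Config = {"nginx": "1.24", "ntp": "disabled", "swap": "on"}

for host in (host_a, host_b):
    # Repeat passes until a pass changes nothing (a fixed point is reached).
    while converge(host, DESIRED):
        pass

print(host_a)  # {'nginx': '1.24', 'ntp': 'enabled', 'swap': 'off'}
print(host_b)  # {'nginx': '1.24', 'ntp': 'enabled', 'swap': 'on'}
```

After convergence both hosts agree on every managed setting, but the unmanaged `swap` setting still differs between them.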

===== Congruent Configuration Management

This approach, shown in @congruent-config-management, enforces strict
consistency across all environments, ensuring that each environment maintains
an identical configuration. By preventing configuration drift from the outset,
congruent configuration management aims to eliminate one of the key sources of
non-determinism. Maintaining identical setups across environments is the
central goal of this model, which provides the highest level of determinism
and reliability in system behaviour. To illustrate this model, consider again
an environment that needs to be configured in a specific way. Rather than
incrementally correcting its current state, the environment is built from a
complete specification, and any change to that specification yields a newly
built environment.

Congruent management, particularly through the adoption of immutable
environments, ensures that systems remain in a well-defined state, thus
maximising reproducibility. However, this approach can lack the flexibility
required for dynamic environments, where even minor adjustments may
necessitate rebuilding the entire system. This limitation highlights the
importance of carefully choosing between convergent and congruent approaches
based on the system's needs.
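
One way to realise congruence is to treat an environment as an immutable value identified by its complete specification. The following Python sketch derives an environment identifier from a canonical serialisation of the specification. It is loosely inspired by the hash-based store paths of Nix and Guix, but it does not reproduce their actual hashing scheme. The function `environment_id` and the example specification are hypothetical. Any change, however small, produces a different identifier, and therefore a new environment to build, instead of a mutation of an existing one.

```python
import hashlib
import json
from types import MappingProxyType
from typing import Mapping


def environment_id(spec: Mapping[str, str]) -> str:
    """Derive a stable identifier from a complete environment specification.

    Keys are sorted so that the identifier depends only on the content of the
    specification, not on the order in which entries were written.
    """
    canonical = json.dumps(dict(spec), sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]


# Hypothetical specification, wrapped in a read-only view so it cannot be
# modified in place (assigning to spec_v1[...] raises TypeError).
spec_v1 = MappingProxyType({"python": "3.11.9", "numpy": "1.26.4"})

# "Changing" the environment means deriving a new specification...
spec_v2 = MappingProxyType({**spec_v1, "numpy": "2.0.0"})

# ...which yields a new identifier, i.e. a new environment to build.
print(environment_id(spec_v1))
print(environment_id(spec_v2))
```

The same property explains the rigidity discussed above. Even a one-line change to the specification produces a different environment, which then has to be built.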

Tools such as Nix or Guix have demonstrated that it is possible to achieve a
high degree of congruence while allowing controlled divergence in specific
areas such as databases or secret management. This balance between
convergence and congruence highlights the flexibility required to maintain
reproducibility in environments that manage both static system components and
dynamic data.

Beyond these models, two configuration management paradigms can also be
distinguished: imperative and declarative.

===== Imperative Configuration Management

This paradigm specifies the exact steps required to transition an environment
from its current state to the desired state. Tools such as
Ansible #cite(<ansible>, form: "normal"), Chef #cite(<chef>, form: "normal"),
Docker #cite(<docker>, form: "normal") and shell scripts exemplify this
methodology. While imperative configurations enable the use of complex logic
and conditional operations, they can be challenging to maintain because they
are not inherently idempotent: the same script may yield different results
depending on the environment's initial state. This approach requires careful
management to ensure consistency and repeatability, providing detailed control
at the expense of simplicity and predictability.
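
The following hypothetical Python sketch contrasts a naive imperative step with a guarded one. It operates on an in-memory list of configuration lines that stands in for a file such as a shell profile. The names `append_setting`, `ensure_setting` and `LINE` are invented for illustration. The naive step appends unconditionally, so running the same script twice changes the result. The guarded step first checks the current state.

```python
from typing import List

LINE = "export LC_ALL=C.UTF-8"  # hypothetical setting the script must enforce


def append_setting(lines: List[str]) -> None:
    """Naive imperative step: always appends, whatever the current state."""
    lines.append(LINE)


def ensure_setting(lines: List[str]) -> None:
    """Guarded imperative step: appends only if the line is not already present."""
    if LINE not in lines:
        lines.append(LINE)


naive: List[str] = []
guarded: List[str] = []
for _ in range(2):  # simulate running the same script twice
    append_setting(naive)
    ensure_setting(guarded)

print(len(naive))    # 2: the second run duplicated the line
print(len(guarded))  # 1: the second run was a no-op
```

The guard makes only this single step idempotent. Every other step in a script needs its own check, which is part of what makes imperative configurations costly to maintain.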

The expressiveness of imperative tools encourages stronger assumptions about
the environment's current state, which increases the likelihood of
configuration drift as environments diverge over time. Achieving consistency
in an imperative paradigm therefore often requires extensive error handling,
validation checks and retries to ensure that, despite the stepwise nature of
the process, the system reaches a stable end state.

===== Declarative Configuration Management

Declarative configuration management aims to ensure idempotence, meaning that
the same configuration can be applied multiple times without altering the
environment beyond its intended state. This abstraction simplifies
understanding and maintenance by letting the system determine the actions
necessary to achieve the desired state. Tools such as
Puppet #cite(<puppet>, form: "normal"),
Kubernetes #cite(<kubernetes>, form: "normal"),
Terraform #cite(<terraform>, form: "normal") and, under some conditions,
Docker #cite(<docker>, form: "normal") are used to specify the desired end
state. These tools typically provide their own #gls("DSL") for writing
high-level descriptions of the desired state of the environment, as opposed to
issuing imperative, procedural commands. The declarative approach mitigates
the risk of configuration drift by prioritising idempotence, maintaining
explicit dependency graphs, and keeping a strong awareness of the current
state of the environment #cite(<HunterGCP>, form: "normal", supplement: [p. 348]).
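
The core loop of a declarative tool can be sketched in two stages. First, a plan is computed from the difference between the desired state and the observed current state. Then that plan is applied. The Python sketch below is a deliberately simplified, hypothetical model. Resources are plain key/value pairs, there is no dependency graph, and the declaration is assumed to be complete, so unlisted resources are deleted. It is not how Puppet, Kubernetes or Terraform are implemented. It does show why re-applying the same declaration is idempotent: once the state matches, the computed plan is empty.

```python
from typing import Dict, List, Optional, Tuple

State = Dict[str, str]
# An action is (operation, resource name, target value or None for deletion).
Action = Tuple[str, str, Optional[str]]


def plan(desired: State, current: State) -> List[Action]:
    """Compute the actions needed to turn `current` into `desired`."""
    actions: List[Action] = []
    for name in sorted(desired.keys() | current.keys()):
        if name not in desired:
            actions.append(("delete", name, None))
        elif name not in current:
            actions.append(("create", name, desired[name]))
        elif current[name] != desired[name]:
            actions.append(("update", name, desired[name]))
    return actions


def apply_plan(desired: State, current: State) -> List[Action]:
    """Apply the computed plan to `current` in place and return it."""
    actions = plan(desired, current)
    for op, name, value in actions:
        if op == "delete":
            del current[name]
        else:
            current[name] = value
    return actions


desired: State = {"bucket": "private", "vm_size": "small"}
current: State = {"vm_size": "medium", "firewall": "open"}

first = apply_plan(desired, current)
second = apply_plan(desired, current)
print(first)   # create bucket, delete firewall, update vm_size
print(second)  # []: re-applying the same declaration changes nothing
```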

While most configuration systems aim to be declarative to ensure
reproducibility and idempotence, some imperative tools can also achieve a
degree of congruence. However, this often comes at the cost of predictability
and ease of maintenance, making them less favourable in environments where
stability and simplicity are prioritised.

#figure(
  include "../../resources/typst/configuration-management-summary.typ",
  caption: [Configuration Management Models and Paradigms],
  kind: "table",
  supplement: [Table],
) <ch2-table-configuration-mgmt>

#info-box(kind: "note")[
In @ch2-table-configuration-mgmt, Docker is classified as both declarative and
imperative. This dual classification arises because, while Docker often starts
from a declarative configuration (e.g., a `Dockerfile`), it shifts towards an
imperative approach when imperative commands are introduced within the
`Dockerfile` to achieve the desired state. As a result, the same `Dockerfile`
may produce different outcomes depending on the base image in use (for
instance, when the base image is referenced through a mutable tag such as
`latest`), leading to non-idempotent behaviour and ultimately hindering
reproducibility.
]

=== Comparing Builds

In the quest for software reproducibility, identifying and understanding the