Commit a8d0ab6

chapter 2: add Environments and Configuration Management
1 parent 4b94f6f commit a8d0ab6

3 files changed

+329
-0
lines changed

Lines changed: 46 additions & 0 deletions
@@ -0,0 +1,46 @@
1+
#import "../../src/thesis/imports/preamble.typ": *
2+
3+
4+
#table(
5+
columns: (1fr, 1fr, 1fr),
6+
stroke: none,
7+
table.header(
8+
[],
9+
table.vline(stroke: 1pt),
10+
[#align(center)[Imperative]],
11+
table.vline(stroke: .5pt),
12+
[#align(center)[Declarative]],
13+
table.hline(stroke: 1pt),
14+
),
15+
table.cell(align: horizon + center)[Divergent],
16+
[
17+
- Shell commands
18+
],
19+
[
20+
- Shell scripts
21+
- Ansible
22+
],
23+
table.hline(stroke: .5pt),
24+
table.cell(align: horizon + center, rowspan: 2)[Convergent],
25+
table.cell(colspan: 2)[
26+
- Docker
27+
],
28+
table.hline(stroke: .5pt + luma(200), start: 1),
29+
[
30+
- Ansible
31+
- Chef
32+
- Shell scripts
33+
],
34+
[
35+
- Puppet
36+
- Kubernetes
37+
- Terraform
38+
],
39+
table.hline(stroke: .5pt),
40+
table.cell(align: horizon + center)[Congruent],
41+
[],
42+
[
43+
- Nix
44+
- Guix
45+
],
46+
)
Lines changed: 126 additions & 0 deletions
@@ -0,0 +1,126 @@
1+
#import "../../src/thesis/imports/preamble.typ": *
2+
3+
#grid(
4+
columns: (1fr, 1fr, 1fr),
5+
gutter: 1em,
6+
[
7+
#set align(bottom)
8+
#figure(
9+
{
10+
set text(font: "Virgil 3 YOFF")
11+
cetz.canvas({
12+
import cetz.plot
13+
import cetz.draw: content
14+
15+
plot.plot(
16+
size: (3.5, 3),
17+
y-label: [State],
18+
x-label: [Time],
19+
axis-style: "school-book",
20+
x-tick-step: none,
21+
y-tick-step: none,
22+
x-min: 0,
23+
x-max: 500,
24+
x-grid: true,
25+
y-min: 0,
26+
y-max: 500,
27+
legend: "legend.north",
28+
{
29+
plot.add(
30+
((75, 75), (450, 300)),
31+
mark: "o",
32+
)
33+
plot.add(
34+
((75, 50), (450, 125)),
35+
mark: "o",
36+
style: (stroke: (paint: red, dash: "dashed")),
37+
)
38+
},
39+
)
40+
})
41+
},
42+
caption: [Divergent],
43+
) <divergent-config-management>
44+
],
45+
[
46+
#set align(bottom)
47+
#figure(
48+
{
49+
set text(font: "Virgil 3 YOFF")
50+
cetz.canvas({
51+
import cetz.plot
52+
import cetz.draw: *
53+
54+
plot.plot(
55+
size: (3.5, 3),
56+
y-label: [State],
57+
x-label: [Time],
58+
axis-style: "school-book",
59+
x-tick-step: none,
60+
y-tick-step: none,
61+
x-min: 0,
62+
x-max: 500,
63+
x-grid: true,
64+
y-min: 0,
65+
y-max: 500,
66+
legend: "legend.north",
67+
{
68+
plot.add(
69+
((75, 75), (450, 300)),
70+
style: (stroke: (paint: blue)),
71+
mark: "o",
72+
label: "actual",
73+
)
74+
plot.add(
75+
((75, 500), (450, 325)),
76+
mark: "o",
77+
label: "target",
78+
style: (stroke: (paint: red, dash: "dashed")),
79+
)
80+
},
81+
)
82+
})
83+
},
84+
caption: [Convergent],
85+
) <convergent-config-management>
86+
],
87+
[
88+
#set align(bottom)
89+
#figure(
90+
{
91+
set text(font: "Virgil 3 YOFF")
92+
cetz.canvas({
93+
import cetz.plot
94+
import cetz.draw: *
95+
96+
plot.plot(
97+
size: (3.5, 3),
98+
y-label: [State],
99+
x-label: [Time],
100+
axis-style: "school-book",
101+
x-tick-step: none,
102+
y-tick-step: none,
103+
x-min: 0,
104+
x-max: 500,
105+
x-grid: true,
106+
y-min: 0,
107+
y-max: 500,
108+
legend: "legend.inner-south-east",
109+
{
110+
plot.add(
111+
((75, 75), (450, 300)),
112+
mark: "o",
113+
)
114+
plot.add(
115+
((75, 50), (450, 275)),
116+
mark: "o",
117+
style: (stroke: (paint: red, dash: "dashed")),
118+
)
119+
},
120+
)
121+
})
122+
},
123+
caption: [Congruent],
124+
) <congruent-config-management>
125+
],
126+
)

src/thesis/2-reproducibility.typ

Lines changed: 157 additions & 0 deletions
@@ -1707,6 +1707,163 @@ as `-u`, and the `LC_ALL` environment variable to the `date` command. This
17071707
approach ensures that the output we receive is predictable and consistent,
17081708
regardless of the underlying system configuration.
17091709

1710+
==== Environments and Configuration Management
1711+
1712+
In the context of #gls("SE"), reproducibility not only relies on stable
1713+
codebases but also heavily depends on consistent and well-maintained
1714+
environments. Configuration management plays a critical role in ensuring
1715+
reproducibility by mitigating the non-deterministic behaviours introduced by
1716+
configuration drift.
1717+
1718+
#info-box[
1719+
Configuration drift occurs when changes to an environment
1720+
accumulate over time, leading to variations that deviate from the desired or
1721+
initial configuration state, thus introducing non-determinism.
1722+
]
1723+
1724+
This section examines key configuration management models,
1725+
their impact on reproducibility, and the tools that enforce these principles in
1726+
modern software environments.
1727+
1728+
Another source of non-determinism arises from inconsistent environment
1729+
configurations. The way environments are managed directly affects the
1730+
environment's behaviour and, consequently, reproducibility. Therefore, configuration
1731+
management plays an important role in mitigating non-determinism by ensuring
1732+
that systems, software installations and software builds remain consistent
1733+
across different environments.
1734+
1735+
@Traugott2002 classify environment configuration management into three
1736+
categories, each of which has a distinct impact on the level of determinism
1737+
achieved:
1738+
1739+
#figure(include "../../resources/typst/configuration-management.typ")
1740+
1741+
===== Divergent Configuration Management
1742+
1743+
In this model (@divergent-config-management), environments are typically managed
1744+
by one or more individuals, which inevitably leads to
1745+
#emph[configuration drift], where the configurations of different systems
1746+
deviate over time. This is an unavoidable process when system modifications are
1747+
performed without centralised control, leading to unpredictable and
1748+
non-deterministic behaviour that makes reproducibility almost impossible in complex
1749+
infrastructures. Reducing reliance on manual adjustments is essential to
1750+
achieving higher levels of system predictability and reproducibility.
1751+
A common example of this model is a newly installed operating system that
1752+
initially shares a uniform configuration. Over time, as users customise their
1753+
environments to suit individual preferences, the system’s state diverges from
1754+
its original, well-defined configuration.
1755+
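The drift described above can be made concrete with a minimal sketch. It assumes, purely for illustration, that a configuration can be flattened into key/value pairs; the function `detect_drift` and the sample settings are hypothetical and do not come from any tool discussed in this chapter.

```python
from __future__ import annotations


def detect_drift(baseline: dict[str, str], current: dict[str, str]) -> dict[str, tuple]:
    """Return every setting whose value differs from the baseline.

    Each entry maps a key to (baseline value, current value); None means
    the key is absent on that side.
    """
    keys = baseline.keys() | current.keys()
    return {
        key: (baseline.get(key), current.get(key))
        for key in sorted(keys)
        if baseline.get(key) != current.get(key)
    }


# Hypothetical freshly installed system and the same system after a user
# customised it by hand.
baseline = {"shell": "bash", "editor": "vi", "locale": "C"}
customised = {"shell": "zsh", "editor": "vi", "locale": "C", "theme": "dark"}

drift = detect_drift(baseline, customised)
for key, (before, after) in drift.items():
    print(f"{key}: {before!r} -> {after!r}")
```

Every manual change adds an entry to this difference, and nothing in the divergent model ever removes one.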
1756+
===== Convergent Configuration Management
1757+
1758+
Once configuration drift is identified as an issue, the focus shifts towards
1759+
convergence, bringing systems back to a known and consistent state, as
1760+
illustrated in @convergent-config-management. While efforts are made to
1761+
standardise configurations, achieving exact uniformity is extremely challenging,
1762+
if not impossible. Systems may progressively "converge" towards a common
1763+
configuration, but subtle differences can persist, introducing variability. The
1764+
goal in this model is to minimise these variations as much as possible, though
1765+
complete uniformity is rarely attained. To illustrate this model, consider an
1766+
arbitrary environment that must be configured in a specific way to reach a
1767+
particular, well-known state; for example, specific dependencies have to be
1768+
installed. Tools such as Puppet #cite(<puppet>, form: "normal"),
1769+
Kubernetes #cite(<kubernetes>, form: "normal"),
1770+
Terraform #cite(<terraform>, form: "normal") and
1771+
Ansible #cite(<ansible>, form: "normal") follow this model.
1772+
While convergent management offers flexibility in responding to unforeseen
1773+
changes in the environment, it is prone to feedback loops that may cause
1774+
unexpected behaviour. Such feedback loops make it difficult to achieve complete
1775+
reproducibility, as the system's progression towards the desired state is not
1776+
guaranteed to follow a deterministic path.
1777+
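As a rough sketch of why convergence does not imply uniformity, the following example models an environment as a set of installed packages. This is a deliberate simplification: the `converge` function and the package names are hypothetical and do not reflect how any of the cited tools is implemented.

```python
def converge(actual: set, desired: set) -> set:
    """Install whatever is missing; anything already present is left untouched."""
    missing = desired - actual
    return actual | missing


desired = {"git", "curl", "python3"}

# Two hypothetical hosts that drifted differently before convergence.
host_a = {"git"}
host_b = {"git", "curl", "legacy-tool"}

converged_a = converge(host_a, desired)
converged_b = converge(host_b, desired)

# Both hosts now satisfy the desired state...
print(desired <= converged_a, desired <= converged_b)
# ...but they are still not identical.
print(sorted(converged_b - converged_a))
```

Both hosts reach the desired state, yet the leftover package on the second host remains, which is the kind of subtle difference that can persist under this model.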
1778+
===== Congruent Configuration Management
1779+
1780+
This approach in @congruent-config-management enforces strict consistency across
1781+
all environments, ensuring that each environment maintains an identical
1782+
configuration. By preventing configuration drift from the outset, congruent
1783+
configuration management aims to eliminate one of the key sources of
1784+
non-determinism. Maintaining identical setups across environments is a central
1785+
goal of this model, providing the highest level of determinism and reliability
1786+
in system behaviours. To illustrate this model, consider an arbitrary
1787+
environment that needs to be configured in a specific way: rather than being adjusted in place, it is rebuilt from its specification.
1788+
1789+
Congruent management, particularly through the adoption of immutable
1790+
environments, ensures that systems remain in a well-defined state, thus
1791+
maximising reproducibility. However, this approach can lack the flexibility
1792+
required for dynamic environments, where each minor adjustment may necessitate
1793+
rebuilding the entire system. This limitation highlights the importance of
1794+
carefully choosing between convergent and congruent approaches based on the
1795+
system's needs.
1796+
1797+
Tools such as Nix or Guix have demonstrated that it is possible to achieve a
1798+
high degree of congruence while allowing controlled divergence in specific areas
1799+
such as databases or secret management. This balance between convergence and
1800+
congruence highlights the flexibility required to maintain reproducibility in
1801+
environments that manage both static system components and dynamic data.
1802+
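The idea of congruence can be sketched by treating an environment as a pure function of its specification. The example below is an assumption-laden illustration only: the `Environment` class, the `build_environment` function and the hashing scheme are hypothetical, and they are not the actual mechanism used by Nix or Guix.

```python
import hashlib
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Environment:
    """An immutable environment description; it cannot be modified in place."""

    identifier: str
    packages: tuple


def build_environment(spec: dict) -> Environment:
    """Build an environment from the specification alone, ignoring any prior state."""
    canonical = json.dumps(spec, sort_keys=True)
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    return Environment(identifier=digest[:12], packages=tuple(sorted(spec.items())))


# Hypothetical specification: package name mapped to version.
spec = {"python": "3.12", "git": "2.44"}

env_1 = build_environment(spec)
env_2 = build_environment({"git": "2.44", "python": "3.12"})
env_3 = build_environment({**spec, "git": "2.45"})

print(env_1 == env_2)                       # same specification, same environment
print(env_1.identifier == env_3.identifier)  # any change yields a new environment
```

Because the result depends only on the specification, two builds of the same specification are identical, while even a minor adjustment produces a different environment that has to be rebuilt.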
1803+
On top of specifying configuration management models, we can also distinguish
1804+
two different configuration management paradigms.
1805+
1806+
===== Imperative Configuration Management
1807+
1808+
This paradigm specifies the exact steps required to transition an environment
1809+
from its current state to the desired state. Tools such as
1810+
Ansible #cite(<ansible>, form: "normal"), Chef #cite(<chef>, form: "normal"),
1811+
Docker #cite(<docker>, form: "normal"), and shell scripts exemplify this
1812+
methodology. While imperative configurations enable the use of complex logic and
1813+
conditional operations, they can be challenging to maintain due to their
1814+
non-idempotent nature, meaning the same script may yield different results
1815+
depending on the environment's initial state. This approach requires careful
1816+
management to ensure consistency and repeatability, providing detailed control
1817+
at the expense of simplicity and predictability.
1818+
1819+
The expressiveness of imperative tools allows for stronger assumptions about the
1820+
environment's current state, which increases the likelihood of configuration
1821+
drift as environments diverge over time. Achieving consistency in an imperative
1822+
paradigm often necessitates extensive error handling, validation checks, and
1823+
retries, ensuring that despite the stepwise nature of the process, the system
1824+
reaches a stable end state.
1825+
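The dependence on the initial state can be illustrated with a small sketch. It models a shell profile as a list of lines; the path `/opt/tool/bin` and both function names are hypothetical and serve only as an example.

```python
# Hypothetical line that an imperative setup step appends to a shell profile.
PATH_LINE = 'export PATH="$PATH:/opt/tool/bin"'


def add_tool_to_path(profile: list) -> list:
    """Imperative step: always append, whatever the profile already contains."""
    return profile + [PATH_LINE]


def add_tool_to_path_guarded(profile: list) -> list:
    """Same step with an explicit validation check before acting."""
    if PATH_LINE in profile:
        return profile
    return profile + [PATH_LINE]


fresh_profile = ["# user profile"]

once = add_tool_to_path(fresh_profile)
twice = add_tool_to_path(once)
print(twice.count(PATH_LINE))  # the entry is duplicated on the second run

guarded_once = add_tool_to_path_guarded(fresh_profile)
guarded_twice = add_tool_to_path_guarded(guarded_once)
print(guarded_once == guarded_twice)
```

The unguarded step yields a different result on each run, whereas the guarded variant shows the kind of validation check that an imperative script has to carry explicitly to reach a stable end state.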
1826+
===== Declarative Configuration Management
1827+
1828+
Declarative configuration management ensures idempotence, meaning the same
1829+
configuration can be applied multiple times without altering the environment
1830+
beyond its intended state. This abstraction simplifies understanding and
1831+
maintenance by allowing the system to determine the necessary actions to achieve
1832+
the desired state. Tools such as Puppet #cite(<puppet>, form: "normal"),
1833+
Kubernetes #cite(<kubernetes>,form: "normal"),
1834+
Terraform #cite(<terraform>,form: "normal") and, under some conditions,
1835+
Docker #cite(<docker>, form: "normal") are used to specify the desired end
1836+
state. These tools typically feature their own specific #gls("DSL") to create
1837+
high-level descriptions of the desired environment's state, as opposed to
1838+
issuing imperative and procedural commands. The declarative approach mitigates
1839+
the risk of configuration drift by prioritising idempotence, maintaining
1840+
explicit dependency graphs, and ensuring a strong awareness of the current state
1841+
of the environment #cite(<HunterGCP>, form: "normal", supplement: [p. 348]).
1842+
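A minimal sketch of the declarative approach, assuming again that an environment can be modelled as key/value pairs: the desired state is declared as data, and the system derives the necessary actions by comparing it with the current state. The functions `plan` and `apply` are hypothetical names chosen for this illustration and do not correspond to the interface of any cited tool.

```python
def plan(current: dict, desired: dict) -> list:
    """Derive the actions needed to move from the current to the desired state."""
    actions = []
    for key in sorted(desired):
        if current.get(key) != desired[key]:
            actions.append(("set", key))
    for key in sorted(current.keys() - desired.keys()):
        actions.append(("remove", key))
    return actions


def apply(current: dict, desired: dict) -> dict:
    """Execute the derived plan and return the resulting state."""
    result = dict(current)
    for action, key in plan(current, desired):
        if action == "set":
            result[key] = desired[key]
        else:
            del result[key]
    return result


# Hypothetical desired state and a drifted starting point.
desired = {"nginx": "1.26", "openssl": "3.3"}
start = {"nginx": "1.24", "telnet": "0.17"}

once = apply(start, desired)
twice = apply(once, desired)

print(plan(start, desired))
print(plan(once, desired))  # nothing left to do after the first application
print(once == twice == desired)
```

Applying the same declaration a second time produces an empty plan, which is the idempotence property described above.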
1843+
While most configuration systems aim to be declarative to ensure reproducibility
1844+
and idempotence, some imperative tools can achieve a level of congruence.
1845+
However, this often comes at the cost of predictability and ease of maintenance,
1846+
making them less favourable in environments where stability and simplicity are
1847+
prioritised.
1848+
1849+
#figure(
1850+
include "../../resources/typst/configuration-management-summary.typ",
1851+
caption: [Configuration Management Models and Paradigms],
1852+
kind: "table",
1853+
supplement: [Table],
1854+
) <ch2-table-configuration-mgmt>
1855+
1856+
#info-box(kind: "note")[
1857+
In @ch2-table-configuration-mgmt, Docker is classified as both declarative and
1858+
imperative. This dual classification arises from the fact that while Docker
1859+
often starts with declarative configurations (e.g., a `Dockerfile`), it can
1860+
shift towards an imperative approach when imperative commands are introduced
1861+
within the `Dockerfile` to achieve the desired state. As a result, the same
1862+
`Dockerfile` may produce different outcomes depending on the base image in
1863+
use, leading to non-idempotent behaviour and ultimately hindering
1864+
reproducibility.
1865+
]
1866+
17101867
=== Comparing Builds
17111868

17121869
In the quest for software reproducibility, identifying and understanding the
