I have 3 editions of these.

These books are criticised today because of the excessive focus on lexical analysis and parsing techniques.
While this is true, they do cover various aspects of a compiler backend such as intermediate representations and
optimization techniques including peephole optimization, data flow analysis, register allocation, etc.
I found the description of the lattice in a data flow analysis quite accessible.

The 2nd edition adopts a more mathematical presentation style, whereas the earlier editions present
algorithms using pseudo code. I think the 1986 edition is the best.

The dragon books are a bit dated in that newer techniques such as Static Single Assignment or Graph
Coloring Register Allocation are not covered in any detail. I would even say that these books are not
useful if your goal is to work with SSA IR.

For a different take on the 2nd edition see `Review of the second addition of the "Dragon Book" <https://gcc.gnu.org/wiki/Review_of_the_second_addition_of_the_Dragon_Book.>`_.

Engineering a Compiler, 2nd Ed. Cooper & Torczon. 2012.
This intermediate representation uses named slots called virtual registers in the instruction when referencing
values. Let's look at the same example we saw above::

    func foo(n: Int)->Int {
        return n+1;
    }

Produces::

    L0:
        %t1 = n+1
        ret %t1
        goto L1
    L1:

The instructions above are as follows:

* ``%t1 = n+1`` - is a typical three-address instruction of the form ``result = value1 operator value2``. The name ``%t1``
  refers to a temporary, whereas ``n`` refers to the input argument ``n``.
* ``ret %t1`` - is the return instruction; in this instance it references the temporary.
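As a rough illustration of the point above, three-address instructions can be modelled as plain records that name their result and operand slots. This is a hypothetical sketch in Python, not the representation used by any particular compiler:

```python
from dataclasses import dataclass

# Hypothetical record types for three-address instructions.
@dataclass
class BinOp:
    result: str   # destination virtual register, e.g. "%t1"
    op: str       # operator, e.g. "+"
    left: str     # first operand, e.g. the argument "n"
    right: str    # second operand, e.g. the constant "1"

@dataclass
class Ret:
    value: str    # virtual register whose value is returned

# The body of foo from the example above, encoded as a list of instructions:
body = [BinOp("%t1", "+", "n", "1"), Ret("%t1")]
```

Note that every operand is named explicitly in the instruction itself, which is exactly what distinguishes this form from a stack IR.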
The virtual registers in the IR are so called because they do not map to real registers in the target physical machine.
Instead, these are just named slots in the abstract machine responsible for executing the IR. Typically, the abstract machine
will assign each virtual register a unique location in its stack frame. So we still end up using the function's
stack frame, but the IR references locations within the stack frame via these virtual names, rather than implicitly
through push and pop instructions. During optimization some of the virtual registers will end up in real hardware registers.
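The slot assignment described above can be sketched in a few lines. This is a minimal illustration, assuming a hypothetical abstract machine that hands out frame slots in order of first use:

```python
# Assign each virtual register a unique stack-frame slot index,
# as a simple abstract machine might do before executing the IR.
def assign_slots(register_names):
    slots = {}
    for name in register_names:
        if name not in slots:
            slots[name] = len(slots)  # next free slot in the frame
    return slots

# Registers referenced by the example above, in order of appearance:
frame = assign_slots(["n", "%t1", "%t1"])
```

Here ``%t1`` appears twice but gets only one slot; the interpreter would read and write that slot wherever the name occurs.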
Control flow is represented the same way as for the stack IR. Revisiting the same source example from above, we get the following
IR::

    L0:
        %t0 = 1==1
        if %t0 goto L2 else goto L3
    L2:
        %t0 = 2==2
        goto L4
    L3:
        %t0 = 0
        goto L4
    L4:
        ret %t0
        goto L1
    L1:

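One of the advantages claimed for this form is that it is easy to interpret directly. To make that concrete, here is a minimal sketch of an interpreter for the branching example above; the tuple-based instruction encoding and the label table are illustrative assumptions, not any real system's format:

```python
# A tiny interpreter for the register IR above. `labels` maps a label
# name to the index of its first instruction in `code`.
def run(code, labels, env):
    pc = 0
    while True:
        ins = code[pc]
        if ins[0] == "eq":            # ("eq", dest, a, b): dest = (a == b)
            _, dest, a, b = ins
            env[dest] = 1 if a == b else 0
            pc += 1
        elif ins[0] == "const":       # ("const", dest, value)
            env[ins[1]] = ins[2]
            pc += 1
        elif ins[0] == "if":          # ("if", cond, then_label, else_label)
            _, cond, t, f = ins
            pc = labels[t] if env[cond] else labels[f]
        elif ins[0] == "goto":
            pc = labels[ins[1]]
        elif ins[0] == "ret":
            return env[ins[1]]

labels = {"L0": 0, "L2": 2, "L3": 4, "L4": 6}
code = [
    ("eq", "%t0", 1, 1),
    ("if", "%t0", "L2", "L3"),
    ("eq", "%t0", 2, 2),
    ("goto", "L4"),
    ("const", "%t0", 0),
    ("goto", "L4"),
    ("ret", "%t0"),
]
```

Notice that the interpreter needs no operand stack: every value lives in a named slot in ``env``, mirroring the virtual registers.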
Advantages
----------

* Readability: the flow of values is easier to trace, whereas with a stack IR you need to conceptualize a stack somewhere,
  and track values being pushed and popped.
* The IR can be executed easily by an interpreter.
* Most optimization algorithms can be applied to this form of IR.
* The IR can represent Static Single Assignment (SSA) in a natural way.
Disadvantages
-------------

* Each instruction has operands, hence representing the IR in serialized form takes more space.
* It is harder to generate the IR during compilation.
Examples
--------

* `Example basic register IR in EeZee Programming Language <https://github.com/CompilerProgramming/ez-lang/tree/main/registervm>`_.
* `Example register IR including SSA form and optimizations in EeZee Programming Language <https://github.com/CompilerProgramming/ez-lang/tree/main/optvm>`_.
* `LLVM instruction set <https://llvm.org/docs/LangRef.html#instruction-reference>`_.
* `Android Dalvik IR <https://source.android.com/docs/core/runtime/dex-format>`_.
Sea of Nodes IR
===============

The final example we will look at is known as the Sea of Nodes IR.

It is quite different from the IRs we described above.

The key features of this IR are:

* Instructions are NOT organized into Basic Blocks - instead, instructions form a graph, where
  each instruction has as its inputs the definitions it uses.
* Instructions that produce data values are not directly bound to a Basic Block; instead they "float" around,
  the order being defined purely in terms of the dependencies between the instructions.
* Control flow is also represented in the same way, and control flows between control flow
  instructions. Dependencies between data instructions and control instructions occur at a few well
  defined places.
* The IR as described above cannot be readily executed, because to execute the IR, the instructions
  must be scheduled; you can think of this as a process that puts the instructions into a traditional
  Basic Block IR as described earlier.

Describing the Sea of Nodes IR is quite involved. For now, I direct you to the `Simple project <https://github.com/SeaOfNodes/Simple/tree/main>`_; this
is an ongoing effort to explain the Sea of Nodes IR representation and how to implement it.

Beyond how the IR is represented, the main benefits of the Sea of Nodes IR are that:

* It is an SSA IR.
* Various optimizations such as peephole optimizations, value numbering and common subexpression elimination,
  and dead code elimination occur as the IR is built.
* This makes the SoN IR suitable for quick optimizations, as needed in Just-In-Time (JIT) compilers.
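The two ideas above - instructions as graph nodes wired by use-def edges, and optimization happening as the graph is built - can be illustrated with a toy sketch. This is a hypothetical Python illustration, not the Simple project's actual design:

```python
# Toy Sea-of-Nodes-style node: each node's inputs point directly at the
# nodes that define the values it uses (use-def edges), with no basic blocks.
class Node:
    def __init__(self, op, value=None, inputs=()):
        self.op = op
        self.value = value           # payload for "const" nodes
        self.inputs = list(inputs)   # edges to the defining nodes

def peephole(node):
    # Build-time optimization: fold an "add" whose inputs are both
    # constants into a single new constant node.
    if node.op == "add" and all(i.op == "const" for i in node.inputs):
        return Node("const", value=node.inputs[0].value + node.inputs[1].value)
    return node

# n + 1 where n is a parameter: nothing to fold, the add node survives.
n = Node("param")
add = peephole(Node("add", inputs=(n, Node("const", value=1))))

# 2 + 3: folds to a constant the moment the node is constructed.
folded = peephole(Node("add", inputs=(Node("const", value=2),
                                      Node("const", value=3))))
```

The point of the sketch is that the optimization runs during IR construction, so a later pass never sees the redundant ``add`` at all.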
0 commit comments