@@ -22,24 +22,22 @@ \section{Introduction}
This document describes the syntax and semantics of BAP Instruction
Language. The language is used to represent the semantics of machine
instructions. Each machine instruction is represented by a BIL program
- that tries to capture all side effect of the instruction.
-
-
+ that captures the side effects of the instruction.

\section {Syntax }
\label {sec:syntax }

\subsection {Metavariables }
\label {sec:meta }

- We define a small set of metavariables that are used to denote string
- and numeric literals and subscripts :
+ We define a small set of metavariables that are used to denote
+ subscripts, numerals, and string literals:

\ottmetavars

\subsection {BIL syntax }

- BIL program is reperesented as a sequence of BIL statements. Each
+ A BIL program is represented as a sequence of statements. Each
statement performs some side-effectful computation.

\ottgrammartabular {
@@ -57,11 +55,6 @@ \subsection{BIL syntax}

\ottgrammartabular {
\ottexp\ottinterrule
- \ottvar\ottinterrule
- \ottbop\ottinterrule
- \ottuop\ottinterrule
- \ottendian\ottinterrule
- \ottcast\ottinterrule
}

\ottgrammartabular {
@@ -92,10 +85,10 @@ \subsection{Bitvector syntax}
value and size. Operations \verb|ext| and \verb|exts| perform
extract/extend operations. The former is unsigned (i.e., it extends
with zeros); the latter is signed. Each operation extracts bits from a
- bitvector starting from $ \mathit {hi}$ and ending $ \mathit {lo}$ bit
- (both ends including ). If $ \mathit {hi}$ is greater than the bitwidth
- of the bitvector, then it is extended with zeros (for \verb | ext |
- operation) or with a sign bit (for \verb |exts |) operation.
+ bitvector starting from bit $\mathit{hi}$ and ending with bit $\mathit{lo}$
+ (both ends included). If $\mathit{hi}$ is greater than or equal to the
+ bitwidth of the bitvector, then it is extended with zeros (for the
+ \verb|ext| operation) or with the sign bit (for the \verb|exts| operation).
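As an illustration only, the following Python sketch mimics the extract/extend behaviour described above. It assumes a bitvector is modeled as a hypothetical \verb|BitVec| pair of an unsigned integer and a width, with bit 0 as the least significant bit. The functions \verb|ext| and \verb|exts| mirror the BIL operation names but are not BAP's API.

```python
from typing import NamedTuple


class BitVec(NamedTuple):
    """An immediate: an unsigned integer `value` of `width` bits (width >= 1)."""
    value: int
    width: int


def _extract(bv: BitVec, hi: int, lo: int, signed: bool) -> BitVec:
    """Take bits hi..lo (both ends included) of bv, extending past the MSB."""
    if not 0 <= lo <= hi:
        raise ValueError("expected 0 <= lo <= hi")
    value = bv.value
    sign = (value >> (bv.width - 1)) & 1
    if signed and sign and hi >= bv.width:
        # Bits width..hi do not exist: fill them with copies of the sign bit.
        value |= ((1 << (hi + 1)) - 1) & ~((1 << bv.width) - 1)
    # Otherwise bits above the MSB read as zero, which gives zero extension.
    width = hi - lo + 1
    return BitVec((value >> lo) & ((1 << width) - 1), width)


def ext(bv: BitVec, hi: int, lo: int) -> BitVec:
    """Unsigned extract/extend (zero extension)."""
    return _extract(bv, hi, lo, signed=False)


def exts(bv: BitVec, hi: int, lo: int) -> BitVec:
    """Signed extract/extend (sign extension)."""
    return _extract(bv, hi, lo, signed=True)


# Usage: extract the high nibble of a byte, then widen 0x80 to 16 bits.
nibble = ext(BitVec(0xAB, 8), 7, 4)        # BitVec(value=10, width=4)
zero_ext = ext(BitVec(0x80, 8), 15, 0)     # BitVec(value=128, width=16)
sign_ext = exts(BitVec(0x80, 8), 15, 0)    # BitVec(value=65408, width=16)
```

Zero extension falls out of Python's unbounded integers, so only the signed variant has to fill the missing high bits explicitly.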

\ottgrammartabular {
\ottword\ottinterrule
@@ -108,8 +101,8 @@ \subsection{Value syntax}
expressions that are not reducible.

We have three kinds of values, namely immediates, represented as
- bitvectors; unknown values and storages (aka memories) represented
- symbolically as a list of assignments:
+ bitvectors; unknown values; and storages (memories in BIL parlance),
+ represented symbolically as a list of assignments:

\ottgrammartabular {
\ottval\ottinterrule
@@ -126,7 +119,7 @@ \subsection{Formula syntax}
$\Delta$ context is represented as a list of pairs. We also add a small
set of operations over natural numbers, such as comparison and
arithmetic. Natural numbers are mostly used to reason about the sizes of
- bitvectors, that why they are often referred as $ \mathit {sz}$ .
+ bitvectors, which is why they are often referred to as $\mathit{sz}$.

We also add syntax for equality comparison of values and variables.

@@ -147,15 +140,16 @@ \subsection{Instruction syntax}
\label {sec:insn }

To reason about the whole program, we introduce a syntax for
- instruction. An instruction is a binary string with length
- $ \mathit {w_2}$ , that was read by a decoder from an address
- $ \mathit {w_1}$ . The semantics of an instruction is described by a
- $ \mathit {bil}$ program.
+ instructions. An instruction is a sequence of $\mathit{w_2}$
+ bytes that was read by a decoder from the address $\mathit{w_1}$. The
+ semantics of the instruction is described by its $\mathit{bil}$ program.

\ottgrammartabular {
\ottinsn\ottinterrule
}

+ \clearpage
+
\section {Typing }
\label {sec:typing }

@@ -167,24 +161,89 @@ \section{Typing}

\ottdefnstypingXXexp

+ \clearpage
+

\section {Operational semantics }

\subsection {Model of a program }

A program is coinductively defined as an infinite stream of program
states produced by the \verb|step| rule. Each state is represented by a
- triplet $ \Delta , w, var$ , where $ \Delta $ is a mapping from variables
+ triple $(\Delta, w, var)$, where $\Delta$ is a mapping from variables
to values, $w$ is the program counter, and $var$ is a variable
denoting the currently active memory.

The \verb|step| rule defines how a machine instruction is
- evaluated. We use `` magic'' function \verb |decode | that fetches
- instruction from memory and decodes it to a BIL program.
+ evaluated. We use a ``magic'' rule \verb|decode| that fetches
+ instructions from memory and decodes them into a BIL program.

- A program counter is updated after each instruction.
+ The BIL code is evaluated using the reduction rules for statements (see
+ Section~\ref {sec:sema:stmt }). Then the program counter is updated with
+ $w_3$, which initially points to the byte following the current instruction.

\ottdefnsprogram
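The \verb|step| rule and the resulting stream of states can be sketched in Python as follows. This is a toy model under stated assumptions, not BAP's implementation: the ``magic'' decoder is passed in as a function returning the instruction length in bytes and its BIL program, and each BIL statement is a hypothetical Python function on the pair $(\Delta, w)$. A Python generator stands in for the coinductively defined infinite stream.

```python
from itertools import islice
from typing import Callable, Dict, Iterator, List, Tuple

Delta = Dict[str, object]
# Hypothetical stand-in: a BIL statement maps (delta, w) to a new (delta, w).
Stmt = Callable[[Delta, int], Tuple[Delta, int]]
# Hypothetical decoder: (memory, address) -> (length in bytes, BIL program).
Decoder = Callable[[object, int], Tuple[int, List[Stmt]]]


def step(decode: Decoder, delta: Delta, w: int, mem: str) -> Tuple[Delta, int, str]:
    """One application of the step rule to the state (delta, w, mem)."""
    size, bil = decode(delta[mem], w)   # fetch and decode from the active memory
    pc = w + size                       # w3: the byte after the current instruction
    for stmt in bil:                    # statement reduction may overwrite pc
        delta, pc = stmt(delta, pc)
    return delta, pc, mem


def run(decode: Decoder, delta: Delta, w: int, mem: str) -> Iterator[Tuple[Delta, int, str]]:
    """The program as a (potentially infinite) stream of states."""
    state = (delta, w, mem)
    while True:
        yield state
        state = step(decode, *state)


# Usage with a toy two-instruction program (hypothetical encoding).
def inc_x(delta: Delta, w: int) -> Tuple[Delta, int]:
    return {**delta, "x": delta["x"] + 1}, w    # plays the role of a Move


def jmp_0(delta: Delta, w: int) -> Tuple[Delta, int]:
    return delta, 0                             # plays the role of a Jmp


toy_memory = {0: (4, [inc_x]), 4: (2, [inc_x, jmp_0])}
states = list(islice(run(lambda m, a: m[a], {"mem": toy_memory, "x": 0}, 0, "mem"), 4))
```

Taking a finite prefix with \verb|islice| is how the infinite stream is observed in practice.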

+ \section {Semantics of statements }
+ \label {sec:sema:stmt }
+
+ The reduction rule defines the transformation of a state for each
+ statement. The state of the reduction rule consists of a pair
+ $(\Delta, w)$, where $\Delta$ is a mapping from variables to values and
+ $w$ is the address of the next instruction.
+
+ Two statements affect the state: the \verb|Move| statement introduces a new
+ $var \leftarrow v$ binding in $\Delta$, and \verb|Jmp| updates the
+ program counter.
+
+ The \verb|if| and \verb|while| statements introduce local control
+ flow.
+
+ There is no special semantics associated with the \verb|special| and
+ \verb|cpuexn| statements.
+
+ \ottdefnsreduceXXstmt
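A minimal Python sketch of these statement reduction rules follows. It assumes expressions are evaluated by hypothetical Python functions of $\Delta$, and it reduces statements strictly in order; whether a \verb|Jmp| cuts off the statements that follow it is left to the formal rules and is not modeled here. Treating \verb|special| and \verb|cpuexn| as leaving the state unchanged is this sketch's reading of the text.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple, Union

Delta = Dict[str, int]
Exp = Callable[[Delta], int]   # hypothetical stand-in for an evaluated expression


@dataclass
class Move:
    var: str
    exp: Exp

@dataclass
class Jmp:
    dst: Exp

@dataclass
class If:
    cond: Exp
    yes: List["Stmt"]
    no: List["Stmt"]

@dataclass
class While:
    cond: Exp
    body: List["Stmt"]

@dataclass
class Special:
    what: str

@dataclass
class CpuExn:
    number: int

Stmt = Union[Move, Jmp, If, While, Special, CpuExn]


def reduce_stmt(s: Stmt, delta: Delta, w: int) -> Tuple[Delta, int]:
    """Reduce one statement in the state (delta, w)."""
    if isinstance(s, Move):                 # introduces the binding var <- v
        return {**delta, s.var: s.exp(delta)}, w
    if isinstance(s, Jmp):                  # updates the program counter
        return delta, s.dst(delta)
    if isinstance(s, If):                   # local control flow
        return reduce_seq(s.yes if s.cond(delta) else s.no, delta, w)
    if isinstance(s, While):
        while s.cond(delta):
            delta, w = reduce_seq(s.body, delta, w)
        return delta, w
    if isinstance(s, (Special, CpuExn)):    # assumed: state left unchanged
        return delta, w
    raise TypeError(f"not a statement: {s!r}")


def reduce_seq(stmts: List[Stmt], delta: Delta, w: int) -> Tuple[Delta, int]:
    """Reduce a sequence of statements left to right."""
    for s in stmts:
        delta, w = reduce_stmt(s, delta, w)
    return delta, w


# Usage: sum 1..5 into "acc", then jump to 0x100 if the sum is 15.
prog = [
    Move("i", lambda d: 1),
    Move("acc", lambda d: 0),
    While(lambda d: d["i"] <= 5, [
        Move("acc", lambda d: d["acc"] + d["i"]),
        Move("i", lambda d: d["i"] + 1),
    ]),
    If(lambda d: d["acc"] == 15, [Jmp(lambda d: 0x100)], [CpuExn(3)]),
]
final_delta, final_pc = reduce_seq(prog, {}, 0x10)
```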
+
+
+ \section {Semantics of expressions}
+ \label {sec:sema:exp }
+
+ This section describes a small-step operational semantics for
+ expressions. The judgment $\Delta \vdash e \rightarrow e'$
+ defines a single step that transforms an expression $e$ into an expression
+ $e'$ under a given context $\Delta$.
+
+ A well-formed (well-typed) expression evaluates to a value expression;
+ values form a syntactic subset of the expression grammar (see
+ Section~\ref {sec:values }).
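The following Python sketch illustrates the single-step judgment on a toy fragment with only variables, immediates, and addition. The constructor names and the left-to-right reduction order are assumptions made for illustration, not taken from the BIL rules; iterating the step function until a value appears yields the value of a well-typed expression.

```python
from dataclasses import dataclass
from typing import Dict, Union


@dataclass(frozen=True)
class Imm:            # a value: an immediate bitvector
    value: int
    width: int


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class Add:
    lhs: "Exp"
    rhs: "Exp"


Exp = Union[Imm, Var, Add]
Delta = Dict[str, Imm]


def is_value(e: Exp) -> bool:
    return isinstance(e, Imm)


def reduce_exp(delta: Delta, e: Exp) -> Exp:
    """One small step: delta |- e -> e'."""
    if isinstance(e, Var):
        return delta[e.name]
    if isinstance(e, Add):
        if not is_value(e.lhs):
            return Add(reduce_exp(delta, e.lhs), e.rhs)
        if not is_value(e.rhs):
            return Add(e.lhs, reduce_exp(delta, e.rhs))
        width = e.lhs.width   # well-typed: both operands have the same width
        return Imm((e.lhs.value + e.rhs.value) % (1 << width), width)
    raise ValueError(f"{e!r} is already a value")


def evaluate(delta: Delta, e: Exp) -> Imm:
    """Iterate the step relation until a value is reached."""
    while not is_value(e):
        e = reduce_exp(delta, e)
    return e


# Usage: record every intermediate expression of x + (3 + 4) with x = 250.
delta = {"x": Imm(250, 8)}
trace = [Add(Var("x"), Add(Imm(3, 8), Imm(4, 8)))]
while not is_value(trace[-1]):
    trace.append(reduce_exp(delta, trace[-1]))
```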
+
+ A value can be an immediate (represented by a bitvector), an
+ unknown value, or a memory storage.
+
+ A memory storage is represented symbolically as a sequence of
+ stores to the originally undefined memory. Each store
+ operation of size greater than 8 bits is desugared into a sequence of
+ 8-bit stores in big-endian order.
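The desugaring of wide stores can be sketched in Python as below. The sketch assumes a hypothetical linked representation in which each \verb|Store| node records one byte written on top of an older memory, and \verb|UnknownMem| stands for the originally undefined memory. Addresses are unbounded integers here, so wrap-around at the end of the address space is ignored.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union


@dataclass(frozen=True)
class UnknownMem:
    """The originally undefined memory."""


@dataclass(frozen=True)
class Store:
    """One 8-bit store of `byte` at `addr` on top of memory `mem`."""
    mem: "Mem"
    addr: int
    byte: int


Mem = Union[UnknownMem, Store]


def store(mem: Mem, addr: int, value: int, size: int) -> Mem:
    """Desugar a store of `size` bits into 8-bit stores in big-endian
    order: the most significant byte goes to the lowest address."""
    if size <= 0 or size % 8 != 0:
        raise ValueError("size must be a positive multiple of 8")
    nbytes = size // 8
    for i in range(nbytes):
        byte = (value >> (8 * (nbytes - 1 - i))) & 0xFF
        mem = Store(mem, addr + i, byte)
    return mem


def assignments(mem: Mem) -> List[Tuple[int, int]]:
    """List the (address, byte) assignments, oldest first."""
    out = []
    while isinstance(mem, Store):
        out.append((mem.addr, mem.byte))
        mem = mem.mem
    return out[::-1]


# Usage: a 32-bit store of 0x11223344 at address 0x1000.
m = store(UnknownMem(), 0x1000, 0x11223344, 32)
```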
+
+ A load operation first reduces all subexpressions of a memory
+ object to values and then recursively destructs the object until one of
+ the following conditions is met:
+
+
+ \begin {description }
+ \item [load-byte:] if the memory object is a store of a \verb|value|
+ to the immediate (known) address being loaded, then the
+ load expression reduces to \verb|value|.
+ \item [load-un-memory:] if the memory object is an \verb|unknown| value,
+ then the load expression evaluates to \verb|unknown|.
+ \item [load-un-addr:] if the memory object is a store to an
+ \verb|unknown| address, then the load expression evaluates to
+ \verb|unknown|.
+ \end {description }
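The three load conditions can be illustrated with a Python sketch over the same kind of symbolic memory: a chain of hypothetical byte-sized \verb|Store| nodes ending in an unknown memory. The sketch assumes the load address has already been reduced to a known integer and loads a single byte; the recursive destruction is written as a loop.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Unknown:
    """An unknown value (used here for memories, addresses, and results)."""


UNKNOWN = Unknown()


@dataclass(frozen=True)
class Store:
    """One byte stored at `addr` on top of memory `mem`."""
    mem: "Mem"
    addr: Union[int, Unknown]
    byte: int


Mem = Union[Unknown, Store]


def load_byte(mem: Mem, addr: int) -> Union[int, Unknown]:
    """Load one byte from the known address `addr`."""
    while True:
        if isinstance(mem, Unknown):          # load-un-memory
            return UNKNOWN
        if isinstance(mem.addr, Unknown):     # load-un-addr
            return UNKNOWN
        if mem.addr == addr:                  # load-byte
            return mem.byte
        mem = mem.mem                         # store to another address: look deeper


# Usage: two known stores, then a store to an unknown address on top.
m = Store(Store(UNKNOWN, 0x10, 0xAA), 0x11, 0xBB)
m_unknown_addr = Store(m, UNKNOWN, 0xCC)
```

Because a store to an unknown address may alias any location, the sketch stops at it instead of looking past it, which matches the load-un-addr condition.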
+
+ \ottdefnsreduceXXexp
+
+ \ottdefnshelpers
+

\end {document }