Skip to content

Commit ad80a6a

Browse files
committed
first release.
1 parent a3472ef commit ad80a6a

File tree

2 files changed

+111
-42
lines changed

2 files changed

+111
-42
lines changed

bil.ott

Lines changed: 26 additions & 16 deletions
Original file line numberDiff line numberDiff line change
@@ -17,8 +17,8 @@ grammar
1717
stmt,s :: stmt_ ::=
1818
| var := exp :: :: move
1919
{{ com -- assign [[exp]] to [[var]] }}
20-
| jmp exp :: :: jump
21-
{{ com -- transfer control to given [[exp]] }}
20+
| jmp e :: :: jump
21+
{{ com -- transfer control to a given address [[e]] }}
2222
| cpuexn ( num ) :: :: cpuexn
2323
{{ com -- interrupt CPU with a given interrupt [[num]] }}
2424
| special ( string ) :: :: special
@@ -71,9 +71,9 @@ grammar
7171
| num : nat :: S :: word
7272
| 1 : nat :: S :: one
7373
| true :: S :: true
74-
{{ com -- sugar for $1:1$ }}
74+
{{ com -- sugar for 1:1 }}
7575
| false :: S :: false
76-
{{ com -- sugar for $0:1$ }}
76+
{{ com -- sugar for 0:1 }}
7777
| w1 .+ w2 :: S :: plus
7878
{{ tex [[w1]] \stackrel{bv} + [[w2]] }}
7979
{{ com -- plus }}
@@ -364,9 +364,10 @@ defns program :: '' ::=
364364
defn delta , w , var ~> delta' , w' , var' :: :: step :: '' by
365365

366366
delta,w,var |-> {addr=w1; size=w2; code=bil}
367-
delta,w1 |- bil ~> delta',w',{}
367+
delta, w1.+w2 |- bil ~> delta',w3,{}
368368
---------------------------------------------------------- :: step
369-
delta,w,var ~> delta',w'.+w2,var
369+
delta,w,var ~> delta',w3,var
370+
370371

371372
defn delta,w,var |-> insn :: :: decode :: '' by
372373

@@ -396,7 +397,6 @@ defns reduce_exp :: '' ::=
396397
delta[var' <- v'] |- var ~> v
397398

398399

399-
{{com if variable is not found, then it evaluates to unknown value}}
400400

401401
-------------------------------------------- :: var_unknown
402402
[] |- id:type ~> unknown [str] : type
@@ -405,12 +405,6 @@ defns reduce_exp :: '' ::=
405405
% LOAD %
406406
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
407407

408-
{{ com Load expression is evaluated semi-lazily, first we evaluate an
409-
address, then memory expression. Once memory expression is reduced to
410-
a value, we start to reduce a symbolic memory object, until we end up
411-
with a non-reducible value. The memory object is a sequence of nested
412-
store expressions. Whenever we hit a storage to an unknown address we
413-
stop reduction and reduce the whole load expression to an unknown val}}
414408

415409
delta |- e2 ~> v2
416410
------------------------------------- :: load_addr
@@ -423,7 +417,7 @@ defns reduce_exp :: '' ::=
423417

424418

425419
----------------------------------------------------------- :: load_byte
426-
delta |- (v1 with [w,ed]:8 <- num:8)[w,ed]:8 ~> num:8
420+
delta |- (v1 with [w,ed]:8 <- num:8)[w,ed']:8 ~> num:8
427421

428422
------------------------------------------------------------------------------ :: load_un_addr
429423
delta |- (v1 with [unknown[str]:t,ed]:8 <- v2)[v3,ed]:8 ~> unknown[str]:imm<8>
@@ -443,7 +437,7 @@ defns reduce_exp :: '' ::=
443437

444438
succ w = w'
445439
--------------------------------------------------- :: load_word_el
446-
delta |- v[w,el]:sz ~> v[w',el]:(sz-8) @ v[w,el]:8
440+
delta |- v[w,el]:sz ~> v[w',el]:(sz-8) @ v[w,be]:8
447441

448442

449443

@@ -468,6 +462,22 @@ defns reduce_exp :: '' ::=
468462
------------------------------------------------------------------------- :: store_mem
469463
delta |- e with [v1,ed]:sz <- val ~> v with [v1,ed] : sz <- val
470464

465+
succ w = w'
466+
delta |- high:8[w] ~> w1
467+
delta |- low:(sz-8)[w] ~> w2
468+
delta |- v with [w,be]:8 <- w1 ~> v'
469+
-------------------------------------------------------------------- :: store_word_be
470+
delta |- v with [w,be]:sz <- val ~> v' with [w', be]:(sz-8) <- w2
471+
472+
succ w = w'
473+
delta |- low:8[w] ~> w1
474+
delta |- high:(sz-8)[w] ~> w2
475+
delta |- v with [w,be]:8 <- w1 ~> v'
476+
-------------------------------------------------------------------- :: store_word_el
477+
delta |- v with [w,el]:sz <- val ~> v' with [w', el]:(sz-8) <- w2
478+
479+
480+
471481

472482
delta |- e1 ~> v
473483
------------------------------------------------ :: let_head
@@ -648,7 +658,7 @@ defns reduce_stmt :: '' ::=
648658

649659
delta |- e ~> w'
650660
---------------------------------- :: jmp
651-
delta,w |- jmp e ~> delta, w .+ w'
661+
delta,w |- jmp e ~> delta, w'
652662

653663

654664
------------------------------------ :: cpuexn

bil.tex

Lines changed: 85 additions & 26 deletions
Original file line numberDiff line numberDiff line change
@@ -22,24 +22,22 @@ \section{Introduction}
2222
This document describes the syntax and semantics of BAP Instruction
2323
Language. The language is used to represent a semantics of machine
2424
instructions. Each machine instruction is represented by a BIL program
25-
that tries to capture all side effect of the instruction.
26-
27-
25+
that captures side effect of the instruction.
2826

2927
\section{Syntax}
3028
\label{sec:syntax}
3129

3230
\subsection{Metavariables}
3331
\label{sec:meta}
3432

35-
We define a small set of metavariables that are used to denote string
36-
and numeric literals and subscripts:
33+
We define a small set of metavariables that are used to denote
34+
subscripts, numerals and string literals:
3735

3836
\ottmetavars
3937

4038
\subsection{BIL syntax}
4139

42-
BIL program is reperesented as a sequence of BIL statements. Each
40+
BIL program is reperesented as a sequence of statements. Each
4341
statement performs some side-effectful computation.
4442

4543
\ottgrammartabular{
@@ -57,11 +55,6 @@ \subsection{BIL syntax}
5755

5856
\ottgrammartabular{
5957
\ottexp\ottinterrule
60-
\ottvar\ottinterrule
61-
\ottbop\ottinterrule
62-
\ottuop\ottinterrule
63-
\ottendian\ottinterrule
64-
\ottcast\ottinterrule
6558
}
6659

6760
\ottgrammartabular{
@@ -92,10 +85,10 @@ \subsection{Bitvector syntax}
9285
value and size. Operations \verb|ext| and \verb|exts| performs
9386
extract/extend operation. The former is unsigned (i.e., it extends
9487
with zeros), the latter is signed. This operation extracts bits from a
95-
bitvector starting from $\mathit{hi}$ and ending $\mathit{lo}$ bit
96-
(both ends including). If $\mathit{hi}$ is greater than the bitwidth
97-
of the bitvector, then it is extended with zeros (for \verb|ext|
98-
operation) or with a sign bit (for \verb|exts|) operation.
88+
bitvector starting from $\mathit{hi}$ and ending with $\mathit{lo}$
89+
bit (both ends included). If $\mathit{hi}$ is greater than the
90+
bitwidth of the bitvector, then it is extended with zeros (for
91+
\verb|ext| operation) or with a sign bit (for \verb|exts|) operation.
9992

10093
\ottgrammartabular{
10194
\ottword\ottinterrule
@@ -108,8 +101,8 @@ \subsection{Value syntax}
108101
expressions that are not reducible.
109102

110103
We have three kinds of values --- immediates, represented as
111-
bitvectors; unknown values and storages (aka memories) represented
112-
symbolically as a list of assignments:
104+
bitvectors; unknown values and storages (memories in BIL parlance),
105+
represented symbolically as a list of assignments:
113106

114107
\ottgrammartabular{
115108
\ottval\ottinterrule
@@ -126,7 +119,7 @@ \subsection{Formula syntax}
126119
$\Delta$ context is represented as list of pairs. We also add a small
127120
set of operations over natural numbers, like comparison and
128121
arithmetics. Natural numbers are mostly used to reason about sizes of
129-
bitvectors, that why they are often referred as $\mathit{sz}$.
122+
bitvectors, that's why they are often referred as $\mathit{sz}$.
130123

131124
We also add syntax for equality comparison for values and variables.
132125

@@ -147,15 +140,16 @@ \subsection{Instruction syntax}
147140
\label{sec:insn}
148141

149142
To reason about the whole program we introduce a syntax for
150-
instruction. An instruction is a binary string with length
151-
$\mathit{w_2}$, that was read by a decoder from an address
152-
$\mathit{w_1}$. The semantics of an instruction is described by a
153-
$\mathit{bil}$ program.
143+
instruction. An instruction is a binary sequence of $\mathit{w_2}$
144+
bytes, that was read by a decoder from an address $\mathit{w_1}$. The
145+
semantics of an instruction is described by the $\mathit{bil}$ program.
154146

155147
\ottgrammartabular{
156148
\ottinsn\ottinterrule
157149
}
158150

151+
\clearpage
152+
159153
\section{Typing}
160154
\label{sec:typing}
161155

@@ -167,24 +161,89 @@ \section{Typing}
167161

168162
\ottdefnstypingXXexp
169163

164+
\clearpage
165+
170166

171167
\section{Operational semantics}
172168

173169
\subsection{Model of a program}
174170

175171
Program is coinductively defined as an infinite stream of program
176172
states, produced by a step rule. Each state is represented with a
177-
triplet $\Delta, w, var$, where $\Delta$ is a mapping from variables
173+
triplet $(\Delta, w, var)$, where $\Delta$ is a mapping from variables
178174
to values, $w$ is a program counter, and $var$ is a variable
179175
denoting currently active memory.
180176

181177
The \verb|step| rule defines how a machine instruction is
182-
evaluated. We use ``magic'' function \verb|decode| that fetches
183-
instruction from memory and decodes it to a BIL program.
178+
evaluated. We use ``magic'' rule \verb|decode| that fetches
179+
instructions from the memory and decodes them to a BIL program.
184180

185-
A program counter is updated after each instruction.
181+
The BIL code is evaluated using reduction rules of statements (see
182+
section \ref{sec:sema:stmt}). Then the program counter is updated with
183+
the $w_3$, that initially points to a byte following current instruction.
186184

187185
\ottdefnsprogram
188186

187+
\section{Semantics of statements}
188+
\label{sec:sema:stmt}
189+
190+
The reduction rule defines transformation of a state for each
191+
statement. The state of the reduction rule consists of a pair
192+
$(\Delta,w)$, where $\Delta$ is a mapping from variables to values and
193+
$w$ is an address of a next instruction.
194+
195+
Two statements affect the state: \verb|Move| statement introduces new
196+
$var \leftarrow v$ binding in $\Delta$, and \verb|Jmp| affects
197+
program counter.
198+
199+
The \verb|if| and \verb|while| instructions introduce local control
200+
flow.
201+
202+
There is no special semantics associated with \verb|special| and
203+
\verb|cpuexn| statements.
204+
205+
\ottdefnsreduceXXstmt
206+
207+
208+
\section {Semantics of expressions}
209+
\label{sec:sema:exp}
210+
211+
This section describes a small step operational semantics for
212+
expressions. A symbolic formula $\Delta \vdash e \rightarrow e' $
213+
defines a step of transformation from expression $e$ to an expression
214+
$e'$ under given context $\Delta$.
215+
216+
A well formed (well typed) expression evaluates to a value expression,
217+
that is syntactic subset of expression grammar (see
218+
section \ref{sec:values}).
219+
220+
A value can be either an immediate, represented by a bitvector, a
221+
unknown value, or a memory storage.
222+
223+
A memory storage is represented symbolically as a sequence of
224+
storages to the originally undefined memory. Each storage
225+
operation of size greater than 8 bits is desugared into a sequence of
226+
8 bit storages in a big endian order.
227+
228+
A load operation will first reduce all sub expressions of a memory
229+
object to values and then recursively destruct the object until one of
230+
the following conditions is met:
231+
232+
233+
\begin{description}
234+
\item[load-byte:] if the memory object is a storage of a \verb|value|
235+
to an immediate (known) address that we're trying to load then the
236+
load expression is reduced to \verb|value|.
237+
\item[load-un-memory:] if the memory object is an \verb|unknown| value,
238+
then the load expression evaluates to \verb|unknown|.
239+
\item[load-un-addr:] if the memory object is a storage to
240+
\verb|unknown| value address then load expression evaluates to
241+
\verb|unknown|.
242+
\end{description}
243+
244+
\ottdefnsreduceXXexp
245+
246+
\ottdefnshelpers
247+
189248

190249
\end{document}

0 commit comments

Comments
 (0)