By default, the ESIL VM provides a set of helper operations for calculating flags. They work by comparing the old and the new value of the dst operand of the last performed eq-operation. On every eq-operation (e.g. =), ESIL saves the old and the new value of the dst operand. Note that there are also weak eq-operations (e.g. :=), which do not affect flag operations. The == operation affects flag operations despite not being an eq-operation. Flag operations are prefixed with the $ character.
z - zero flag, only set if the result of an operation is 0
b - borrow, requires specifying the bit to borrow from (example: 4,$b - checks for a borrow from bit 4)
c - carry, same as above (example: 7,$c - checks for a carry from bit 7)
o - overflow
p - parity
r - regsize ( asm.bits/8 )
s - sign
ds - delay slot state
jt - jump target
js - jump target set
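The comparison of old and new values described above can be sketched in Python. This is only an illustration under stated assumptions: the function names are hypothetical, and the exact bit masks (a carry from bit n looks at bits 0..n, a borrow from bit n looks at bits 0..n-1) are an assumed reading of the examples, not a specification.

```python
def genmask(bits: int) -> int:
    """Hypothetical helper: mask with the lowest `bits` bits set."""
    return (1 << bits) - 1

def flag_zero(new: int, size_bits: int) -> bool:
    # z: set only if the result, truncated to the operand size, is 0
    return (new & genmask(size_bits)) == 0

def flag_carry(old: int, new: int, bit: int) -> bool:
    # c: assumed rule, a carry out of `bit` happened if the low bits wrapped
    mask = genmask(bit + 1)
    return (new & mask) < (old & mask)

def flag_borrow(old: int, new: int, bit: int) -> bool:
    # b: assumed rule, a borrow from `bit` happened if the bits below it grew
    mask = genmask(bit)
    return (old & mask) < (new & mask)

def flag_sign(new: int, size_bits: int) -> bool:
    # s: most significant bit of the result
    return bool((new >> (size_bits - 1)) & 1)

# 8-bit addition 0xff + 1 wraps to 0x00: zero and carry from bit 7 are set
print(flag_zero(0x00, 8), flag_carry(0xFF, 0x00, 7))
# subtraction 0x10 - 1 = 0x0f needs a borrow from bit 4
print(flag_borrow(0x10, 0x0F, 4))
```

Each helper takes the saved old and new values as plain arguments, which mirrors the idea that the flags are derived after the fact from the last eq-operation rather than computed by the operation itself.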
A target opcode is translated into a comma-separated list of ESIL expressions.
xor eax, eax -> 0,eax,=,1,zf,=
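To illustrate how such a comma-separated expression might be evaluated, here is a minimal Python sketch of a stack machine that understands only numbers, register names and the = operation. The name run_esil and the regs dictionary are hypothetical and are not part of radare2.

```python
def run_esil(expr: str, regs: dict) -> dict:
    """Evaluate a tiny ESIL subset: operands are pushed, '=' assigns."""
    stack = []
    for tok in expr.split(","):
        if tok == "=":
            dst = stack.pop()   # top of stack: destination register name
            src = stack.pop()   # below it: source register or number
            regs[dst] = regs[src] if src in regs else int(src, 0)
        else:
            stack.append(tok)   # numbers and register names are just pushed
    return regs

regs = {"eax": 0xDEADBEEF, "zf": 0}
run_esil("0,eax,=,1,zf,=", regs)
print(regs)  # {'eax': 0, 'zf': 1}
```

Operands are kept on the stack as strings and only resolved when an operator consumes them, so the same token can act as a destination name or as a value.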
Memory access is expressed with the brackets operation:
mov eax, [0x80480] -> 0x80480,[],eax,=
The default operand size is determined by the size of the operation's destination.
movb $0, 0x80480 -> 0,0x80480,=[1]
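The two memory examples above can be modelled with a small Python sketch. It is an illustration only: memory is a hypothetical dict of byte values, byte order is assumed to be little-endian, and a bare [] is assumed to read 4 bytes (in the real VM the default size comes from the destination).

```python
import re

# Matches "[]", "[4]" (reads) and "=[]", "=[1]" (writes)
MEM_OP = re.compile(r"^(=?)\[(\d*)\]$")

def run_esil_mem(expr, regs, mem, default_size=4):
    stack = []

    def value(tok):
        # Resolve a stack item: already a number, a register, or a literal
        if isinstance(tok, int):
            return tok
        return regs[tok] if tok in regs else int(tok, 0)

    for tok in expr.split(","):
        m = MEM_OP.match(tok)
        if tok == "=":
            dst = stack.pop()
            regs[dst] = value(stack.pop())
        elif m:
            size = int(m.group(2)) if m.group(2) else default_size
            addr = value(stack.pop())
            if m.group(1):  # "=[N]": write the next stack item to memory
                data = value(stack.pop()) & ((1 << (8 * size)) - 1)
                for i, byte in enumerate(data.to_bytes(size, "little")):
                    mem[addr + i] = byte
            else:           # "[N]": read from memory and push the value
                raw = bytes(mem.get(addr + i, 0) for i in range(size))
                stack.append(int.from_bytes(raw, "little"))
        else:
            stack.append(tok)
    return regs, mem

mem = {0x80480: 0x78, 0x80481: 0x56, 0x80482: 0x34, 0x80483: 0x12}
regs = {"eax": 0}
run_esil_mem("0x80480,[],eax,=", regs, mem)   # mov eax, [0x80480]
print(hex(regs["eax"]))                       # 0x12345678
run_esil_mem("0,0x80480,=[1]", regs, mem)     # movb $0, 0x80480
print(mem[0x80480], mem[0x80481])             # 0 86
```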
The ? operator uses the value of its argument to decide whether to evaluate the expression in curly braces.
1. Is the value zero? -> Skip it.
2. Is the value non-zero? -> Evaluate it.
cmp eax, 123 -> 123,eax,==,$z,zf,=
jz eax -> zf,?{,eax,eip,=,}
If you want to run several expressions under a conditional, put them in curly braces:
zf,?{,eip,esp,=[],eax,eip,=,$r,esp,-=,}
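The cmp/jz pair above can be traced with a short Python sketch. It makes several assumptions: == is modelled as a subtraction whose result is only recorded for the flag helpers, only $z is supported, and nested conditionals are not handled. The function name is hypothetical.

```python
def run_esil_cond(expr, regs):
    stack = []
    last = {"old": 0, "new": 0}   # values saved for flag operations
    skipping = False              # True while inside a skipped ?{ ... } block

    def value(tok):
        if isinstance(tok, int):
            return tok
        return regs[tok] if tok in regs else int(tok, 0)

    for tok in expr.split(","):
        if skipping:
            if tok == "}":
                skipping = False
            continue
        if tok == "?{":
            # Zero -> skip the block, non-zero -> evaluate it
            skipping = value(stack.pop()) == 0
        elif tok == "}":
            pass
        elif tok == "==":
            dst = value(stack.pop())
            src = value(stack.pop())
            last["old"], last["new"] = dst, dst - src
        elif tok == "$z":
            stack.append(int(last["new"] == 0))
        elif tok == "=":
            dst = stack.pop()
            regs[dst] = value(stack.pop())
        else:
            stack.append(tok)
    return regs

regs = {"eax": 123, "zf": 0, "eip": 0x1000}
run_esil_cond("123,eax,==,$z,zf,=", regs)   # cmp eax, 123
run_esil_cond("zf,?{,eax,eip,=,}", regs)    # jz eax
print(regs["zf"], regs["eip"])              # 1 123
```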
Whitespace, newlines and other chars are ignored. So the first step when processing an ESIL program is to remove spaces:
esil = r_str_replace (esil, " ", "", R_TRUE);
Syscalls need special treatment. They are indicated by '$' at the beginning of an expression. You can pass an optional numeric value to specify the syscall number. An ESIL emulator must handle syscalls. See (r_esil_syscall).
As discussed on IRC, the current implementation works like this:
a,b,- -> b - a
a,b,/= -> b /= a
This approach is more readable, but it is less stack-friendly.
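The operand order can be checked with a small Python sketch: the item on top of the stack (b) is the left operand or the destination, and the item below it (a) is the right operand. The function name and the OPS table are hypothetical, and division is modelled as integer division.

```python
import operator

OPS = {"+": operator.add, "-": operator.sub,
       "*": operator.mul, "/": operator.floordiv}

def run_esil_arith(expr, regs):
    stack = []

    def value(tok):
        if isinstance(tok, int):
            return tok
        return regs[tok] if tok in regs else int(tok, 0)

    for tok in expr.split(","):
        if tok in OPS:
            lhs = value(stack.pop())   # b: pushed last, used first
            rhs = value(stack.pop())   # a
            stack.append(OPS[tok](lhs, rhs))
        elif tok.endswith("=") and tok[:-1] in OPS:
            dst = stack.pop()          # b: destination register name
            rhs = value(stack.pop())   # a
            regs[dst] = OPS[tok[:-1]](regs[dst], rhs)
        else:
            stack.append(tok)
    return stack, regs

print(run_esil_arith("3,10,-", {}))             # ([7], {})
print(run_esil_arith("2,eax,/=", {"eax": 10}))  # ([], {'eax': 5})
```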
NOPs are represented as empty strings. As mentioned previously, interrupts are marked by the '$' command, for example '0x80,$'. It delegates emulation from the ESIL machine to a callback which implements the interrupt handler for a specific OS/kernel/platform.
Traps are implemented with the TRAP command. They are used to throw exceptions for invalid instructions, division by zero, memory read errors, or any other condition required by a specific architecture.
Here is a list of some quick checks to retrieve information from an ESIL string. The relevant information will probably be found in the first expression of the list.
indexOf('[') -> have memory references
indexOf("=[") -> write in memory
indexOf("pc,=") -> modifies program counter (branch, jump, call)
indexOf("sp,=") -> modifies the stack (what if we found sp+= or sp-=?)
indexOf("=") -> retrieve src and dst
indexOf(":") -> unknown esil, raw opcode ahead
indexOf("$") -> accesses internal esil vm flags ex: $z
indexOf("$") -> syscall ex: 1,$
indexOf("TRAP") -> can trap
indexOf('++') -> has iterator
indexOf('--') -> count to zero
indexOf("?{") -> conditional
equalsTo("") -> empty string, aka nop (wrong, if we append pc+=x)
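A few of the checks above can be written as a Python sketch. The function name and the trait labels are hypothetical. Because '$' appears in two entries of the list, this sketch assumes that a lone $ token means a syscall and that $ followed by a name means an internal flag.

```python
def quick_checks(esil: str) -> set:
    """Return a set of coarse traits found in an ESIL string."""
    tokens = esil.split(",")
    traits = set()
    if "[" in esil:
        traits.add("memory-reference")
    if "=[" in esil:
        traits.add("memory-write")
    if "pc,=" in esil:
        traits.add("modifies-pc")
    if "sp,=" in esil:
        traits.add("modifies-stack")
    if "?{" in esil:
        traits.add("conditional")
    if "TRAP" in tokens:
        traits.add("can-trap")
    if "$" in tokens:
        traits.add("syscall")          # assumed: bare '$' token
    if any(t.startswith("$") and len(t) > 1 for t in tokens):
        traits.add("flags")            # assumed: '$z', '$c', ...
    if esil == "":
        traits.add("nop")
    return traits

print(quick_checks("0x80,$"))              # {'syscall'}
print(quick_checks("123,eax,==,$z,zf,="))  # {'flags'}
```

These are plain substring and token tests, so they are cheap but imprecise: for instance "sp,=" also matches a memory write through the stack pointer such as esp,=[].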
Common operations:
• Check dstreg
• Check srcreg
• Get destination
• Is jump
• Is conditional
• Evaluate
• Is syscall
CPU flags are usually defined as single bit registers in the RReg profile. They are sometimes found under the 'flg' register type.
Properties of the VM variables:
1. They have no predefined bit width. This way it should be easy to extend them to 128, 256 and 512 bits later, e.g. for MMX, SSE, AVX, Neon SIMD.
2. There can be an unbounded number of variables. This is done for SSA-form compatibility.
3. Register names have no specific syntax. They are just strings.
4. Numbers can be specified in any base supported by RNum (dec, hex, oct, binary ...).
5. Each ESIL backend should have an associated RReg profile to describe the ESIL register specs.
What to do with them? What about bit arithmetic if we use variables instead of registers?
1. ADD ("+")
2. MUL ("*")
3. SUB ("-")
4. DIV ("/")
5. MOD ("%")
1. AND "&"
2. OR "|"
3. XOR "^"
4. SHL "<<"
5. SHR ">>"
6. ROL "<<<"
7. ROR ">>>"
8. NEG "!"
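Since VM variables have no predefined bit width, an implementation has to choose a width when it applies these operators, most visibly for the rotates. The Python sketch below shows one possible mapping. The names and the 32-bit default are hypothetical, and treating ! as a logical negation (1 if the value is zero, else 0) is an assumption.

```python
import operator

def rol(value, count, bits=32):
    """Rotate `value` left by `count` within a `bits`-wide word."""
    mask = (1 << bits) - 1
    count %= bits
    value &= mask
    return ((value << count) | (value >> (bits - count))) & mask

def ror(value, count, bits=32):
    """Rotate right, expressed as a left rotate by the complement."""
    return rol(value, bits - (count % bits), bits)

OPS = {
    "+": operator.add, "*": operator.mul, "-": operator.sub,
    "/": operator.floordiv, "%": operator.mod,
    "&": operator.and_, "|": operator.or_, "^": operator.xor,
    "<<": operator.lshift, ">>": operator.rshift,
}
ROTATES = {"<<<": rol, ">>>": ror}

def apply_op(op, a, b, bits=32):
    """Apply a binary operator and truncate the result to `bits` bits."""
    if op in ROTATES:
        return ROTATES[op](a, b, bits)
    return OPS[op](a, b) & ((1 << bits) - 1)

def logical_not(a):
    # Assumed meaning of "!": 1 if the operand is zero, otherwise 0
    return int(a == 0)

print(hex(apply_op("<<<", 0x80000001, 1)))  # 0x3
print(hex(apply_op(">>>", 0x1, 1)))         # 0x80000000
print(apply_op("-", 0, 1, bits=8))          # 255
```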
At the time of writing, ESIL does not yet support FPU. But you can implement support for unsupported instructions using r2pipe. Eventually we will get proper support for multimedia and floating point.
ESIL specifies that the parsing control-flow commands must be uppercase. Bear in mind that some architectures have uppercase register names. The corresponding register profile should take care not to reuse any of the following:
3,SKIP - skip N instructions. Used to make relative forward GOTOs
3,GOTO - goto instruction 3
LOOP - alias for 0,GOTO
BREAK - stop evaluating the expression
STACK - dump stack contents to screen
CLEAR - clear stack
rep cmpsb
cx,!,?{,BREAK,},esi,[1],edi,[1],==,?{,BREAK,},esi,++,edi,++,cx,--,0,GOTO
Unimplemented instructions are expressed with the 'TODO' command. It acts like 'BREAK', but displays a warning message stating that an instruction is not implemented and will not be emulated. For example: