
First, let's analyze what we already have! The first thing the function does is put rdi into local_3. Since the application is a 64-bit Linux executable, we know that rdi holds the first function argument (per the System V AMD64 calling convention; as you may have noticed, the automatic analysis of arguments and local variables was not entirely correct), and we also know that vmloop's first argument is the bytecode. So let's rename local_3:

:> afvn local_3 bytecode

Next, sym.memory is put into another local variable at rbp-8 that r2 did not recognize. So let's define it!

:> afv 8 memory qword

r2 tip: The afv [idx] [name] [type] command defines a local variable at [frame pointer - idx] with the name [name] and the type [type]. You can also remove local variables using the afv- [idx] command.

In the next block, the program checks one byte of the bytecode, and if it is 0, the function returns 1.

If that byte is not zero, the program subtracts 0x41 from it and compares the result to 0x17. If the result is above 0x17, we get the dreaded "Wrong!" message, and the function returns 0. This means that valid bytecodes are the ASCII characters "A" (0x41) through "X" (0x41 + 0x17 = 0x58). If the bytecode is valid, we arrive at the code that uses the jump table.
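The range check can be modeled in a few lines of Python. This is a sketch, not the binary's code: `jump_table_index` is a hypothetical helper, the zero byte is assumed to have been handled already (as described above), and the byte is assumed to be widened to a 32-bit unsigned value before the subtraction. Under that assumption, bytes below "A" wrap around to huge values and fail the same "above 0x17" test as bytes past "X", so a single comparison rejects both ends of the range.

```python
from typing import Optional

JUMP_TABLE_BASE = 0x400EC0  # base address of the jump table in this binary


def jump_table_index(byte: int) -> Optional[int]:
    """Return the jump table index for a non-zero bytecode byte, or None.

    Sketch of vmloop's check: subtract 0x41 ('A') and reject anything
    above 0x17 using an unsigned comparison. Assumes 32-bit unsigned
    arithmetic, so bytes below 'A' wrap around to very large values.
    """
    index = (byte - 0x41) & 0xFFFFFFFF
    if index > 0x17:
        return None  # the "Wrong!" path, vmloop returns 0
    return index


for ch in "@AX[":
    idx = jump_table_index(ord(ch))
    if idx is None:
        print(ch, "invalid")
    else:
        # Each table entry is a qword, so the entry lives at base + 8 * index.
        print(ch, idx, hex(JUMP_TABLE_BASE + 8 * idx))
```

This prints `@ invalid`, `A 0 0x400ec0`, `X 23 0x400f78` and `[ invalid`: "A" selects the first qword of the table and "X" the last one.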

The jump table's base is at 0x400ec0, so let's define that memory area as a series of qwords:

[0x00400a74]> s 0x00400ec0

[0x00400ec0]> Cd 8 @@=`?s $$ $$+8*0x17 8`

r2 tip: Except for ?s, all parts of this command should be familiar by now, but let's recap! Cd defines a memory area as data, and 8 is the size of that area. @@ is an iterator that makes the preceding command run for every element @@ holds. In this example, it holds a series generated by the ?s command. ?s simply generates a series from the current seek ($$) to the current seek + 8*0x17 ($$+8*0x17), with a step of 8.
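To double-check the arithmetic, here is the list of addresses the iterator walks over, sketched in Python. The series is assumed to include its end point, which is what the 24 entries produced by pd 0x18 suggest; `seek_series` is just an illustrative name, not an r2 API.

```python
def seek_series(start: int, end: int, step: int) -> list:
    """Addresses from start to end (inclusive), step bytes apart."""
    return list(range(start, end + 1, step))


here = 0x00400EC0  # $$, the current seek
entries = seek_series(here, here + 8 * 0x17, 8)
print(len(entries), hex(entries[0]), hex(entries[-1]))
```

It prints `24 0x400ec0 0x400f78`: one qword for each valid bytecode from "A" to "X", so Cd 8 runs once per jump table entry.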

This is what the disassembly looks like after we add this metadata:

[0x00400ec0]> pd 0x18
; DATA XREF from 0x00400a76 (unk)
0x00400ec0 .qword 0x0000000000400a80
0x00400ec8 .qword 0x0000000000400c04
0x00400ed0 .qword 0x0000000000400b6d
0x00400ed8 .qword 0x0000000000400b17
0x00400ee0 .qword 0x0000000000400c04
0x00400ee8 .qword 0x0000000000400c04
0x00400ef0 .qword 0x0000000000400c04
0x00400ef8 .qword 0x0000000000400c04
0x00400f00 .qword 0x0000000000400aec
0x00400f08 .qword 0x0000000000400bc1
0x00400f10 .qword 0x0000000000400c04
0x00400f18 .qword 0x0000000000400c04
0x00400f20 .qword 0x0000000000400c04
0x00400f28 .qword 0x0000000000400c04
0x00400f30 .qword 0x0000000000400c04
0x00400f38 .qword 0x0000000000400b42
0x00400f40 .qword 0x0000000000400c04
0x00400f48 .qword 0x0000000000400be5
0x00400f50 .qword 0x0000000000400ab6
0x00400f58 .qword 0x0000000000400c04
0x00400f60 .qword 0x0000000000400c04
0x00400f68 .qword 0x0000000000400c04
0x00400f70 .qword 0x0000000000400c04
0x00400f78 .qword 0x0000000000400b99

As we can see, the address 0x400c04 is used a lot, and besides it there are 9 other addresses. Let's look at 0x400c04 first!

We get the message "Wrong!", and the function just returns 0. This means that those are not valid instructions (they are valid bytecode bytes though; they can be, e.g., parameters!). We should flag 0x400c04 accordingly:

[0x00400ec0]> f not_instr @ 0x0000000000400c04

As for the other offsets, they all seem to do something meaningful, so we can assume they belong to valid instructions. I'm going to flag them using the instructions' ASCII characters:

[0x00400ec0]> f instr_A @ 0x0000000000400a80
[0x00400ec0]> f instr_C @ 0x0000000000400b6d
[0x00400ec0]> f instr_D @ 0x0000000000400b17
[0x00400ec0]> f instr_I @ 0x0000000000400aec
[0x00400ec0]> f instr_J @ 0x0000000000400bc1
[0x00400ec0]> f instr_P @ 0x0000000000400b42
[0x00400ec0]> f instr_R @ 0x0000000000400be5
[0x00400ec0]> f instr_S @ 0x0000000000400ab6
[0x00400ec0]> f instr_X @ 0x0000000000400b99
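The letter in each flag name comes from the handler's position in the jump table: index 0 is "A", index 1 is "B", and so on. This Python sketch copies the 24 entries from the pd 0x18 listing and groups the letters by handler address, which reproduces both the flag names and the count of nine distinct handlers besides 0x400c04.

```python
from collections import defaultdict

# Jump table entries copied from the `pd 0x18` listing, in table order.
TABLE = [
    0x400A80, 0x400C04, 0x400B6D, 0x400B17, 0x400C04, 0x400C04,
    0x400C04, 0x400C04, 0x400AEC, 0x400BC1, 0x400C04, 0x400C04,
    0x400C04, 0x400C04, 0x400C04, 0x400B42, 0x400C04, 0x400BE5,
    0x400AB6, 0x400C04, 0x400C04, 0x400C04, 0x400C04, 0x400B99,
]
WRONG = 0x400C04  # the handler that prints "Wrong!" (flagged not_instr)

# Group the bytecode letters by the handler their table entry points to.
handlers = defaultdict(list)
for index, target in enumerate(TABLE):
    handlers[target].append(chr(0x41 + index))  # index 0 -> 'A'

for target, letters in sorted(handlers.items()):
    print(hex(target), "".join(letters))
```

Each of the nine real handlers gets exactly one letter (0x400a80 A, 0x400ab6 S, 0x400aec I, and so on), while 0x400c04 collects the fifteen letters BEFGHKLMNOQTUVW.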

OK, so these offsets were not on the graph, which means it is time to define basic blocks for them!

r2 tip: You can define basic blocks using the afb+ command. You have to supply the function the block belongs to, the block's start address, and its size. If the block ends in a jump, you have to specify where it jumps to. If the jump is conditional, you should also specify the destination of the false branch.

We can get the start and end addresses of these basic blocks from the full disassembly of vmloop.

As I've mentioned previously, the function itself is pretty short and easy to read, especially with our annotations. But a promise is a promise, so here is how we can create the missing basic blocks for the instructions:

[0x00400ec0]> afb+ 0x00400a45 0x00400a80 0x00400ab6-0x00400a80 0x400c15
[0x00400ec0]> afb+ 0x00400a45 0x00400ab6 0x00400aec-0x00400ab6 0x400c15
[0x00400ec0]> afb+ 0x00400a45 0x00400aec 0x00400b17-0x00400aec 0x400c15
[0x00400ec0]> afb+ 0x00400a45 0x00400b17 0x00400b42-0x00400b17 0x400c15
[0x00400ec0]> afb+ 0x00400a45 0x00400b42 0x00400b6d-0x00400b42 0x400c15
[0x00400ec0]> afb+ 0x00400a45 0x00400b6d 0x00400b99-0x00400b6d 0x400c15
[0x00400ec0]> afb+ 0x00400a45 0x00400b99 0x00400bc1-0x00400b99 0x400c15
[0x00400ec0]> afb+ 0x00400a45 0x00400bc1 0x00400be5-0x00400bc1 0x400c15
[0x00400ec0]> afb+ 0x00400a45 0x00400be5 0x00400c04-0x00400be5 0x400c15
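The sizes in the commands above follow a simple rule: sorted by address, each handler block ends where the next one begins, and the last one (instr_R) ends at 0x400c04, where not_instr starts. Here is a Python sketch that derives those commands from the handler addresses, assuming, as the commands do, that every handler ends with a jump to 0x400c15. The constant names are illustrative.

```python
FCN = 0x00400A45        # start of vmloop
NEXT = 0x00400C15       # common jump target of every handler block
NOT_INSTR = 0x00400C04  # the "Wrong!" block, which ends the handler region

# Handler addresses taken from the instr_* flags.
HANDLERS = {
    "A": 0x400A80, "C": 0x400B6D, "D": 0x400B17, "I": 0x400AEC, "J": 0x400BC1,
    "P": 0x400B42, "R": 0x400BE5, "S": 0x400AB6, "X": 0x400B99,
}

starts = sorted(HANDLERS.values())
ends = starts[1:] + [NOT_INSTR]  # each block ends where the next one starts

for start, end in zip(starts, ends):
    # r2 evaluates "end-start" as a math expression for the size argument.
    print(f"afb+ 0x{FCN:08x} 0x{start:08x} 0x{end:08x}-0x{start:08x} 0x{NEXT:x}")
```

The first line printed is `afb+ 0x00400a45 0x00400a80 0x00400ab6-0x00400a80 0x400c15`, identical to the first command above; the remaining eight match the rest in order.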

It is also apparent from the disassembly that besides the instructions there are three more basic blocks. Let's create them too!

[0x00400ec0]> afb+ 0x00400a45 0x00400c15 0x00400c2d-0x00400c15 0x400c3c 0x00400c2d
[0x00400ec0]> afb+ 0x00400a45 0x00400c2d 0x00400c3c-0x00400c2d 0x400c4d 0x00400c3c
[0x00400ec0]> afb+ 0x00400a45 0x00400c3c 0x00400c4d-0x00400c3c 0x400c61

Note that the basic blocks starting at 0x00400c15 and 0x00400c2d end in conditional jumps, so we had to set the false branch's destination too!
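For a conditional jump, the not-taken path simply falls through to the instruction right after the block, so the false-branch address we pass to afb+ should equal start + size. A small Python model of the three blocks makes the resulting control flow explicit and checks that property. The `Block` class is purely illustrative and is not an r2 data structure.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Block:
    start: int
    end: int                    # first address after the block
    jump: int                   # taken branch (or unconditional target)
    fail: Optional[int] = None  # false branch of a conditional jump

    @property
    def size(self) -> int:
        return self.end - self.start


blocks = [
    Block(0x400C15, 0x400C2D, jump=0x400C3C, fail=0x400C2D),
    Block(0x400C2D, 0x400C3C, jump=0x400C4D, fail=0x400C3C),
    Block(0x400C3C, 0x400C4D, jump=0x400C61),
]

for b in blocks:
    if b.fail is not None:
        # The false branch of a conditional jump is the next instruction.
        assert b.fail == b.start + b.size
    targets = [b.jump] + ([b.fail] if b.fail is not None else [])
    print(hex(b.start), "->", ", ".join(hex(t) for t in targets))
```

It prints `0x400c15 -> 0x400c3c, 0x400c2d`, `0x400c2d -> 0x400c4d, 0x400c3c` and `0x400c3c -> 0x400c61`, the same edges the three afb+ commands describe.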

And here is the graph in its full glory after a bit of manual restructuring:

I think it was worth it, don't you? :) (Well, the restructuring was not really worth it, because apparently it is not stored when you save the project.)

r2 tip: You can move the selected node around in graph view using the HJKL keys.

By the way, here is what IDA's graph of the same function looks like, for comparison:

As we browse through the disassembly of the instr_LETTER basic blocks, we should notice a few things. First: all of the instructions start with a sequence like this: