[0x00 ~]$ radiff2 -g 0x40080d,0x40089f reverse4 reverse4 | xdot -
[0x00 ~]$ radiff2 -g 0x40089f,0x40080d reverse4 reverse4 | xdot -
A sad truth reveals itself after a quick glance at these graphs: radiff2 is a liar! In theory, grey boxes should be identical, yellow ones should differ only at some offsets, and red ones should differ significantly. Well, this is obviously not the case here: e.g. the larger grey boxes are clearly not identical. This is something I'm definitely going to take a deeper look at after I've finished this writeup.
Anyways, once we get over the shock of being lied to, we can easily recognize that instr_S is basically the reverse of instr_A: where the latter adds, the former subtracts. To summarize:
• arg1 == "M": subtracts arg2 from the byte at sym.current_memory_ptr.
• arg1 == "P": steps sym.current_memory_ptr backwards by arg2 bytes.
• arg1 == "C": subtracts arg2 from the value at sym.written_by_instr_C.
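To make the A/S symmetry concrete, here is a minimal Python sketch of both instructions. The `VMState` class is my own assumption for illustration, as are the 32-byte memory size, the pointer modelled as an index, and the 0..255 wrap-around. For brevity, `instr_S` is expressed through `instr_A` with a negated `arg2`. In the binary they are two separate functions.

```python
# Hypothetical model of the VM state: the names mirror the r2 symbols, but the
# memory size and the byte wrap-around are assumptions, not taken from the binary.
class VMState:
    def __init__(self, size=32):
        self.memory = bytearray(size)   # sym.memory
        self.current_memory_ptr = 0     # modelled as an index into memory
        self.written_by_instr_C = 0


def instr_A(vm, arg1, arg2):
    """Adds arg2 to the target selected by arg1: "M", "P" or "C"."""
    if arg1 == "M":    # the byte under the pointer, wrapped to 0..255
        p = vm.current_memory_ptr
        vm.memory[p] = (vm.memory[p] + arg2) & 0xFF
    elif arg1 == "P":  # the pointer itself
        vm.current_memory_ptr += arg2
    elif arg1 == "C":
        vm.written_by_instr_C += arg2


def instr_S(vm, arg1, arg2):
    """Mirror image of instr_A: same targets, but subtracts arg2."""
    instr_A(vm, arg1, -arg2)


vm = VMState()
instr_A(vm, "M", ord("U"))
instr_S(vm, "M", 2)        # "U" - 2 == "S"
instr_A(vm, "P", 3)
instr_S(vm, "P", 1)        # pointer ends up at index 2
print(chr(vm.memory[0]), vm.current_memory_ptr)   # S 2
```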
###instr_I
This one is simple: it just calls instr_A(arg1, 1). As you may have noticed, the function call looks like call fcn.0040080d instead of call fcn.instr_A. This is because function names get lost when you save and reopen a project - another thing to examine and patch in r2!
###instr_D
Again, simple: it calls instr_S(arg1, 1).
###instr_P
It's local var rename time again!
:> afvn local_0_1 const_M
:> afvn local_0_2 const_P
:> afvn local_3 arg1
This function is pretty straightforward too, but there is one oddity: const_M is never used. I don't know why it is there; maybe it is supposed to be some kind of distraction? Anyways, this function simply writes arg1 to the byte at sym.current_memory_ptr, and then calls instr_I("P"). This basically means that instr_P writes one byte and advances the pointer to the next one. At first glance this would seem to be the ideal instruction for constructing most of the "Such VM! MuCH reV3rse!" string, but remember, this is also the one that can be used only 9 times!
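A minimal Python sketch of what instr_P does is shown below. `VMState`, the memory size and the `p_uses` counter are hypothetical. The 9-use limit comes from the challenge's requirements, and the counter here merely tracks it rather than reproducing how the binary checks it. The last lines do the arithmetic behind the "only 9 times" warning.

```python
# Hypothetical sketch of instr_P. VMState, the memory size and the p_uses
# counter are illustrative assumptions, not recovered from the binary.
class VMState:
    def __init__(self, size=32):
        self.memory = bytearray(size)   # sym.memory
        self.current_memory_ptr = 0     # modelled as an index into memory
        self.p_uses = 0                 # how many times instr_P has run


def instr_P(vm, arg1):
    """Writes one byte at the pointer, then advances the pointer by one."""
    vm.p_uses += 1
    vm.memory[vm.current_memory_ptr] = arg1 & 0xFF
    vm.current_memory_ptr += 1          # what instr_I("P") boils down to


vm = VMState()
for ch in b"Such":
    instr_P(vm, ch)
print(bytes(vm.memory[:4]), vm.current_memory_ptr, vm.p_uses)  # b'Such' 4 4

# The target string is 22 bytes long, so with at most 9 instr_P calls
# at least 13 bytes must be produced by the other instructions.
target = b"Such VM! MuCH reV3rse!"
print(len(target) - 9)  # 13
```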
###instr_X
Another simple one, but let's rename the local vars anyway!
:> afvn local_1 arg1
This function XORs the value at sym.current_memory_ptr with arg1.
###instr_J
This one is not as simple as the previous ones, but it's not that complicated either. Since I'm obviously obsessed with variable renaming:
:> afvn local_3 arg1
:> afvn local_0_4 arg1_and_0x3f
After the result of arg1 & 0x3f is stored in a local variable, arg1 & 0x40 is checked against 0, and if it isn't zero, arg1_and_0x3f is negated. Then the function branches on the sign of arg1:
• if arg1 >= 0, the function returns arg1_and_0x3f;
• otherwise, it branches again on the value of sym.written_by_instr_C. If that is zero, the function returns 2. If it isn't, the function checks whether arg1_and_0x3f is negative, increments sym.good_if_ne_zero by 1 if it is, and finally returns arg1_and_0x3f.
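The branches of instr_J can be condensed into a short Python sketch. It assumes that arg1 is a signed byte (a C char, so raw values from 0x80 upwards are negative). `VMState` and `to_signed_byte` are hypothetical helpers, and only the two globals that "J" touches are modelled. Python's `&` treats negative integers as two's complement, so `arg1 & 0x3F` yields the same value as in C.

```python
# Hypothetical sketch of instr_J following the branches described above.
# Assumption: arg1 is a signed byte (C char), so raw values >= 0x80 are negative.
class VMState:
    def __init__(self):
        self.written_by_instr_C = 0
        self.good_if_ne_zero = 0


def to_signed_byte(raw):
    """Reinterprets a 0..255 value as a two's complement signed byte."""
    return raw - 0x100 if raw & 0x80 else raw


def instr_J(vm, raw_arg1):
    arg1 = to_signed_byte(raw_arg1)
    arg1_and_0x3f = arg1 & 0x3F          # always in 0..63
    if arg1 & 0x40:                      # bit 6 set: negate
        arg1_and_0x3f = -arg1_and_0x3f
    if arg1 >= 0:
        return arg1_and_0x3f             # early return for non-negative arg1
    if vm.written_by_instr_C == 0:
        return 2
    if arg1_and_0x3f < 0:
        vm.good_if_ne_zero += 1
    return arg1_and_0x3f


vm = VMState()
print(instr_J(vm, 0x41), vm.good_if_ne_zero)  # -1 0  (arg1 >= 0: early return)
print(instr_J(vm, 0xC1), vm.good_if_ne_zero)  # 2 0   (written_by_instr_C == 0)
vm.written_by_instr_C = 1
print(instr_J(vm, 0xC1), vm.good_if_ne_zero)  # -1 1  (counter incremented)
```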
We've now reversed all the VM instructions and have a full understanding of how the VM works. Here is its instruction set:
| Instruction | 1st arg | 2nd arg | What does it do? |
|---|---|---|---|
| "A" | "M" | arg2 | *sym.current_memory_ptr += arg2 |
| "A" | "P" | arg2 | sym.current_memory_ptr += arg2 |
| "A" | "C" | arg2 | sym.written_by_instr_C += arg2 |
| "S" | "M" | arg2 | *sym.current_memory_ptr -= arg2 |
| "S" | "P" | arg2 | sym.current_memory_ptr -= arg2 |
| "S" | "C" | arg2 | sym.written_by_instr_C -= arg2 |
| "I" | arg1 | n/a | instr_A(arg1, 1) |
| "D" | arg1 | n/a | instr_S(arg1, 1) |
| "P" | arg1 | n/a | *sym.current_memory_ptr = arg1; instr_I("P") |
| "X" | arg1 | n/a | *sym.current_memory_ptr ^= arg1 |
| "J" | arg1 | n/a | arg1_and_0x3f = arg1 & 0x3f; if ((arg1 & 0x40) != 0) arg1_and_0x3f *= -1; if (arg1 >= 0) return arg1_and_0x3f; else if (sym.written_by_instr_C != 0) { if (arg1_and_0x3f < 0) ++sym.good_if_ne_zero; return arg1_and_0x3f; } else return 2; |
| "C" | arg1 | n/a | sym.written_by_instr_C = arg1 |
| "R" | arg1 | n/a | return(arg1) |
Well, we've done the reverse engineering part; now we have to write a program for the VM using the instruction set described above. Here is the program's functional specification:
• the program must return "*"
• sym.memory has to contain the string "Such VM! MuCH reV3rse!" after execution
• all 9 instructions have to be used at least once
• sym.good_if_ne_zero should not be zero
• instr_P is not allowed to be used more than 9 times
Since this document is about reversing, I'll leave the programming part to you, fellow reader :) But I'm not going to leave you empty-handed; here is one piece of advice. Except for "J", all of the instructions are simple and easy to use, and it should not be a problem to construct the "Such VM! MuCH reV3rse!" string with them. "J", however, is a bit more complicated than the others. One should realize that its sole purpose is to make sym.good_if_ne_zero bigger than zero, which is a requirement for accessing the flag. In order to increment sym.good_if_ne_zero, three conditions must be met:
• arg1 should be a negative number, otherwise the function returns early
• sym.written_by_instr_C should not be 0 when "J" is called. This means that "C", "AC", or "SC" instructions should be used before calling "J".
• arg1_and_0x3f should be negative when checked. Since 0x3f's sign bit is zero, no matter what arg1 is, the result of arg1 & 0x3f will always be non-negative. But remember that "J" negates arg1_and_0x3f if arg1 & 0x40 is not zero. This basically means that arg1's 6th bit should be 1 (0x40 = 01000000b). Also, because arg1_and_0x3f can't be 0 either, at least one of arg1's 0th, 1st, 2nd, 3rd, 4th or 5th bits should be 1 (0x3f = 00111111b).
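The three conditions can be double-checked by brute force. The Python sketch below tries every possible byte value and keeps the ones for which "J" would increment sym.good_if_ne_zero. It assumes that arg1 is a signed byte and that sym.written_by_instr_C is already non-zero. `increments_counter` is a hypothetical helper.

```python
# Brute-force check of the three conditions, assuming arg1 is a signed byte
# and sym.written_by_instr_C is already non-zero (condition 2 satisfied).
def increments_counter(raw):
    """True if "J" with this raw byte would increment sym.good_if_ne_zero."""
    arg1 = raw - 0x100 if raw & 0x80 else raw   # reinterpret as signed byte
    arg1_and_0x3f = arg1 & 0x3F
    if arg1 & 0x40:                             # bit 6 set: negate
        arg1_and_0x3f = -arg1_and_0x3f
    return arg1 < 0 and arg1_and_0x3f < 0       # conditions 1 and 3


good = [raw for raw in range(0x100) if increments_counter(raw)]
print(len(good), hex(good[0]), hex(good[-1]))   # 63 0xc1 0xff
```

In other words, any byte from 0xC1 to 0xFF works. Such a byte has bit 7 (the sign bit) and bit 6 set, plus at least one of bits 0-5.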