Hi overflowers

I'm currently developing an x86 disassembler, and I started by disassembling a Win32 PE file. Most of the disassembled code looks good; however, there are some occurrences of the illegal 0xff /7 opcode (/7 means reg=111; 0xff is the opcode group inc/dec/call/callf/jmp/jmpf/push/illegal with an r/m 16/32 operand). My first guess was that /7 is the pop instruction, but pop is encoded as 0x8f /0. I've checked this against the official Intel Architecture Software Developer’s Manual Volume 2: Instruction Set Reference, so I'm not just being misled.
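For readers following along, the reg-field lookup described above can be sketched as below. This is only an illustration in Python, not code from the disassembler in question; the names `FF_GROUP`, `split_modrm`, and `ff_group_mnemonic` are made up for the example.

```python
# Opcode 0xFF is a group: the reg field of the ModRM byte selects the operation.
# Index 7 (/7) has no defined instruction, so it is represented as None here.
FF_GROUP = ["inc", "dec", "call", "callf", "jmp", "jmpf", "push", None]


def split_modrm(byte):
    """Split a ModRM byte into its (mod, reg, rm) fields."""
    return (byte >> 6) & 0b11, (byte >> 3) & 0b111, byte & 0b111


def ff_group_mnemonic(modrm):
    """Return the mnemonic for opcode 0xFF with this ModRM byte, or None if illegal."""
    _, reg, _ = split_modrm(modrm)
    return FF_GROUP[reg]
```

With this, `ff c7` decodes to `inc` (reg=000, rm=111 is edi), while `ff ff` has reg=111 and yields `None`, which is the illegal /7 case.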

Example disassembly (S0000O0040683a is a label that another instruction jumps to):

S0000O0040683a: inc    edi                      ; 0000:0040683a  ff c7
                test   dword ptr [eax+0xff],edi ; 0000:0040683c  85 78 ff
                0xff/7 edi                      ; 0000:0040683f  ff ff

BTW: gdb disassembles this the same way (apart from a bug in my disassembler, which prints the displacement as 0xff instead of -1):

(gdb) disassemble 0x0040683a 0x00406840
Dump of assembler code from 0x40683a to 0x406840:
0x0040683a:     inc    %edi
0x0040683c:     test   %edi,0xffffffff(%eax)
0x0040683f:     (bad)  
End of assembler dump.

So the question is: does the illegal opcode exception handler in Windows have any default handling that implements some functionality for this illegal opcode, and if so, what happens there?

Regards, Bodo

+2  A: 

Visual Studio disassembles this to the following:

00417000 FF C7            inc         edi  
00417002 85 78 FF         test        dword ptr [eax-1],edi 
00417005 ??               db          ffh  
00417006 FF 00            inc         dword ptr [eax]

Obviously, a general protection fault happens at 00417002 because eax does not point to anything meaningful, but even if I nop it out (90 90 90) it throws an illegal opcode exception at 00417005 (it does not get handled by the kernel). I'm pretty sure that this is some sort of data and not executable code.

DrJokepu
So Visual Studio conveniently sidesteps the problem. I wonder why it didn't write, "dw ffffh" and decide that 00 was the first byte of the next instruction (I believe it's some type of "add")
Nathan Fellman
@Nathan, if it is known after one byte that the instruction is illegal, why should it assume that it is a 2-byte instruction? I would say that Visual Studio is handling it quite sanely.
Evan Teran
Most other disassemblers do it the same way as Visual Studio. On one hand, after reading the first byte 0xff, it's not clear yet if it will be an invalid opcode. On the other hand: The CPU will raise an invalid opcode exception, so it doesn't matter if one assumes 1, 2 or 20 bytes ...
bothie
+1  A: 

To answer your question, Windows will close the application with the exception code 0xC000001D STATUS_ILLEGAL_INSTRUCTION. The dialog will be the same one shown for any other application crash, whether it offers a debugger or offers to send an error report.

Regarding the provided code, it would appear to have either been assembled incorrectly (encoding a greater than 8-bit displacement) or is actually data (as suggested by others already).

Zooba
A: 

It looks like 0xFFFFFFFF has been inserted instead of 0xFF for the test instruction, probably in error?

85 = test r/m32, and 78 is the byte for parameters [eax+disp8], edi, with the disp8 to follow which should just be 0xFF (-1) but as a 32-bit signed integer this is 0xFFFFFFFF.

So I am assuming that you have 85 78 FF FF FF FF where it should be 85 B8 FF FF FF FF for a 32-bit displacement or 85 78 FF for the 8-bit displacement? If this is the case the next byte in the code should be 0xFF...
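To make the 78 versus B8 distinction concrete, here is a small Python sketch of how the mod field picks the displacement width and how a disp8 of 0xFF becomes -1. It assumes 32-bit addressing and ignores the SIB byte and the mod=00, rm=101 (disp32-only) special case; the function names are hypothetical.

```python
def sign_extend(value, bits):
    """Interpret an unsigned value of the given width as two's complement."""
    sign_bit = 1 << (bits - 1)
    return (value ^ sign_bit) - sign_bit


def displacement_size(modrm):
    """Number of displacement bytes implied by the mod field.

    Simplified: 32-bit addressing only, and the mod=00/rm=101 case
    (disp32 with no base register) is deliberately not handled.
    """
    mod = (modrm >> 6) & 0b11
    return {0b00: 0, 0b01: 1, 0b10: 4, 0b11: 0}[mod]
```

Here `displacement_size(0x78)` is 1 and `displacement_size(0xB8)` is 4, and both `sign_extend(0xFF, 8)` and `sign_extend(0xFFFFFFFF, 32)` give -1, so `85 78 FF` and `85 B8 FF FF FF FF` name the same operand, `[eax-1]`.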

Of course, as suggested already, this could just be data. Don't forget that data can be stored in PE files and there is no strong guarantee of any particular structure. You can actually insert code or user-defined data into some of the MZ or PE header fields if you are aggressively optimising to decrease the .exe size.

EDIT: as per the comments below I'd also recommend using an executable where you already know exactly what the expected disassembled code should be.

jheriko
I'm writing the *dis*assembler, not the assembler ;) And the point about the data is most probably not true either, because the mentioned opcode is reachable via JMPs, CALLs and/or JCCs from the program's main entry point.
bothie
bothie: What sort of program is this? Is it possible that the program modifies the offending instruction bytes before getting there? There is no way this set of opcodes will ever run unmodified on Windows or any other x86 OS.
DrJokepu
if other disassemblers fall over with this input then I wouldn't worry too much about it. I'd recommend using an executable where you already know exactly what the expected disassembled code is...
jheriko
+2  A: 

After many, many additional hours spent getting my disassembler to produce output in exactly the same syntax as gdb, I could diff the two versions. This revealed a rather awkward bug in my disassembler: I forgot to take into account that the 0x0f 0x8x jump instructions have a TWO-byte opcode (plus the rel16/32 operand). So each 0x0f 0x8x jump target was off by one, leading to code that is not reachable in reality. After fixing this bug, no 0xff /7 opcodes are disassembled any longer.
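A minimal Python sketch of the corrected target calculation, for illustration only (it is not the actual disassembler code, and it assumes a 32-bit operand size, i.e. a rel32 operand; `jcc_near_target` is a made-up name):

```python
import struct


def jcc_near_target(code, offset, address):
    """Branch target of a 0x0F 0x8x Jcc rel32 located at code[offset].

    `address` is the virtual address of the first opcode byte.
    Assumes 32-bit operand size, so the operand is a signed rel32.
    """
    if code[offset] != 0x0F or (code[offset + 1] & 0xF0) != 0x80:
        raise ValueError("not a two-byte Jcc opcode")
    # The relative offset starts after BOTH opcode bytes.
    (rel,) = struct.unpack_from("<i", code, offset + 2)
    length = 2 + 4  # two opcode bytes plus the rel32 operand
    # The offset is relative to the address of the next instruction.
    return address + length + rel
```

Counting only one opcode byte makes `length` 5 instead of 6, so every such jump target lands one byte early, in the middle of an instruction.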

Thanks go to everyone who answered my question (and commented on those answers) and thus at least tried to help me.

bothie