tags:

views:

415

answers:

4

Background

I'm just learning x86 asm by examining the binary code generated by the compiler.

Code compiled using the C++ compiler in Visual Studio 2010 beta 2.

Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 16.00.21003.01 for 80x86

C code (sandbox.c)

int mainCRTStartup()
{
    int x=5;int y=1024;
    while(x) { x--; y/=2; }
    return x+y;
}

Compile it using the Visual Studio Command Prompt

cl /c /O2 /Oy- /MD sandbox.c
link /NODEFAULTLIB /MANIFEST:NO /SUBSYSTEM:CONSOLE sandbox.obj

Disasm sandbox.exe in OllyDgb

The following starts from the entry point.

00401000 >/$ B9 05000000    MOV ECX,5
00401005  |. B8 00040000    MOV EAX,400
0040100A  |. 8D9B 00000000  LEA EBX,DWORD PTR DS:[EBX]
00401010  |> 99             /CDQ
00401011  |. 2BC2           |SUB EAX,EDX
00401013  |. D1F8           |SAR EAX,1
00401015  |. 49             |DEC ECX
00401016  |.^75 F8          \JNZ SHORT sandbox.00401010
00401018  \. C3             RETN

Examination

MOV ECX, 5          int x=5;
MOV EAX, 400        int y=1024;
LEA  ...            // no idea what LEA does here. seems like ebx=ebx. elaborate please.
                    // in fact, NOPing it does nothing to the original procedure and the values.

CQD                 // sign extends EAX into EDX:EAX, which here: edx = 0. no idea why.
SUB EAX, EDX        // eax=eax-edx, here: eax=eax-0. no idea, pretty redundant. 
SAR EAX,1           // okay, y/= 2
DEC ECX             // okay, x--, sets the zero flag when reaches 0.
JNZ ...             // okay, jump back to CQD if the zero flag is not set.

This part bothers me:

0040100A  |. 8D9B 00000000  LEA EBX,DWORD PTR DS:[EBX]
00401010  |> 99             /CDQ
00401011  |. 2BC2           |SUB EAX,EDX

You can nop it all and the values of EAX and ECX will remain the same at the end. So, what's the point of these instructions?

+3  A: 

The lea ebx,[ebx] is just a NOP operation. Its purpose is to align the beginning of the loop in memory, which will make it faster. As you can see here, the beginning of the loop starts at address 0x00401010, which is divisible by 16, thanks to this instruction.

The CDQ and SUB EAX,EDX operations make sure that the division will round a negative number towards zero - otherwise SAR would round it down, giving incorrect results for negative numbers.

interjay
wrong: EBX is used by __tmainCRTStartu to check if this is a managed app, so it's not a NOP and doesn't have anything todo with alignment
jn_
You're wrong yourself. `lea ebx,[ebx]` is equivalent to `mov ebx,ebx` - it doesn't do anything. The value of the ebx register isn't checked for anything.
interjay
How does it check that?
henle
It doesn't seem to do anything because EBX is initiliazed with zero (xor ebx, ebx) in __tmainCRTStartup before main is called. Just disassemble the code in __tmainCRTStartup and you will see. As soon as main returns ebx is checked. If you e.g. explicitly assign to EBX before returning in main, the compiler will preserve the value of EBX (zero for native app) and restore it when main returns.
jn_
jn: The fact that EBX holds a value is irrelevant, because this instruction does not compare it with anything or alter it. It could have accessed any other register. If you look at code created by Visual Studio, you will see this kind of code before most inner loops - and the beginning of the loop will be aligned to 16 (or maybe 8) bytes.
interjay
interjay: you are right when you're saying that this operation is used to align the loop. Inserting the NOPs at the beginning makes VS insert lea ecx, [ecx+0]. On the other hand, EBX is infact used by the runtime to check if this is a managed app. Shutdown of the runtime is handled differently if it is
jn_
+5  A: 

The whole thing

00401010  |> 99             /CDQ
00401011  |. 2BC2           |SUB EAX,EDX
00401013  |. D1F8           |SAR EAX,1

stands for the y /= 2. You see, a standalone SAR would not perform the signed integer division the way the compiler authors intended. C++98 standard recommends that signed integer division rounds the result towards 0, while SAR alone would round towards the negative infinity. (It is permissible to round towards negative infinity, the choice is left to the implementation). In order to implement rounding to 0 for negative operands, the above trick is used. If you use an unsigned type instead of a signed one, then the compiler will generate just a single shift instruction, since the issue with negative division will not take place.

The trick is pretty simple: for negative y sign extension will place a pattern of 11111...1 in EDX, which is actually -1 in 2's complement representation. The following SUB will effectively add 1 to EAX if the original y value was negative. If the original y was positive (or 0), the EDX will hold 0 after the sign extension and EAX will remain unchanged.

In other words, when you write y /= 2 with signed y, the compiler generates the code that does something more like the following

y = (y < 0 ? y + 1 : y) >> 1;

or, better

y = (y + (y < 0)) >> 1;

Note, that C++ standard does not require the result of the division to be rounded towards zero, so the compiler has the right to do just a single shift even for signed types. However, normally compilers follow the recommendation to round towards zero (or offer an option to control the behavior).

P.S. I don't know for sure what the purpose of that LEA instruction is. It is indeed a no-op. However, I suspect that this might be just a placeholder instruction inserted into the code for further patching. If I remember correctly, MS compiler has an option that forces the insertion of placeholder instructions at the beginning and at the end of each function. In the future this instruction can be overwritten by the patcher with a CALL or JMP instruction that will execute the patch code. This specific LEA was chosen just because it produces the a no-op placeholder instruction of the correct length. Of course, it could be something completely different.

AndreyT
Not `-2/2 + 1`, but rather `(-2 + 1) >> 1`. This evaluates to `-1`.
AndreyT
Yes! Just realised it.
henle
+1  A: 

This doesn't really answer the question, but is a helpful hint. Instead of mucking around with the OllyDbg.exe thing, you can make Visual Studio generate the asm file for you, which has the added bonus that it can put in the original source code as comments. This isn't a big deal for your current small project, but as your project grows, you may end up spending a fair amount of time figuring out which assembly code matches which source code.

From the command line, you want the /FAs and /Fa options (MSDN).

Here's part of the output for your example code (I compiled debug code, so the .asm is longer, but you can do the same thing for your optimized code):

_wmain  PROC      ; COMDAT

; 8    : {

    push ebp
    mov ebp, esp
    sub esp, 216    ; 000000d8H
    push ebx
    push esi
    push edi
    lea edi, DWORD PTR [ebp-216]
    mov ecx, 54     ; 00000036H
    mov eax, -858993460    ; ccccccccH
    rep stosd

; 9    :     int x=5; int y=1024;

    mov DWORD PTR _x$[ebp], 5
    mov DWORD PTR _y$[ebp], 1024  ; 00000400H
$LN2@wmain:

; 10   :     while(x) { x--; y/=2; }

    cmp DWORD PTR _x$[ebp], 0
    je SHORT $LN1@wmain
    mov eax, DWORD PTR _x$[ebp]
    sub eax, 1
    mov DWORD PTR _x$[ebp], eax
    mov eax, DWORD PTR _y$[ebp]
    cdq
    sub eax, edx
    sar eax, 1
    mov DWORD PTR _y$[ebp], eax
    jmp SHORT $LN2@wmain
$LN1@wmain:

; 11   :     return x+y;

    mov eax, DWORD PTR _x$[ebp]
    add eax, DWORD PTR _y$[ebp]

; 12   : }

    pop edi
    pop esi
    pop ebx
    mov esp, ebp
    pop ebp
    ret 0
_wmain  ENDP

Hope that helps!

gdunbar
Thanks, this is very useful. I use ollydbg because it is such a neat tool, it highlights the register values changed by the last instruction, draws where JMP instructs will go, for me it is much more convenient to trace the instructions in olly than in anything else.
Oh yeah, no question a full assembly debugger like OllyDbg is a great tool. I was just suggesting another tool for your toolbox.
gdunbar
+1  A: 

The reason that the compiler emits this:

LEA EBX,DWORD PTR DS:[EBX]

instead of the semantically equivalent:

NOP
NOP
NOP
NOP
NOP
NOP

..is that it's faster for the processor to execute one 6-byte instruction than six 1-byte instructions. That's all.

caf