tags:

views:

244

answers:

6

I am trying to understand how calculations involving numbers greater than 232 happen on a 32 bit machine.

C code

$ cat size.c
#include<stdio.h>
#include<math.h>

int main() {

    printf ("max unsigned long long = %llu\n",
    (unsigned long long)(pow(2, 64) - 1));
}
$

gcc output

$ gcc size.c -o size
$ ./size
max unsigned long long = 18446744073709551615
$

Corresponding assembly code

$ gcc -S size.c -O3
$ cat size.s
    .file   "size.c"
    .section    .rodata.str1.4,"aMS",@progbits,1
    .align 4
.LC0:
    .string "max unsigned long long = %llu\n"
    .text
    .p2align 4,,15
.globl main
    .type   main, @function
main:
    pushl   %ebp
    movl    %esp, %ebp
    andl    $-16, %esp
    subl    $16, %esp
    movl    $-1, 8(%esp)   #1
    movl    $-1, 12(%esp)  #2
    movl    $.LC0, 4(%esp) #3
    movl    $1, (%esp)     #4
    call    __printf_chk
    leave
    ret
    .size   main, .-main
    .ident  "GCC: (Ubuntu 4.4.3-4ubuntu5) 4.4.3"
    .section    .note.GNU-stack,"",@progbits
$

What exactly happens on the lines 1 - 4?

Is this some kind of string concatenation at the assembly level?

+1  A: 

The compiler actually made a static optimization of your code. lines #1 #2 #3 are parameters for printf()

Nicolas Viennot
@Pafy: Yes, I understand that. I want to know how exactly `printf` is prepared at these lines 1 - 4.
Lazer
`subl $16, %esp` makes some room for 4 arguments of 32bits. Line #1 and #2 moves the 64bits argument (so 2 32bits). Line #3 set the format string, and then "1" is pushed as an argument (which must be from the internal printf() call).
Nicolas Viennot
A: 

Doh! My mistake, this post is edited.
There's no overflow. Unsigned long long is 64 bit ( for 32bit machines, it's a GNU extension), although it's not a standard type. Unfortunately, there's no guarantee for that.

Kiril Kirov
@Kiril, I have updated my question with the gcc results. There is no overflow.
Lazer
It's totally standard with C99
Nicolas Viennot
O.o Yeah, it really is in C99.. I didn't know that, thanks!
Kiril Kirov
+1  A: 

As @Pafy mentions, the compiler has evaluated this as a constant.

2 to the 64th minus 1 is 0xffffffffffffffff.

As 2 32-bit integers this is: 0xffffffff and 0xffffffff,
which if you take that as a pair of 32-bit signed types, ends up as: -1, and -1.

Thus for your compiler the code generated happens to be equivalent to:

printf("max unsigned long long = %llu\n", -1, -1);

In the assembly it's written like this:

movl    $-1, 8(%esp)   #Second -1 parameter
movl    $-1, 12(%esp)  #First -1 parameter
movl    $.LC0, 4(%esp) #Format string
movl    $1, (%esp)     #A one.  Kind of odd, perhaps __printf_chk
                       #in your C library expects this.
call    __printf_chk

By the way, a better way to calculate powers of 2 is to shift 1 left. Eg. (1ULL << 64) - 1.

asveikau
shifting by 64 is undefined.
GregS
The shift is useless. `-1ULL` will give the same result.
R..
+1  A: 

In your case, the compiler knows that 2^64-1 is just 0xffffffffffffffff, so it has pushed -1 (low dword) and -1 (high dword) onto the stack as your argument to printf. It's just an optimization.

In general, 64-bit numbers (and even greater values) can be stored with multiple words, e.g. an unsigned long long uses two dwords. To add two 64-bit numbers, two additions are performed - one on the low 32 bits, and one on the high 32 bits, plus the carry:

; Add 64-bit number from esi onto edi:
mov     eax, [esi] ; get low 32 bits of source
add     [edi], eax ; add to low 32 bits of destination
; That add may have overflowed, and if it did, carry flag = 1.
mov     eax, [esi+4] ; get high 32 bits of source
adc     [edi+4], eax ; add to high 32 bits of destination, then add carry.

You can repeat this sequence of add and adcs as much as you like to add arbitrarily big numbers. The same thing can be done with subtraction - just use sub and sbb (subtract with borrow).

Multiplication and division are much more complicated, and the compiler usually produces some small helper functions to deal with these whenever you multiply 64-bit numbers together. Packages like GMP which support very, very large integers use SSE/SSE2 to speed things up. Take a look at this Wikipedia article for more information on multiplication algorithms.

wj32
+8  A: 

__printf_chk is a wrapper around printf which checks for stack overflow, and takes an additional first parameter, a flag (e.g. see here.)

pow(2, 64) - 1 has been optimised to 0xffffffffffffffff as the arguments are constants.

As per the usual calling conventions, the first argument to __printf_chk() (int flag) is a 32-bit value on the stack (at %esp at the time of the call instruction). The next argument, const char * format, is a 32-bit pointer (the next 32-bit word on the stack, i.e. at %esp+4). And the 64-bit quantity that is being printed occupies the next two 32-bit words (at %esp+8 and %esp+12):

pushl   %ebp                 ; prologue
movl    %esp, %ebp           ; prologue
andl    $-16, %esp           ; align stack pointer
subl    $16, %esp            ; reserve bytes for stack frame
movl    $-1, 8(%esp)   #1    ; store low half of 64-bit argument (a constant) to stack
movl    $-1, 12(%esp)  #2    ; store high half of 64-bit argument (a constant) to stack
movl    $.LC0, 4(%esp) #3    ; store address of format string to stack
movl    $1, (%esp)     #4    ; store "flag" argument to __printf_chk to stack
call    __printf_chk         ; call routine
leave                        ; epilogue
ret                          ; epilogue

The compiler has effectively rewritten this:

printf("max unsigned long long = %llu\n", (unsigned long long)(pow(2, 64) - 1));

...into this:

__printf_chk(1, "max unsigned long long = %llu\n", 0xffffffffffffffffULL);

...and, at runtime, the stack layout for the call looks like this (showing the stack as 32-bit words, with addresses increasing from the bottom of the diagram upwards):

        :                 :
        :     Stack       :
        :                 :
        +-----------------+
%esp+12 |      0xffffffff | \ 
        +-----------------+  } <-------------------------------------.
%esp+8  |      0xffffffff | /                                        |
        +-----------------+                                          |
%esp+4  |address of string| <---------------.                        |
        +-----------------+                 |                        |
%esp    |               1 | <--.            |                        |
        +-----------------+    |            |                        |
                  __printf_chk(1, "max unsigned long long = %llu\n", |
                                                    0xffffffffffffffffULL);
Matthew Slattery
awesome, thanks @Matthew.
Lazer
+1  A: 

similar to the way as we handle numbers greater than 9, with only digits 0 - 9. (using positional digits). presuming the question is a conceptual one.

alvin
great insight, never thought about it that way!
Lazer