Hi All,

I am programming in C with Cygwin on Windows. After doing a fair amount of C programming and getting comfortable with the language, I wanted to look under the hood and see what the compiler generates for the code that I write.

So I wrote a code block containing a switch statement and compiled it to assembly using:

gcc -S foo.c  

Here is the C source:

switch(i)
{
    case 1:
    {
        printf("Case 1\n");
        break;
    }
    case 2:
    {
        printf("Case 2\n");
        break;
    }
    case 3:
    {
        printf("Case 3\n");
        break;
    }
    case 4:
    {
        printf("Case 4\n");
        break;
    }
    case 5:
    {
        printf("Case 5\n");
        break;
    }
    case 6:
    {
        printf("Case 6\n");
        break;
    }
    case 7:
    {
        printf("Case 7\n");
        break;
    }
    case 8:
    {
        printf("Case 8\n");
        break;
    }
    case 9:
    {
        printf("Case 9\n");
        break;
    }
    case 10:
    {
        printf("Case 10\n");
        break;
    }
    default:
    {
        printf("Nothing\n");
        break;
    }
}  

Now the resultant assembly for the same is:

movl    $5, -4(%ebp)
cmpl    $10, -4(%ebp)
ja  L13
movl    -4(%ebp), %eax
sall    $2, %eax
movl    L14(%eax), %eax
jmp *%eax
.section .rdata,"dr"
.align 4
L14:
.long   L13
.long   L3
.long   L4
.long   L5
.long   L6
.long   L7
.long   L8
.long   L9
.long   L10
.long   L11
.long   L12
.text
L3:
movl    $LC0, (%esp)
call    _printf
jmp L2
L4:
movl    $LC1, (%esp)
call    _printf
jmp L2
L5:
movl    $LC2, (%esp)
call    _printf
jmp L2
L6:
movl    $LC3, (%esp)
call    _printf
jmp L2
L7:
movl    $LC4, (%esp)
call    _printf
jmp L2
L8:
movl    $LC5, (%esp)
call    _printf
jmp L2
L9:
movl    $LC6, (%esp)
call    _printf
jmp L2
L10:
movl    $LC7, (%esp)
call    _printf
jmp L2
L11:
movl    $LC8, (%esp)
call    _printf
jmp L2
L12:
movl    $LC9, (%esp)
call    _printf
jmp L2
L13:
movl    $LC10, (%esp)
call    _printf
L2:  

Now, in the assembly, the code first checks against the last case (i.e. case 10). This seems very strange. It then copies 'i' into 'eax' and does things that are beyond me.

I have heard that the compiler implements some kind of jump table for switch..case. Is that what this code is doing? If not, what is it doing, and why? With a small number of cases, the generated code looks much like that for an if...else ladder, but as the number of cases increases, this unusual-looking implementation appears.

Thanks in advance.

+6  A: 

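First, the code compares i to 10 and jumps to the default case when the value is greater than 10 (cmpl $10, -4(%ebp) followed by ja L13). Because ja is an unsigned comparison, a negative i is also treated as greater than 10 and ends up at the default case.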

The next bit of code shifts the input left by two (sall $2, %eax), which is the same as multiplying by four (it multiplies by 4 because each entry in the jump table is 4 bytes long). The result is a byte offset into the jump table.
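As a rough illustration of that arithmetic, here is a minimal C sketch. The names entry_offset and load_entry are made up for this example, and it assumes 4-byte table entries as in the 32-bit output above. Plain numbers can stand in for the label addresses (13 for L13, 3 for L3, and so on).

```c
#include <stdint.h>
#include <string.h>

/* Byte offset of entry i in a table of 4-byte entries: i << 2 equals i * 4. */
uint32_t entry_offset(uint32_t i)
{
    return i << 2;
}

/* Fetch entry i through its byte offset, roughly what
 * movl L14(%eax), %eax does once %eax holds i * 4. */
uint32_t load_entry(const uint32_t *table, uint32_t i)
{
    uint32_t value;
    memcpy(&value, (const unsigned char *)table + entry_offset(i), sizeof value);
    return value;
}
```

With i = 5 (the value stored by movl $5, -4(%ebp)), the offset is 20 bytes, which selects the sixth entry of the table, L7.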

It then loads an address from the jump table (movl L14(%eax), %eax) and jumps to it (jmp *%eax).

The jump table is simply a list of addresses (represented in assembly code by labels):

L14:
.long   L13
.long   L3
.long   L4
...

One thing to notice is that L13 represents the default case. It is both the first entry in the jump table (for when i is 0) and is handled specially at the beginning (when i > 10).
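The same idea can be sketched in portable C with an array of function pointers. This is only an analogy, not what GCC emits: the handler names and dispatch are hypothetical, the handlers return a string instead of calling printf, and only three cases are shown to keep it short.

```c
typedef const char *(*handler_fn)(void);

/* Hypothetical handlers standing in for the code at the case labels. */
static const char *on_default(void) { return "Nothing"; }
static const char *on_case1(void)   { return "Case 1"; }
static const char *on_case2(void)   { return "Case 2"; }
static const char *on_case3(void)   { return "Case 3"; }

/* Index 0 points at the default handler, like the first .long L13 entry. */
static const handler_fn table[] = { on_default, on_case1, on_case2, on_case3 };

const char *dispatch(int i)
{
    /* One unsigned compare, like cmpl followed by ja: a negative i
     * converts to a very large unsigned value and fails the check. */
    if ((unsigned)i > 3u)
        return on_default();

    /* In range: index the table and call through the stored address,
     * the C-level counterpart of the load followed by jmp *%eax. */
    return table[i]();
}
```

Here dispatch(2) goes straight to on_case2 without testing against 1 first, while dispatch(0), dispatch(-1) and dispatch(4) all end up in on_default.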

R Samuel Klatchko
I see... this is informative. But then why doesn't the compiler generate a jump table in case of fewer cases (like 2 or 3)?
puffadder
@puffadder: most modern compilers use heuristics to determine when it's more efficient to use branches versus a jump table. E.g. if your case values were, say, 1, 100 and 1000, you might expect branches to be used.
Paul R
Poor compiler... I am not planning to write one in the near future :)
puffadder
+1  A: 

For [1..10] the compiler generates a table so that it doesn't need to compare the value against each case to decide where to go; it effectively does a goto table[i] directly. That way it's faster.

But if i > 10, it jumps to your default statement. It has to perform this check before jumping; otherwise the program would index past the end of the table and most likely crash.

If you had sparse values (like 23, 9233, 91238, and not 1, 2, 3...), the compiler would not generate such a table and would compare against each value instead.
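To make the sparse case concrete, here is a small sketch in C. The function names are invented for illustration, and the comparison chain is only one plausible shape: what a particular compiler actually emits depends on its version and flags, and it may order or group the comparisons differently.

```c
/* Hypothetical sparse switch written out as the comparisons it could become. */
int classify_sparse(long v)
{
    if (v == 23)
        return 1;
    if (v == 9233)
        return 2;
    if (v == 91238)
        return 3;
    return 0; /* default */
}

/* Number of entries a table needs to cover every value from low to high. */
long table_entries(long low, long high)
{
    return high - low + 1;
}
```

A table covering 23 through 91238 would need table_entries(23, 91238) = 91216 slots for just three real cases, whereas the 0 through 10 table in the question needs only 11.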

Nicolas Viennot
A: 

Yes. First, eax is computed from the switch value (the sall shift acts as a multiplication) to get the address from the jump table (following label L14:).

jmp *%eax is an indirect near jump to the label of your case (jmp near eax).

The code following the other labels is just printing and skips the other cases.

stacker
+1  A: 

Yes, it is a jump table. The first check tests whether the value is within the range covered by the cases and jumps to default if it is not. Don't forget that in such a table, if %eax is 0, L14(%eax) points to the first element of the table (L13). Since that first slot is used for i == 0, which goes to default, case 1: sits at index 1 and case 10: at index 10, so the table has 11 entries.

How the switch is implemented depends on the values you have in the case labels; here they are consecutive, so a simple jump table is possible.
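The range check and index calculation can be sketched as a small C helper. The name table_index and its parameters are hypothetical. The subtraction of the low bound is an assumption about how a table would be indexed when the case values do not start at 0. For the table in the question the low bound is 0, because the table keeps a slot for i == 0.

```c
/* Returns the table index for i, or -1 when i belongs to the default case.
 * low and high are the smallest and largest values the table covers
 * (low <= high is assumed). */
int table_index(int i, int low, int high)
{
    /* Subtracting low first lets a single unsigned compare reject both
     * values below low (they wrap to large numbers) and values above high. */
    unsigned idx = (unsigned)i - (unsigned)low;
    if (idx > (unsigned)high - (unsigned)low)
        return -1;
    return (int)idx;
}
```

For the question's table, table_index(10, 0, 10) is 10 and table_index(11, 0, 10) is -1. For a hypothetical switch over 100..103, table_index(100, 100, 103) is 0.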

ShinTakezou