views:

624

answers:

4

I guess there might be some overlapping with previous SO questions, but I could not find a Delphi-specific question on this topic.

Suppose that you want to check if an unsigned 32-bit integer variable "MyAction" is equal to any of the constants ACTION1, ACTION2, ..., ACTIONn, where n is - say 1000. I guess that, besides being more elegant,

case MyAction of
  ACTION1: {code};
  ACTION2: {code};
  ...
  ACTIONn: {code};
end;

is much faster than

if MyAction = ACTION1 then
  // code
else if MyAction = ACTION2 then
  // code
...
else if MyAction = ACTIONn then
  // code;

I guess that the if variant takes time O(n) to complete (i.e. to find the right action) if the right action ACTIONi has a high value of i, whereas the case variant takes a lot less time (O(1)?).

  1. Am I correct that switch is much faster?
  2. Am I correct that the time required to find the right action in the switch case actually is independent of n? I.e. is it true that it does not really take any longer to check a million cases than to check 10 cases?
  3. How, exactly, does this work?
+8  A: 
  1. Yes the switch is O(1) whereas the cascading if's are O(n)
  2. Yes, see (1)
  3. With a branch table
Ozan
@Chris - O(n/2) *is* O(n) - normally the highest growth rate function of n is used, with constant factors omitted.
Barry Kelly
+3  A: 

Note that if the value of MyAction is weighted, good performance can be obtained with the cascading if..else, where you put the most likely cases near the top. I'm not saying that it will compete with the case/switch statement in terms of performance when you're dealing with integers. But if a case isn't suitable (suppose you've got strings, for example), put your high percentage tests up at the top.

Chris Thornton
Yes, naturally.
Andreas Rejbrand
+10  A: 

It is always good to check the reality first ...

Delphi 2010 compiler seems to like test-and-branch a lot. For example, the following simple code is not compiled into a branch table.

var
  c: (aaa, bbb, ccc);

begin
  case c of
    aaa: sleep(0);
    bbb: sleep(0);
    ccc: sleep(0);
  end;
end.

Compiler will generate the following code:

Project56.dpr.24: case c of
0040A1C4 0FB6053C0E4100   movzx eax,[$00410e3c]
0040A1CB 2C01             sub al,$01
0040A1CD 7208             jb $0040a1d7
0040A1CF 740F             jz $0040a1e0
0040A1D1 FEC8             dec al
0040A1D3 7414             jz $0040a1e9
0040A1D5 EB19             jmp $0040a1f0
Project56.dpr.25: aaa: sleep(0);
0040A1D7 6A00             push $00
0040A1D9 E86EDAFFFF       call Sleep
0040A1DE EB10             jmp $0040a1f0
Project56.dpr.26: bbb: sleep(0);
0040A1E0 6A00             push $00
0040A1E2 E865DAFFFF       call Sleep
0040A1E7 EB07             jmp $0040a1f0
Project56.dpr.27: ccc: sleep(0);
0040A1E9 6A00             push $00
0040A1EB E85CDAFFFF       call Sleep

Even more complicated cases will get compiled into a test-and-jump series. For example ...

var
  c: (aaa, bbb, ccc, eee, fff, ggg, hhh);

begin
  case c of
    aaa: sleep(0);
    bbb: sleep(0);
    ccc: sleep(0);
    hhh: sleep(0);
  end;
end.

... is compiled into ...

Project56.dpr.24: case c of
0040A1C4 0FB6053C0E4100   movzx eax,[$00410e3c]
0040A1CB 2C01             sub al,$01
0040A1CD 720C             jb $0040a1db
0040A1CF 7413             jz $0040a1e4
0040A1D1 FEC8             dec al
0040A1D3 7418             jz $0040a1ed
0040A1D5 2C04             sub al,$04
0040A1D7 741D             jz $0040a1f6
0040A1D9 EB22             jmp $0040a1fd
Project56.dpr.25: aaa: sleep(0);
0040A1DB 6A00             push $00
0040A1DD E86ADAFFFF       call Sleep
0040A1E2 EB19             jmp $0040a1fd
Project56.dpr.26: bbb: sleep(0);
0040A1E4 6A00             push $00
0040A1E6 E861DAFFFF       call Sleep
0040A1EB EB10             jmp $0040a1fd
Project56.dpr.27: ccc: sleep(0);
0040A1ED 6A00             push $00
0040A1EF E858DAFFFF       call Sleep
0040A1F4 EB07             jmp $0040a1fd
Project56.dpr.28: hhh: sleep(0);
0040A1F6 6A00             push $00
0040A1F8 E84FDAFFFF       call Sleep

The most probably cause cause for such code is that jump tables don't play very well with L1 caches and because of that test-and-jump version may be faster if there is not a large number of case labels present.

A "proof" for that reasoning is the following program which does get translated into a jump table.

var
  b: byte;

begin
  case b of
    0: sleep(0);
    1: sleep(0);
    2: sleep(0);
    3: sleep(0);
    4: sleep(0);
    5: sleep(0);
    6: sleep(0);
    7: sleep(0);
    8: sleep(0);
    9: sleep(0);
   10: sleep(0);
   11: sleep(0);
   12: sleep(0);
   13: sleep(0);
   14: sleep(0);
   15: sleep(0);
   16: sleep(0);
   17: sleep(0);
   18: sleep(0);
   19: sleep(0);
   20: sleep(0);
  end;
end.

Project56.dpr.12: case b of
0040A178 0FB6C0           movzx eax,al
0040A17B 83F814           cmp eax,$14
0040A17E 0F8728010000     jnbe $0040a2ac
0040A184 FF24858BA14000   jmp dword ptr [eax*4+$40a18b]
...
Project56.dpr.14: 1: sleep(0);
0040A1EB 6A00             push $00
0040A1ED E85ADAFFFF       call Sleep
0040A1F2 E9B5000000       jmp $0040a2ac
Project56.dpr.15: 2: sleep(0);
0040A1F7 6A00             push $00
0040A1F9 E84EDAFFFF       call Sleep
0040A1FE E9A9000000       jmp $0040a2ac
Project56.dpr.16: 3: sleep(0);
0040A203 6A00             push $00
0040A205 E842DAFFFF       call Sleep
0040A20A E99D000000       jmp $0040a2ac
...

Barry could give us a definite answer to that question. I'm just testing and rambling.

gabr
+10  A: 

The compiler will translate a case statement into one of:

  1. Two-level table, using one table to map the value to an index, and using the index to select an address from a jump table
  2. Indirect jump through table
  3. Sequential jumps
  4. Binary search - this is recursive, so leaves of the binary search may use any of 2, 3 or 4.

It uses heuristics such as the number of cases, the range of cases, the number of distinct alternatives (each alternative may implement a range of different values), etc.

The intuition for the case statement is that it is an O(1) operation.

Barry Kelly
I find it amazing that Delphi can do this and all its other optimizations and still compile so fast!
lkessler