views: 755
answers: 5

Hi. I am trying to understand some things about jump tables and their relationship to a switch-case statement.

I was told that a jump table is an O(1) structure that the compiler generates, which makes lookup of values essentially about as fast as you can get. However, in some cases a Hashtable/Dictionary might be faster. I was also told this will only work if the switch cases contain ordered data values.

Can someone please confirm or deny this and explain what a jump table is, its importance, and its time complexity versus using a dictionary or hashtable? Thanks.

+2  A: 

The code generated for a switch statement can take many forms, depending on the cases. If the cases are close together, it is a no-brainer: use a jump table. If the cases are far apart, use if (case == value) checks or use a map. Or a compiler can use a combination: islands of jump tables selected by if checks on the jump table ranges.
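
As a rough, hand-written sketch of that last strategy (not actual compiler output; the case values and handler names here are invented), two dense clusters of cases might be lowered to range checks that each select a small table:

#include <stdio.h>

static void handle_1(void)   { puts("1"); }
static void handle_2(void)   { puts("2"); }
static void handle_100(void) { puts("100"); }
static void handle_101(void) { puts("101"); }
static void handle_other(void) { puts("default"); }

typedef void (*handler)(void);

/* Two "islands": one small table per dense cluster of case values. */
static handler island_low[]  = { handle_1, handle_2 };
static handler island_high[] = { handle_100, handle_101 };

static void dispatch(int x) {
    if (x >= 1 && x <= 2)            /* range check picks the low island  */
        island_low[x - 1]();
    else if (x >= 100 && x <= 101)   /* range check picks the high island */
        island_high[x - 100]();
    else
        handle_other();
}

int main(void) {
    dispatch(2);    /* prints: 2       */
    dispatch(101);  /* prints: 101     */
    dispatch(50);   /* prints: default */
    return 0;
}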

Richard Pennington
Speaking of hash tables, the compiler could definitely use perfect hashing rather than if checks + islands.
wrang-wrang
The only answer that doesn't get sidetracked into implementing its own jump table and stays on the key point: switch statements *act* like jump tables, *including* fall-through, but may have many different implementations, depending on many factors.
Roger Pate
@Roger: I have to disagree. He specifically asked: "Can someone please ... explain what a jump table is, it's importance and the time complexity versus using a dictionary or hashtable." This answer does handwaving instead of answering the question (at all).
Jerry Coffin
You're right that it doesn't answer the second (and less important to the OP, the way I interpret it) part of the question, but it still doesn't get sidetracked. Let's see if I can do better.
Roger Pate
@Roger: The first part was to confirm or deny "this" (apparently that a hash table might be faster in some cases), but this answer doesn't seem to attempt to address that either...
Jerry Coffin
Jerry: The OP really wants "to understand some things about jump tables and its relationship between a switch case statement." (Again, apparently you interpret him differently.) The fact that the OP gets sidetracked doesn't mean a good answer must also.
Roger Pate
+2  A: 

Suppose you had an array of procedures:

#include <stdio.h>

void fa(void) {
    printf("a\n");
}

/* ... fb() through fy() ... */

void fz(void) {
    printf("it's z!\n");
}

typedef void (*F)(void);
F table[26] = { fa, fb, /* ..., */ fz };

Suppose you accept a character (from a-z) of input from the user and want to call the corresponding function:

char c; /* read from the user */
switch (c) {
    case 'a': fa(); break;
    case 'b': fb(); break;
    /* ... */
    case 'z': fz(); break;
    default: exit(-1);
}

Ideally this would be replaced with something like:

if (c<'a' || c>'z') exit(-1);
else (*table[c-'a'])();

Naturally, you might make the table big enough to cover every possible character value so the range check wouldn't be necessary.

The compiler would do this for arbitrary code, not necessarily function calls only, and would do it by storing the address to jump to (essentially, a goto). C doesn't directly support any sort of computed goto (indexing into a table or otherwise), but the CPU instructions for it are pretty simple.
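
For a taste of what that looks like at the source level, here is a small sketch using GCC's non-standard "labels as values" extension (&&label and goto *ptr). It is not portable C, just an illustration of an indexed jump straight into the code:

#include <stdio.h>

void handle(char c) {
    /* GCC extension: a table of label addresses, indexed and jumped to
       directly -- roughly what a dense switch compiles down to. */
    static void *jump[3] = { &&La, &&Lb, &&Lc };

    if (c < 'a' || c > 'c') {   /* range check stands in for "default" */
        puts("default");
        return;
    }
    goto *jump[c - 'a'];        /* one indexed, indirect jump */

La: puts("a"); return;
Lb: puts("b"); return;
Lc: puts("c"); return;
}

int main(void) {
    handle('b');  /* prints: b       */
    handle('q');  /* prints: default */
    return 0;
}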

wrang-wrang
+1  A: 

A jump table is basically an array of pointers to pieces of code to handle the various cases in the switch statement. It's most likely to be generated when your cases are dense (i.e. you have a case for every possible value in a range). For example, given a statement like:

switch (i) {
   case 1: printf("case 1"); break;
   case 2: printf("case 2"); break;
   case 3: printf("case 3"); break;
}

it could generate code roughly equivalent to something like this:

void case1(void) { printf("case 1"); }
void case2(void) { printf("case 2"); }
void case3(void) { printf("case 3"); }

typedef void (*pfunc)(void);

pfunc functions[3] = {case1, case2, case3};

/* the cases start at 1, so shift into the 0-based table */
if ((unsigned)(i - 1) < 3)
    functions[i - 1]();

This has O(K) complexity. A typical hash table also has roughly O(K) expected complexity, though the worst case is typically O(N). The jump table will usually be faster, but the compiler will normally only generate one when the table would be quite dense, whereas a hash table/dictionary works well even when the cases are sparse.

Jerry Coffin
O(K) is usually written O(1). Remind me not to answer such basic questions; we have 3 essentially identical answers ;)
wrang-wrang
A: 

A jump table is simply an array of function pointers; you can picture a jump table roughly like so:

int (*functions[10])(); /* Array of 10 Function Pointers */

From my understanding, this is used with a switch statement like so: each case label becomes an index into this array, so for example:

switch( a ) {
    case 1:  // (*functions[1])()  // call function containing actions in case of 1
        ...
    case 2:  // (*functions[2])()  // call function containing actions in case of 2
        ...
}
Each case transforms into simply functions[a]. This means that accessing functions[9] is just as quick as accessing functions[1], giving you the O(1) time you mentioned.

Obviously, if you have case 1 and case 4907, this isn't going to be a good method, and the hash table/dictionary methods you mentioned may come into play.

Dave
Not exactly; case fall-through and arbitrary code using locals, in the case statement, still work properly with a jump table. The function pointers are just a pedagogic vehicle.
wrang-wrang
+2  A: 

A jump table is an abstract structure used to transfer control to another location. Goto, continue, and break are similar, except they always transfer to a specific location instead of one possibility from many. In particular, this control flow is not the same as a function call. (Wikipedia's article on branch tables is related.)

A switch statement is how to write jump tables in C/C++. Only a limited form is provided (can only switch on integral types) to make implementations easier and faster in this common case. (How to implement jump tables efficiently has been studied much more for integral types than for the general case.) A classic example is Duff's Device.
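
For reference, here is a condensed variant of Duff's Device (a simplified copy-into-a-buffer form rather than Duff's original output-to-a-register version; it assumes count > 0). Its only point here is that the case labels are jump targets into the middle of a loop:

/* Case labels as jump targets into the middle of a loop: the switch picks
   the entry point, then fall-through and the do/while handle the rest.
   Simplified variant of Duff's Device; assumes count > 0. */
void copy_shorts(short *to, const short *from, int count) {
    int n = (count + 7) / 8;
    switch (count % 8) {
    case 0: do { *to++ = *from++;
    case 7:      *to++ = *from++;
    case 6:      *to++ = *from++;
    case 5:      *to++ = *from++;
    case 4:      *to++ = *from++;
    case 3:      *to++ = *from++;
    case 2:      *to++ = *from++;
    case 1:      *to++ = *from++;
            } while (--n > 0);
    }
}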

However, the full capability of a jump table is often not required, such as when every case would have a break statement. These "limited jump tables" are a different pattern, one that only takes advantage of a jump table's well-studied efficiency, and they are common when each "action" is independent of the others.


Actual implementations of jump tables take different forms, mostly differing in how the key-to-index mapping is done. That mapping is where terms like "dictionary" and "hash table" come in, and those techniques can be used independently of a jump table. Saying that some code "uses a jump table" doesn't imply by itself that you have O(1) lookup.

The compiler is free to choose the lookup method for each switch statement, and there is no guarantee you'll get one particular implementation; compiler options such as optimize-for-speed and optimize-for-size will influence which one it picks.

You should look into studying data structures to get a handle on the different complexity guarantees they provide. Briefly, if by "dictionary" you mean a balanced binary tree, then it is O(log n); and a hash table depends on its hash function and collision strategy. In the particular case of switch statements, since the compiler has full information, it can generate a perfect hash function, which means O(1) lookup. However, don't get lost just looking at overall algorithmic complexity: it hides important constant factors.
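
As a hand-made sketch of the perfect-hash idea (the keys, the modulus, and the function name are invented for illustration; a real compiler derives its own mapping), suppose the cases were the five vowels. Reducing the key modulo 11 happens to send them to distinct slots, so dispatch is one modulo, one table load, and one compare:

#include <stdio.h>

/* 'a','e','i','o','u' map to distinct slots under % 11, so this table
   acts as a tiny perfect-hash lookup; unused slots stay zero. */
static const char keys[11] = {
    ['a' % 11] = 'a', ['e' % 11] = 'e', ['i' % 11] = 'i',
    ['o' % 11] = 'o', ['u' % 11] = 'u'
};

int vowel_case(char c) {
    unsigned slot = (unsigned char)c % 11;
    return c != '\0' && keys[slot] == c;  /* verify we hit the right key */
}

int main(void) {
    printf("%d %d\n", vowel_case('e'), vowel_case('x'));  /* prints: 1 0 */
    return 0;
}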

Roger Pate