ansaurus

Question

Avoid range check on switch/case statement in gcc?

Answer 1

+2 A:

Have you tried declaring the switch variable as a bitfield?

struct Container {
  uint16_t a:3;
  uint16_t unused:13;
};

struct Container cont;

cont.a = 5;  /* assign some value */
switch( cont.a ) {
...
}

Hope this works!

Praetorian 2010-07-14 20:31:28

Interesting. But no luck, it still compares to 7 and "ja L5". I'm using gcc 4.2 BTW (Forgot to mention!) But the guy on the gcc bugzilla reported it happens in gcc 4.5 for him (though it might be non-x86.)

alecco 2010-07-14 20:45:53

Answer 2

+1 A:

Perhaps just use a default label for the fist or last case?

Jens Gustedt 2010-07-14 20:43:59

I forgot to mention I tried this, too. This just makes the conditional jump include the first or last cases, still branching. It seems like gcc can't tell the value is in the range 0-7.

alecco 2010-07-14 20:48:15

@aleccolocco: too bad. You didn't tell what version of gcc you have, maybe you just have a bad one? I suppose you also tried all optimizing flags?

Jens Gustedt 2010-07-14 21:02:00

@aleccolocco: sorry only read the description of your compiler after my last comment. I experimented on my machine (64 bit, gcc 4.4.3) I get a perfect jump table, and the only warning is the missing `return` for main. I'll also try on my phone an ARM with gcc 4.2

Jens Gustedt 2010-07-14 21:11:42

On ARM with gcc 4.2 it produces a jump table too, but isn't able to optimize out the bound check, which 4.4.3 does. The two instructions, `and` and `cmp` are really succeeding each other, so this is not very much optimized.BTW, I observed that the generated code is quite different when I add a `return` statement after the `switch`.

Jens Gustedt 2010-07-14 21:27:41

@Jens Gustedt: wow, you got busy on this one! My original code does have return and code after the switch.

alecco 2010-07-14 21:31:59

gcc 4.4.3 on i686: jump table, no bound check.gcc 4.0.1 on i686: jump table, bound check.So my guess is that the version makes the difference, but unfortunately I don't have my hands on a machine with 4.5 to do a test.

Jens Gustedt 2010-07-14 21:41:03

Yes, it's a GCC <4.4 problem. Thank you! About the original on adding default label to the first or last cases, it could be counter-productive because the branch would be taken sometimes instead of never. (Not tested.)

alecco 2010-07-15 16:06:25

Answer 3

+3 A:

Nathon 2010-07-14 20:47:24

Yeah, exactly the same code for -O3 and -O5, as stated on another comment I'm using gcc 4.2 but the bugzilla entry says it happens with gcc 4.5, too.

alecco 2010-07-14 20:50:17

Hmm, mine is gcc (Debian 4.4.4-6) 4.4.4.

Nathon 2010-07-14 20:53:29

There is no -O5 (http://gcc.gnu.org/onlinedocs/gcc-4.4.4/gcc/Optimize-Options.html), so not really surprising that it does the same as -O3 ;-)

Peter 2010-07-14 21:10:17

@Nathon: Just tried it on Gentoo with 4.5 and had the same result as you! Argh. Still, this is for an open source project so can't demand users have the latest gcc version. Now I'm torn on who to assign the answer as correct :-/

alecco 2010-07-14 21:22:41

Answer 4

+4 A:

Perhaps you could use an array of function pointers instead of a switch ?

#include <stdio.h>

typedef void (*func)(void);

static void f0(void) { printf("%s\n", __FUNCTION__); }
static void f1(void) { printf("%s\n", __FUNCTION__); }
static void f2(void) { printf("%s\n", __FUNCTION__); }
static void f3(void) { printf("%s\n", __FUNCTION__); }
static void f4(void) { printf("%s\n", __FUNCTION__); }
static void f5(void) { printf("%s\n", __FUNCTION__); }
static void f6(void) { printf("%s\n", __FUNCTION__); }
static void f7(void) { printf("%s\n", __FUNCTION__); }

int main(void)
{
    const func f[8] = { f0, f1, f2, f3, f4, f5, f6, f7 };
    int i;

    for (i = 0; i < 8; ++i)
    {
        f[i]();
    }
    return 0;
}

Paul R 2010-07-14 21:05:28

alecco 2010-07-14 21:41:58

(Sorry to dead post…) Function call overhead is a given as long as you're doing a true function call. That should be eliminated if you turn it into a tail call by having fN return int.

Potatoswatter 2010-09-30 04:39:24

Answer 5

+1 A:

This question is certainly interesting from the standpoint of a missed compiler optimization that is seemingly obvious to us, and I did spend considerable time trying to come up with a straightforward solution, largely out of personal curiousity.

That said, I have to admit I am highly skeptical that this additional instruction will ever result in a measurable performance difference in practice, especially on a new mac. If you have any significant amount of data, you'll be I/O bound, and a single instruction will never be your bottleneck. If you have a tiny amount of data, then you'll need to perform a lot lot lot of calculations repeatedly before a single instruction will become a bottleneck.

Would you post some code to show that there really is a performance difference? Or describe the code and data your working with?

John 2010-07-15 15:02:09

Yes. If the branch is unlikely the predictor will mark it. But it will consume 2 cycles. I'm working on a high performance compression algorithm and implementation. One of the problems with compression is the unpredictability due to the nature of compression. Is the next block on decompression a literal or a match? (picking source) How long is it? Does it cross a cache-line or page boundary? (serious performance penalty.) This loop will be processed for every "atom" on decompression. Performance matters: Google has their super-secret compression algorithm, Zippy. (dbs, comms, everything)

alecco 2010-07-15 15:57:26

When originally posted I thought the bug/issue was broken in recent versions of gcc (because of the gcc bugzilla entry for 4.5 linked.) My bad. @Jens Gustedt above confirmed this. I tested then 4.4/4.5 on a different OS. I'll edit the question to reflect this on top. Thanks! (and have an upvote for valid critical thinking!)

alecco 2010-07-15 16:10:29

I agree that performance matters. I'm asking whether that cmpl instruction has a measurable performance effect, and if so, under what real circumstances (because I can't think of any.) It's the difference between this being a question of "how do I hack a specific version of gcc just for the sake of curiousity" and a question about a legitimate code optimization.

John 2010-07-15 21:24:29

Yeah. At the moment the test data is random chunks of Wikipedia text and still need to finish other parts of the code. But this is the innermost most important decoding loop, for just 1MB of typical plaintext it will be run ~180k times (from tests.)

alecco 2010-07-15 21:50:44

Answer 6

+1 A:

I didn't try, but I'm not sure that gcc_unreachable does the same thing as __builtin_unreachable. Googling the two, gcc_unreachable appears to be designed as a as an assertion tool for development of GCC itself, perhaps with a branch prediction hint included, whereas __builtin_unreachable makes the program instantly undefined — which sounds like deleting the basic block, which is what you want.

http://gcc.gnu.org/onlinedocs/gcc/Other-Builtins.html#index-g_t_005f_005fbuiltin_005funreachable-3075

Potatoswatter 2010-09-30 04:47:29

Yeah, here's proof that `gcc_unreachable` is a function and not the abstract representation of void code: http://old.nabble.com/optimization-of-switch-statements-on-i386-td15366926.html

Potatoswatter 2010-10-02 05:48:07

ansaurus

tags:

views:

answers:

Avoid range check on switch/case statement in gcc?

related questions