views:

492

answers:

9

Consider:

if (condition1)
{
    // Code block 1
}
else
{
    // Code block 2
}

If I know that condition1 will be true the majority of the time, then I should code the logic as written, instead of:

if (!condition1)
{
    // Code block 2
}
else
{
    // Code block 1
}

since I will avoid the penalty of the jump to the second code block (note: I have limited knowledge of assembly language). Does this idea carry forward to switch statements and case labels?

switch (myCaseValue)
{
    case Case1:
        // Code block 1
        break;

    case Case2:
        // Code block 2
        break;

    // etc.
}

If I know that one of the cases will happen more often, can I rearrange the order of the case labels so that it's more efficient? Should I? In my code I've been ordering the case labels alphabetically for code readability without really thinking about it. Is this micro-optimization?

+5  A: 

Case labels should be ordered in the most effecient way for readability.

Reordering case labels for efficiency is a case of premature optimization unless a profiler has specifically told you this is a problem.

JaredPar
not answering the question, this is like a basic copy-paste answer for all performance questions
Ramónster
I agree with Ramonster. This is a pat answer and should be a comment at most.
BipedalShark
@Ramónster, yes indeed it's a copy / paste answer because it's essentially the same question. It doesn't make the answer any less correct and making it a comment certainly doesn't change it's correctness. This is premature optimization.
JaredPar
He didn't specify context code, so you don't know that it was premature optimization. If he is worried about efficiency, assume it's in a hot spot, otherwise he shouldn't be worried.
Ramónster
@Ramónster, exactly. The lack of context indicates it is indeed premature optimization. Only a profile indicating a particular case is slow is enough context to diagnose this problem. Anything else is simply guessing.
JaredPar
You were guessing that his code was not in a hot spot. A good answer would've had an explanation about performance optimizations in reordering switch/case statements, and as a side note mention that it's a waste if it's not hot code.
Ramónster
@Ramónster, saying in anything is a hotspot without a profile is an guess and nothing more. I've spent more time in my life undoing bad optimizations on peoples's guesses than I've ever done implementing actual performance fixes. Optimizing anything without a profile is a guess, nothing more. You're much more likely to make things slower and less readable.
JaredPar
Point is, you didn't answer his whole question, he came here to learn about switch/case, and instead you just told him to forget about it. This site is to encourage learning.
Ramónster
@Ramónster, I believe it encourages the right solution, don't optimize until a profiler tells you too. It's certainly not an incorrect answer. Downvoting it is simply incorrect.
JaredPar
@Ramónster, you're right this site should encourage learning. There are many questions of this sort, and in nearly every one, the point that needs to be learned is that it just doesn't matter *unless* `Block 1` and `Block 2` are almost vacuous, which is seldom the case.
Mike Dunlavey
... Sometimes I feel like someone who builds race cars for a living, and people often ask me which kind of paint has the least aerodynamic drag. Usually, I'm pretty good about being patient.
Mike Dunlavey
@Mike, hah. Me I'm repairing an old ship which was built with what was thought to be superduper-aerodynamic paint but unfortunately turned out to weigh more than the ship.
JaredPar
@JaredPar or you work on an architecture that you know has a high cost for branches. In that case you don't need a profiler to tell you that it will be expensive. So generalising and saying 'this is premature optimisation' is a little too general IMO. For x86 maybe but in my world i.e PS3 then VERY different story
zebrabox
@zebrabox the high cost of a single branch is irrelevant if the code is say called very infrequently. Only a profiler can tell you if that branch is significant.
JaredPar
+6  A: 

It depends. The compiler will use a bunch of internal implementation-dependent criteria to decide whether to implement the switch as a sequence of if-like tests, or as a jump table. This might depend, for example, on how "compact" your set of case labels is. If your case label values form a "dense" set, the compiler is probably more likely to use a jump table, in which case the ordering of case labels won't matter. If it decides to go with what resembles a sequence of if-else tests, the order might matter.

Keep in mind though, that the body of switch is one large statement, with case labels providing multiple entry points into that statement. For that reason, the compilers ability (as well as yours) to rearrange the case "sub-blocks" within that statement might be limited.

AndreyT
A: 

If you put the cases that happen most often first, this will optimize the code slightly, and because of the way switch statments work the same is true. When the program goes into switch and finds a case that's true, it will execute it and hit break, which will exit out of the loop. Your thinking is correct.

However, I do think this optimization is pretty minimal, and if it slows your development time to do this, it's probably not worth it. Also if you have to modify your program flow drastically to accommodate this, it's probably not worth it. You're only saving a couple cycles at most and likely would never see the improvement.

Jeremy Morgan
There's no guarantee that lexical order will be taken into account. The compiler is free to reorder the cases in any way it sees fit. It will likely do a better job of doing so than the programmer can. Hence, it's best to write them in an order that's easiest to understand or best communicates intent. If order bias is important, use a cascading conditional statement (*if-else if-else*). Othewise, leave the optimization to the compiler.
seh
The compiler is only free to do that if there are unconditional `break` statements (or equivalent) at the end of each `case` "segment", and if the compiler is smart enopugh to recognize them (of course, modern compilers are). If there are no such `break` statements there, the segments are not rearrangeable, since the entire body of `switch` is one big compound statement with pre-determined internal ordering.
AndreyT
+7  A: 

Your conclusion regarding the if statements will not be true on most of the hardware I'm familiar with. The problem is not that you are jumping, but that you are branching. The code could go two different ways, depending on the result of a comparison. This can stall the pipeline on most modern CPUs. Branch prediction is common, and fixes the problem most of the time, but has nothing to do with your example. The predictor can equally well predict that a comparison will be false as it can that it will be true.

As usual, see wikipedia: Branch Predictor

PeterAllenWebb
Could you please show how this works in assembly?
Jon Seigel
I don't think an assembly listing would make it clear what's going on. You would have to look at a simulation of a particular CPU step by step over many loops over the if statement.
PeterAllenWebb
The compiler could easily invert the condition there. Trying to guess how an if statement is translated to code by the compiler is less than meaningless. This is doubly true if the compiler is optimizing.
caspin
+2  A: 

I'm not sure about the C# compiler, but I know that in assembly a switch statement can actually be programmed as a jump to a specific line, rather than evaluating the expression like an if statement. Since in a select you have all constants, it just treats cases as line numbers and you jump directly to the line number (case value) passed in without any evaluation. This makes the order of the case statements not really matter at all.

smoore
+28  A: 

Some facts for modern hardware like x86 or x86_64:

  • A unconditionally taken branch has almost no additional costs, besides the decoding. If you want a number, it's about a quarter clock cycle.
  • A conditional branch, which was correctly predicted, has almost no additional costs.
  • A conditional branch, which was not correctly predicted, has a penalty equal to the length of the processor pipelines, this is about 12-20 clocks, depending on the hardware.
  • The prediction mechanisms are very sophisticated. Loops with a low number of iterations (on Core 2 for example up to 64) can be perfectly predicted. Small repeating patterns like "taken-taken-nottaken-taken" can be predicted, if they are not too long (IIRC 6 on Core2).

You can read more about branch prediction in Agner Fogs excellent manual.

Switch statements are usually replaced by a jump table by the compiler. In most cases the order of cases won't make a difference at all. There are prediction mechanisms for indirect jumps as well.

So the question isn't if your jumps are more likely to be taken, it is if they are well predictable, at least for the hardware you intend to run your code on. This isn't an easy question at all. But if you have branches depending on a random (or pseudo random) condition, you could try to reformulate it as a branchless statement if possible.

drhirsch
This is the best answer so far. Except for microcontrollers, do not assume a 1:1 translation of your C code in pseudo-machine code. In modern processors there are a lot of parameters that are invisible to your C code that can have an impact on performance: cache-locality, TLB, register-spills, page-faults, branch predictions, associativity of cache... All these can have unexpected impacts on your optimisation efforts.
tristopia
You assume that the processor is not evaluating both sides of the branch simultaniously (until the expression has been evaluated). When I was back at Uni (20 years ago) they were already doing things along these lines. I have not kept up with the field but I assume that it will be more common now.
Martin York
AFAIK current desktop processors do not evaluate both sides simultaniosly, probably because this would require double prefetchers, double decoders and more execution units etc. This resources are probably more effectively used in a second logical core (aka hyperthreading). GPUs do always execute both sides of a branch - but for other reasons, and they don't do it simultaniously. But there may exist such a thing, but I was speaking mainly of x86 and the like. I stated that in my first verion, but edited it out.
drhirsch
What you are talking about is predication which exists in Itanium for example, where both sides of a condition are evaluated and the "right" one is kept, the "wrong" one discarded. This avoids pipeline stalls but at the cost of a lot of power (the processor does unecessary work). It is better to have a better branch predictor imho.
tristopia
Fine for x86 and good info :) but for e.g PowerPC then branch overhead or branch prediction miss can be very high
zebrabox
The idea of "branch prediction" is only applicable to jumps with hard-coded targets, which makes the situation with the `switch` statement more complicated. The jump-table approach is theoretically faster is some cases, but might suffer in practice from lack of branch prediction. It is not an easy task (if at all possible) to tell which will work better in each particular case.
AndreyT
Actually there is a working branch prediction in some processors for indirect jumps (jump tables). For example, indirect jumps up to 36 different targets can be predicted perfectly on a Core2. Please check http://www.agner.org/optimize/microarchitecture.pdf
drhirsch
+3  A: 

I think that even your initial premise - that you can optimize the if statement by rearranging the conditional may well be faulty. In a non-optimized build you might find doing what you're talking about has some value - maybe. In the general case you're going to have to jump at least once for either case, so there's no advantage (in general) to arranging the conditional anyway. But that's for non-optimized builds, so who cares about that optimization?

In optimized builds, I think you might be surprised by what a compiler sometimes generates for an if statement. The compiler may move one or the other (or both) cases to somewhere out-of-line. I think that you trying to optimize this naively by playing with which condition 'comes first' won't necessarily do what you want. At best you should do this only after examining what the compiler is generating. And, of course, this becomes an expensive process, since even the slightest change you make around the statement can change how the compiler decides to generate the output code.

Now, as far as the switch statement is concerned, I'd always go with using a switch when it makes the code more readable. The worst that a compiler should do with a switch statement that is equivalent to an if statement is to generate the same code. For more than a few cases, switch statements will generally be compiled as a jump table. But then again a set of if tests that are comparing a single variable to a set of values might very well be recognized by a compiler such that it'll do the same. However, I'd guess that using a switch will enable to compiler to recognize the situation much more readily.

If you're really interested in getting the most out of the performance of that conditional, you might consider using something like MSVC's Profile Guided Optimization (PGO or 'pogo')which uses the results of profiling runs to optimize how conditionals get generated. I don't know whether or not if GCC has similar capabilities.

Michael Burr
For gcc this is -fgenerate-profile and -fuse-profile.
drhirsch
this is such a better answer than all of the talk about processor architectures, please upvote.
caspin
@Caspin: one thing that the processor-specific discussion highlights is that it's really impossible to guess what the performance of a construct like this will be - even if you look at at the generated assembly. So much is happening even behind the assembly code that you really need to measure to know what's going on. Looking at this strictly from C level is maybe like trying to direct traffic at a city intersection from an airplane.
Michael Burr
+1  A: 

I assume you're aware that it will only matter if this is a hotspot. The best way to tell if it's a hotspot is to run the code, sample the program counter, and see if it's in there more than 10% of the time. If it is a hotspot, see how much time is spent in doing the if or switch. Usually it is negligible, unless your Block 1 and/or Block 2 do almost nothing. You can use a profiler for this. I just pause it repeatedly.

If you're not familiar with assembly language I would suggest learning it, enough to understand what the compiler generates. It's interesting and not hard.

Mike Dunlavey
A: 

As others have said, it depends on lots of things, including how many cases there are, how optimization is done, and the architecture you're running on. For an interesting overview, see http://ols.fedoraproject.org/GCC/Reprints-2008/sayle-reprint.pdf

Ken Rose