Just to make it clear, I'm not going for any sort of portability here, so any solution that ties me to a certain box is fine.

Basically, I have an if statement that will evaluate to true 99% of the time, and I'm trying to eke out every last clock of performance. Can I issue some sort of compiler command (using GCC 4.1.2 and the x86 ISA, if it matters) to tell the branch predictor which way that branch will almost always go?

+12  A: 

Yes. http://kerneltrap.org/node/4705

__builtin_expect is a method that gcc (versions >= 2.96) offers for programmers to indicate branch prediction information to the compiler. The return value of __builtin_expect is the first argument (which can only be an integer) passed to it.

if (__builtin_expect (x, 0))
    foo ();

     [This] would indicate that we do not expect to call `foo', since we
     expect `x' to be zero.
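
For readability, many codebases (the Linux kernel among them, as the kerneltrap link above shows) wrap the builtin in likely/unlikely macros. A minimal sketch, assuming GCC; process() and handle_error() are just placeholder functions:

#define likely(x)   __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)

/* hypothetical placeholders for the hot and cold code paths */
void process(void);
void handle_error(void);

void handle(int ok)
{
    if (likely(ok)) {
        process();        /* the 99% case: kept on the fall-through path */
    } else {
        handle_error();   /* the rare case: moved off the hot path */
    }
}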
Drakosha
Perfect, thanks!
Andy Shulman
In Microsoft environments, if statements are predicted to always be true. Some versions do have Profile Guided Optimisation.
Charles Beattie
A: 

No, because there's no assembly command to let the branch predictor know. Don't worry about it, the branch predictor is pretty smart.

Also, obligatory comment about premature optimization and how it's evil.

EDIT: Drakosha mentioned some macros for GCC. However, I believe this is a code optimization and actually has nothing to do with branch prediction.

rlbond
Thank you Mr. Knuth. If this weren't a competition to see whose solution ran the absolute fastest, I would completely agree.
Andy Shulman
If you need every single cycle, why not just use inline assembly?
rlbond
The full quote: "We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil. Yet we should not pass up our opportunities in that critical 3%. **A good programmer will not be lulled into complacency by such reasoning**, he will be wise to look carefully at the critical code; but only after that code has been identified." (emphasis mine)
Roger Pate
Required to use C! I just love inane requirements. However, I would probably go insane. Writing a highly optimized malloc/free/realloc is difficult enough in C, but in assembly? Heh...
Andy Shulman
The branch predictor has a static rule when it knows nothing about a branch: take backwards branches, don't take forwards branches. If you think about how a for loop works you'll understand why that makes sense, since you jump back to the top of the loop many more times than you don't. So what the GCC macro is controlling is how GCC lays out the opcodes in memory, so that the forward / backward branch prediction rule is most effective.
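To make that intuition concrete (a trivial sketch, nothing project-specific, illustrating the old static rule only), the conditional jump back to the top of a loop is taken on every iteration and falls through just once:

void work(int i);   /* hypothetical loop body */

void run(int n)
{
    int i;
    for (i = 0; i < n; i++)
        work(i);    /* the backward jump to the loop top is taken
                       n times and falls through only once, at exit */
}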
Don Neufeld
This is plain wrong; there is actually an assembly command for letting the branch predictor know. It is ignored on all architectures except Netburst, though.
drhirsch
@don.neufeld: The static rule you mentioned exists only on older architectures; for Core2 and up the direction of a cold jump is pseudo-random, because it depends on an overwritten entry in the history table.
drhirsch
A: 

This sounds to me like overkill - this type of optimization will save tiny amounts of time. For example, using a more modern version of gcc will have a much greater influence on optimizations. Also, try enabling and disabling all the different optimization flags; they don't all improve performance.

Basically, it seems super unlikely this will make any significant difference compared to many other fruitful paths.

EDIT: thanks for the comments. I've made this community wiki, but left it in so others can see the comments.

Peter
Stuck using this version of GCC, with flags -m32 -Wall -O2.
Andy Shulman
No, there can be valid use cases for this. For example, there are compilers that output C as intermediate code and put an "if (break) break_into_debugger()" on each line to provide a platform-independent debugging solution.
Lothar
Actually on deeply pipelined processors branch prediction errors are extremely expensive, since they require a complete pipeline flush. 20x as expensive as an instruction execution is a reasonable estimate. If his benchmarks are telling him that he's got a problem with branch prediction then he's doing the right thing. VTune gives you very good data on this btw, if you haven't tried it.
Don Neufeld
A: 

Sun Studio has some pragmas defined for this case.

#pragma rarely_called ()

This works if one part of a conditional expression is a function call or starts with a function call.

But there is no way to tag a generic if/while statement.
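
A rough sketch of how that might look (hedged: this assumes the Sun Studio convention of naming an already-declared function in the pragma; error_path() is a made-up example):

/* Sun Studio: hint that error_path() is almost never called */
extern void error_path(int code);
#pragma rarely_called(error_path)

void hot_function(int x)
{
    if (x < 0)            /* the arm that calls the rarely_called
                             function is treated as the unlikely one */
        error_path(x);
}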

Lothar
+8  A: 

As Drakosha says, telling gcc which branch is the common case is probably very useful: it lets gcc generate better code for the case where the branch predictor is cold, and it makes the fast path through the function easy for the CPU to execute.

FYI, Pentium 4 had branch-predictor hints as prefixes to the jcc instructions, but only the Netburst microarchitecture ever did anything with them. See http://ref.x86asm.net/geek32.html, and Section 3.5 of Agner Fog's excellent asm opt guide from http://www.agner.org/optimize/. He has a guide to optimizing in C++, too.

Not much is officially published about exactly how the branch predictors and branch-target buffers in the most recent Intel and AMD CPUs behave. The optimization manuals (easy to find on AMD's and Intel's web sites) give some advice, but don't document specific behaviour. Some people have run tests to try to divine the implementation, e.g. how many BTB entries Core2 has... Anyway, the idea of hinting the predictor explicitly has been abandoned (for now). What is documented is, for example, that Core2 has a branch history buffer that can avoid mispredicting the loop exit if the loop always runs a constant short number of iterations, < 8 or 16 IIRC. But don't be too quick to unroll, because a loop that fits in 64 bytes (or 19 uops on Penryn) won't have instruction-fetch bottlenecks because it replays from a buffer... go read Agner Fog's PDFs, they're excellent.

Peter Cordes
BTW, you probably don't need to use builtin_expect if you use profile-guided optimization. PGO records which way each branch went, so when you compile with -fprofile-use, gcc knows which case is the common one for every branch. It still doesn't hurt to use builtin_expect to tell it the fast path, in case your code will be built without PGO, though.
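A sketch of that workflow with a hypothetical hot_path.c (the -fprofile-generate / -fprofile-use flags are standard GCC; common_case() and rare_case() are placeholders):

/* Hypothetical hot_path.c
 *
 *   1. gcc -O2 -fprofile-generate hot_path.c -o hot_path
 *   2. ./hot_path < representative_input     (writes .gcda profile data)
 *   3. gcc -O2 -fprofile-use hot_path.c -o hot_path
 *
 * With -fprofile-use, gcc already knows which way each branch usually
 * goes; the __builtin_expect hint only matters for builds without
 * profile data. */
void common_case(void);
void rare_case(void);

void dispatch(int x)
{
    if (__builtin_expect(x > 0, 1))
        common_case();
    else
        rare_case();
}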
Peter Cordes
+6  A: 

Yes, but it will have no effect. Exceptions are older (obsolete) pre-Netburst architectures, and even then it doesn't do anything measurable.

There is a "branch hint" opcode that Intel introduced with the Netburst architecture, and a default static branch prediction for cold jumps (backward predicted taken, forward predicted not taken) on some older architectures. GCC implements this with __builtin_expect (x, prediction), where prediction is typically 0 or 1. The opcode is ignored on all newer architectures (>= Core 2). The small corner case where this actually does something is a cold jump on the old Netburst architecture. Intel now recommends not using the static branch hints, probably because they consider the increase in code size more harmful than the possible marginal speedup.

There are multiple reasons it doesn't work as expected.

  • The processor can predict small loops (n<64) perfectly.
  • The processor can predict small repeating patterns (n~7) perfectly.
  • The processor itself can estimate the probability of a branch during runtime better than the compiler/programmer during compile time.
  • The predictability (= the probability that a branch will be predicted correctly) of a branch is far more important than the probability that the branch is taken. Unfortunately, this is highly architecture-dependent, and predicting the predictability of a branch is notoriously hard.

Read more about the inner workings of branch prediction in Agner Fog's manuals. See also the gcc mailing list.

drhirsch
Would be nice if you could quote / point to the exact portion where it says that the hint is ignored on newer architectures.
int3
Chapter 3.12 "Static Prediction" in the link I gave.
drhirsch
A: 

I suggest that rather than worrying about branch prediction, you profile the code and optimize it to reduce the number of branches. One example is loop unrolling; another is using boolean (branchless) programming techniques rather than if statements.

Most processors love to prefetch instructions. Generally, a mispredicted branch causes the processor to flush its prefetch queue (and pipeline), and this is where the biggest penalty is. To reduce this penalty, rewrite (and design) the code so that fewer branches are needed. Also, some processors can conditionally execute instructions without having to branch; see the sketch below.
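
As a small illustration of the branchless idea (a sketch only; the clamp operation is just an arbitrary example):

/* Branchy version: the comparison has to be predicted. */
int clamp_min_branchy(int x, int lo)
{
    if (x < lo)
        x = lo;
    return x;
}

/* Branchless version: compilers typically turn the ternary into a
 * conditional move (cmov on x86), so there is nothing to mispredict. */
int clamp_min_branchless(int x, int lo)
{
    return (x < lo) ? lo : x;
}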

I've optimized a program from 1 hour of execution time to 2 minutes by using loop unrolling and large I/O buffers. Branch prediction would not have offered much time savings in this instance.

Thomas Matthews