views:

380

answers:

10

Is there any way to optimize the following line of C code (to avoid branching)?

if ((i < -threshold) || (i > threshold)) 
{ 
    counter++; 
}

All variables are 16-bit signed integers. An optimized version should be highly portable.

+12  A: 

How about:

counter += (i < -threshold) | (i > threshold);

Assuming the original code was valid, then this should work too, in a portable way. The standard says that relational operators (<, > and so on) return an int equal to 1 on success, or 0 on failure.

UPDATE

To answer Sheen's comment below, the following code:

int main()
{
    short threshold = 10;
    short i = 20;
    short counter = 0;

    counter += (i < -threshold) | (i > threshold);

    return 0;
}

results in the following disassembler on x86 using GCC, with no optimisations:

  push   %rbp
  mov    %rsp,%rbp
  movw   $0xa,-6(%rbp)
  movw   $0x14,-4(%rbp)
  movw   $0x0,-2(%rbp)
  movswl -4(%rbp),%edx
  movswl -6(%rbp),%eax
  neg    %eax
  cmp    %eax,%edx
  setl   %dl
  movzwl -4(%rbp),%eax
  cmp    -6(%rbp),%ax
  setg   %al
  or     %edx,%eax
  movzbw %al,%dx
  movzwl -2(%rbp),%eax
  lea    (%rdx,%rax,1),%eax
  mov    %ax,-2(%rbp)
  mov    $0x0,%eax
  leaveq 
  retq  
Oli Charlesworth
not understand how this prevents branching. Can you paste generated assembly code here?
Sheen
This would add 2 to counter if threshold < 0, i > threshold, and i < -threshold. It might be safe to assume that threshold >= 0, but if so the OP should edit to add this assumption.
Philip Starhill
@Sheen On x86, the evaluations of conditions as integers can be done with instructions `setl` and `setg`, a little expensive because uncommon but still much cheaper than a mispredicted branch.
Pascal Cuoq
@Philip: Good spot. Thanks
Oli Charlesworth
@Oli Charlesworth, thanks a lot.
Sheen
It's poor programming practice in general, but if this were needed on CUDA, I'd use Oli's method in a heart beat.
Jason Iverson
Have you tested the difference between the bitwise or `|` and the logical or `||`?
Mark Ransom
Simple and consistent answer... thanks!
Necronet
+1  A: 

Compare the absolute of both

short imask = i >> sizeof(short) * 8 - 1; //compute the sign bit 1 or 0
short tmask = threshold >> sizeof(short) * 8 - 1; //compute the sign bit 1 or 0

short iabsolute = (i + imask) ^ imask; // compute i absolute
short tabsolute = (threshold + tmask) ^ tmask; // compute threshold absolute

counter += iabsolute > tabsolute;
Ahmed Said
Right-shifting a negative number is UB. The questioner asked for "portable".
Oli Charlesworth
Nice. C99 has `CHAR_BIT` in limits.h instead of `8` to make it work on unusual (but still 2's complement) architectures. Also, you mean to use "absolute>threshold", probably.
Pascal Cuoq
@Oli Charlesworth No, it is implementation-defined. 6.5.7.5.
Pascal Cuoq
@Pascal: Yes, you are correct. It's left-shifts that are undefined...
Oli Charlesworth
A: 

Depending on the distribution of values of 'i', your CPU may well cache the branch prediction for you better than any code change you might make. See http://igoro.com/archive/fast-and-slow-if-statements-branch-prediction-in-modern-processors/ for an interesting writeup. Reddit discussion here: http://www.reddit.com/r/programming/comments/c7ues/fast_and_slow_ifstatements_branch_prediction_in/

Nico
A: 

You can use the following trick which reduces the branches to a single branch:

if (((unsigned) (i + threshold)) > (threshold << 1)) 
{ 
  counter++; 
}

or, for the pedantic:

if (((unsigned) i + (unsigned) threshold) > ((unsigned) threshold << 1)) 
{ 
  counter++; 
}
Skizz
The addition (and the left-shift) could overflow. Also, there will should only be one branch in the original code (well, depends on the instruction set, I suppose).
Oli Charlesworth
@Oli: It can't possibly overflow if the original didn't overflow. If the left shift overflowed then the original test `(i < -threshold) || (i > threshold)` would not make sense. This works. I've used it a lot. It's a non-obvious tweak.
Skizz
@Skizz: I agree that this works in practice in two's-complement arithmetic. But technically, the behaviour on integer overflow is undefined. And this can happen in your code if e.g. threshold = `INT_MAX`.
Oli Charlesworth
@Oli: It depends on the size of int since << would promote the lhs value to an int. You could explicitly cast it you want to be really sure.
Skizz
@Skizz: If you were to cast every input to unsigned, then I would feel happier about this code snippet!
Oli Charlesworth
Skizz is correct, but only on a technicality. Assuming `int` is greater than 16 bits, since `i` and `threshold` were specified by OP to be 16-bit, the sum cannot overflow. But in general, one should cast to `unsigned` *before* the addition rather than after.
R..
@R: That's a good point. I hadn't picked up on the fact that Skizz is working in 32-bit (assuming x86) whereas the question was based on 16-bit. In that case, so long as we can guarantee that `sizeof(unsigned) > sizeof(i)`, then I agree that Skizz's approach works (apart from when `threshold < 0`).
Oli Charlesworth
A: 

This is based on bit twiddling hacks, (highly recommended)

#define CHAR_BIT 8

int main()
{
  int i=-3; // example input
  int treshold=2; // example treshold
  int count=0;
  // step 1: find the absolute value of i
  unsigned int r;  // the result goes here 
  int const mask = i >> (sizeof(int) * CHAR_BIT - 1);
  r = (i + mask) ^ mask;
  // step 2: compute the sign of the difference
  // sign becomes 0 (if r<=treshold)
  // sign becomes 1 otherwise
  int sign = 1 ^ ((unsigned int)(r-treshold-1) >> (sizeof(int) * CHAR_BIT - 1));
  count+=sign;
  return count;
}

This works for 32 bit integers, adapting to 16 bits should be easy. It compiles using g++.

The speed depends on the used processor. Branching might be faster after all.

DirkM
Right-shifting negative numbers is implementation-defined.
Oli Charlesworth
From the bit twiddling hacks website: On March 7, 2003, Angus Duggan pointed out that the 1989 ANSI C specification leaves the result of signed right-shift implementation-defined, so on some systems this hack might not work. I've read that ANSI C does not require values to be represented as two's complement, so it may not work for that reason as well (on a diminishingly small number of old machines that still use one's complement). So it depends on how portable the OP wanted the question to be answered.
DirkM
@Oli, you're right that right-shifting negative numbers is implementation defined. If you find a compiler that does not implement this as replication of sigificant bits (e.g. what everyone expects) I'll send you a bootle of wine.. (no, compilers written by yourself don't apply)
Nils Pipenbrinck
@Nils: No, I can't think of one! But I know that some processors don't have an arithmetic shift instruction, so I could imagine that a compiler for such a platform may not bother with manual sign-extension (which must take at least a few extra cycles).
Oli Charlesworth
+1  A: 

Oli Charlesworth, I think, has the right idea. However, I suspect that it can be further optimized (at the expense of readability).

The threshold can be normalized to zero to remove a comparison.

That is, ...

counter += ((unsigned) (i + threshhold)  < (unsigned) (threshhold + threshhold));
Sparky
Either of these additions could overflow.
Oli Charlesworth
Oli is right but it's easily fixed. Cast to `unsigned` before the addition and then it's fine. Since the original values fit in `signed int`, it will work fine.
R..
@R: On systems using two's-complement arithmetic, casting a negative int to unsigned will add (UINT_MAX+1) to it, but I believe the standard explicitly allows for systems to use sign+magnitude format, in which case the cast would subtract the value from ((UINT_MAX+1)/2). Unfortunately, I don't know of any guaranteed-portable way to add a possibly-negative value to an unsigned value when the sum may lie between INT_MAX and UINT_MAX.
supercat
@supercat Beware, R.. seems really touchy on the subject of casting negative signed ints to unsigned. But he is right. It is defined by the standard: it adds UINT_MAX+1 until the number is in the correct range. It is the cast from unsigned to signed that is implementation-defined.
Pascal Cuoq
+3  A: 

There is a standard idiom for range-checking with a single comparison instruction. It goes like:

(unsigned)x - a <= (unsigned)b - a   /* a <= x <= b */
(unsigned)x - a < (unsigned)b - a    /* a <= x < b */

As a common example (this version if isdigit is guaranteed to be correct by the standard):

(unsigned)ch - '0' < 10

If your original type is larger than int (for instance long long) then you will need to use larger unsigned types (for instance unsigned long long). If a and b are constants or already have unsigned type, or if you know b-a will not overflow, you can omit the cast from b.

In order for this method to work, naturally you must have a<=b and the types/values must be such that the original expression (i.e. a <= x && x <= b or similar) behaves mathematically correctly. For instance if x were signed and b unsigned, x<=b could evaluate to false when x=-1 and b=UINT_MAX-1. As long as your original types are all signed or smaller than the unsigned type you cast to, this is not an issue.

As for how this "trick" works, it is purely determining, after reduction modulo UINT_MAX+1, whether x-a lies in the range 0 to b-a.

In your case, I think the following should work just fine:

(unsigned)i + threshold > 2U * threshold;

If threshold does not change between loop iterations, the compiler can probably keep both threshold and 2U*threshold in registers.

Speaking of optimizations, a good compiler should optimize your original range test to use unsigned arithmetic where it knows the constraints are met. I suspect many do so with a and b constant, but perhaps not with more complex expressions. Even if the compiler can optimize it, though, the (unsigned)x-a<b-a idiom is still extremely useful in macros where you want to ensure that x is evaluated exactly once.

R..
+2  A: 

Oh, too bad the question has already been answered. To paraphrase Oli's answer, the code

#include <stdint.h>
int main()
{
    int32_t threshold_square = 100;
    int16_t i = 20;
    int16_t counter = 0;

    counter += ( (int32_t) i * i > threshold_square);

    return 0;
}

yields the following x86 assembler using GCC without optimizations

pushq   %rbp
movq    %rsp, %rbp
movl    $100, -8(%rbp)
movw    $20, -2(%rbp)
movw    $0, -4(%rbp)
movswl  -2(%rbp),%edx
movswl  -2(%rbp),%eax
imull   %edx, %eax
cmpl    -8(%rbp), %eax
setg    %al
movzbl  %al, %edx
movzwl  -4(%rbp), %eax
leal    (%rdx,%rax), %eax
movw    %ax, -4(%rbp)
movl    $0, %eax
leave
ret

which is four instructions less than using (i < -threshold) | (i > threshold).

Whether this is better or not is, of course, depending on the architecture.

(The use of stdint.h is for illustrative purposes, for strict C89 replace with whatever is relevant for the target system.)

Christoffer
+1: I totally didn't think of this. Nice (and in hindsight, obvious) approach!
Oli Charlesworth
As much as this is correct and more optimal than Oli's method, one advantage of his method (and its variants appearing in other answers) is that it is easy extend it to check for asymmetric range, while here the range is always symmetric.
ysap
A: 

What is wrong with the original code? Does it really need hand-optimising?

Any decent compiler should be able to optimise that very well. Any hand-optimising would probably only lead to obfuscation.

xlq
+1  A: 

This code have no branch an highly portable (however, implementation of abs may have one).

#include <stdlib.h>
counter += abs(i) > threshold;

That's simplest standard compliant expression.

If your compiler does not using optimized macro for abs() you may use your own macro/ inline function.

That are examples, that use nature of twos complement format used on most machines:

#define ABS(x) ((x)*(((x)>>15)|1))

#define ABS(x) ((x)-((x)>>15)^((x)>>15))

Also you may replace comparison operator with expression like this:

#define LESS(x, y) (-((x)-(y))>>15))

Resulting code:

counter -= ((threshold - abs(i)) >> 15);

All those macros rely on fact, that shift right to number of bits minus one of positive value or zero evaluates to zero, and of negative evaluates to minus one. But thats implementation defined.

Vovanium