ansaurus

Question

How to optimize this line of C code (checking range)?

Answer 1

+12 A:

How about:

counter += (i < -threshold) | (i > threshold);

Assuming the original code was valid, then this should work too, in a portable way. The standard says that relational operators (<, > and so on) return an int equal to 1 on success, or 0 on failure.

UPDATE

To answer Sheen's comment below, the following code:

int main()
{
    short threshold = 10;
    short i = 20;
    short counter = 0;

    counter += (i < -threshold) | (i > threshold);

    return 0;
}

results in the following disassembler on x86 using GCC, with no optimisations:

  push   %rbp
  mov    %rsp,%rbp
  movw   $0xa,-6(%rbp)
  movw   $0x14,-4(%rbp)
  movw   $0x0,-2(%rbp)
  movswl -4(%rbp),%edx
  movswl -6(%rbp),%eax
  neg    %eax
  cmp    %eax,%edx
  setl   %dl
  movzwl -4(%rbp),%eax
  cmp    -6(%rbp),%ax
  setg   %al
  or     %edx,%eax
  movzbw %al,%dx
  movzwl -2(%rbp),%eax
  lea    (%rdx,%rax,1),%eax
  mov    %ax,-2(%rbp)
  mov    $0x0,%eax
  leaveq 
  retq

Oli Charlesworth 2010-10-27 14:54:47

not understand how this prevents branching. Can you paste generated assembly code here?

Sheen 2010-10-27 14:57:48

This would add 2 to counter if threshold < 0, i > threshold, and i < -threshold. It might be safe to assume that threshold >= 0, but if so the OP should edit to add this assumption.

Philip Starhill 2010-10-27 15:00:08

@Sheen On x86, the evaluations of conditions as integers can be done with instructions `setl` and `setg`, a little expensive because uncommon but still much cheaper than a mispredicted branch.

Pascal Cuoq 2010-10-27 15:02:38

@Philip: Good spot. Thanks

Oli Charlesworth 2010-10-27 15:05:39

@Oli Charlesworth, thanks a lot.

Sheen 2010-10-27 15:17:54

It's poor programming practice in general, but if this were needed on CUDA, I'd use Oli's method in a heart beat.

Jason Iverson 2010-10-27 16:15:00

Have you tested the difference between the bitwise or `|` and the logical or `||`?

Mark Ransom 2010-10-27 21:17:21

Simple and consistent answer... thanks!

Necronet 2010-10-28 00:14:44

Answer 2

+1 A:

Compare the absolute of both

short imask = i >> sizeof(short) * 8 - 1; //compute the sign bit 1 or 0
short tmask = threshold >> sizeof(short) * 8 - 1; //compute the sign bit 1 or 0

short iabsolute = (i + imask) ^ imask; // compute i absolute
short tabsolute = (threshold + tmask) ^ tmask; // compute threshold absolute

counter += iabsolute > tabsolute;

Ahmed Said 2010-10-27 15:01:08

Right-shifting a negative number is UB. The questioner asked for "portable".

Oli Charlesworth 2010-10-27 15:09:15

Nice. C99 has `CHAR_BIT` in limits.h instead of `8` to make it work on unusual (but still 2's complement) architectures. Also, you mean to use "absolute>threshold", probably.

Pascal Cuoq 2010-10-27 15:10:12

@Oli Charlesworth No, it is implementation-defined. 6.5.7.5.

Pascal Cuoq 2010-10-27 15:11:45

@Pascal: Yes, you are correct. It's left-shifts that are undefined...

Oli Charlesworth 2010-10-27 15:12:38

Answer 3

A:

Depending on the distribution of values of 'i', your CPU may well cache the branch prediction for you better than any code change you might make. See http://igoro.com/archive/fast-and-slow-if-statements-branch-prediction-in-modern-processors/ for an interesting writeup. Reddit discussion here: http://www.reddit.com/r/programming/comments/c7ues/fast_and_slow_ifstatements_branch_prediction_in/

Nico 2010-10-27 15:12:19

Answer 4

A:

You can use the following trick which reduces the branches to a single branch:

if (((unsigned) (i + threshold)) > (threshold << 1)) 
{ 
  counter++; 
}

or, for the pedantic:

if (((unsigned) i + (unsigned) threshold) > ((unsigned) threshold << 1)) 
{ 
  counter++; 
}

Skizz 2010-10-27 15:15:02

The addition (and the left-shift) could overflow. Also, there will should only be one branch in the original code (well, depends on the instruction set, I suppose).

Oli Charlesworth 2010-10-27 15:18:00

@Oli: It can't possibly overflow if the original didn't overflow. If the left shift overflowed then the original test `(i < -threshold) || (i > threshold)` would not make sense. This works. I've used it a lot. It's a non-obvious tweak.

Skizz 2010-10-27 15:51:32

@Skizz: I agree that this works in practice in two's-complement arithmetic. But technically, the behaviour on integer overflow is undefined. And this can happen in your code if e.g. threshold = `INT_MAX`.

Oli Charlesworth 2010-10-27 16:35:26

@Oli: It depends on the size of int since << would promote the lhs value to an int. You could explicitly cast it you want to be really sure.

Skizz 2010-10-27 19:27:13

@Skizz: If you were to cast every input to unsigned, then I would feel happier about this code snippet!

Oli Charlesworth 2010-10-27 19:43:29

Skizz is correct, but only on a technicality. Assuming `int` is greater than 16 bits, since `i` and `threshold` were specified by OP to be 16-bit, the sum cannot overflow. But in general, one should cast to `unsigned` *before* the addition rather than after.

R.. 2010-10-27 21:10:03

@R: That's a good point. I hadn't picked up on the fact that Skizz is working in 32-bit (assuming x86) whereas the question was based on 16-bit. In that case, so long as we can guarantee that `sizeof(unsigned) > sizeof(i)`, then I agree that Skizz's approach works (apart from when `threshold < 0`).

Oli Charlesworth 2010-10-28 09:20:56

Answer 5

A:

This is based on bit twiddling hacks, (highly recommended)

#define CHAR_BIT 8

int main()
{
  int i=-3; // example input
  int treshold=2; // example treshold
  int count=0;
  // step 1: find the absolute value of i
  unsigned int r;  // the result goes here 
  int const mask = i >> (sizeof(int) * CHAR_BIT - 1);
  r = (i + mask) ^ mask;
  // step 2: compute the sign of the difference
  // sign becomes 0 (if r<=treshold)
  // sign becomes 1 otherwise
  int sign = 1 ^ ((unsigned int)(r-treshold-1) >> (sizeof(int) * CHAR_BIT - 1));
  count+=sign;
  return count;
}

This works for 32 bit integers, adapting to 16 bits should be easy. It compiles using g++.

The speed depends on the used processor. Branching might be faster after all.

DirkM 2010-10-27 15:22:35

Right-shifting negative numbers is implementation-defined.

Oli Charlesworth 2010-10-27 15:24:03

From the bit twiddling hacks website: On March 7, 2003, Angus Duggan pointed out that the 1989 ANSI C specification leaves the result of signed right-shift implementation-defined, so on some systems this hack might not work. I've read that ANSI C does not require values to be represented as two's complement, so it may not work for that reason as well (on a diminishingly small number of old machines that still use one's complement). So it depends on how portable the OP wanted the question to be answered.

DirkM 2010-10-27 15:28:36

@Oli, you're right that right-shifting negative numbers is implementation defined. If you find a compiler that does not implement this as replication of sigificant bits (e.g. what everyone expects) I'll send you a bootle of wine.. (no, compilers written by yourself don't apply)

Nils Pipenbrinck 2010-10-27 20:32:59

@Nils: No, I can't think of one! But I know that some processors don't have an arithmetic shift instruction, so I could imagine that a compiler for such a platform may not bother with manual sign-extension (which must take at least a few extra cycles).

Oli Charlesworth 2010-10-28 09:27:17

Answer 6

+1 A:

Oli Charlesworth, I think, has the right idea. However, I suspect that it can be further optimized (at the expense of readability).

The threshold can be normalized to zero to remove a comparison.

That is, ...

counter += ((unsigned) (i + threshhold)  < (unsigned) (threshhold + threshhold));

Sparky 2010-10-27 15:26:21

Either of these additions could overflow.

Oli Charlesworth 2010-10-27 15:27:13

Oli is right but it's easily fixed. Cast to `unsigned` before the addition and then it's fine. Since the original values fit in `signed int`, it will work fine.

R.. 2010-10-27 15:34:21

@R: On systems using two's-complement arithmetic, casting a negative int to unsigned will add (UINT_MAX+1) to it, but I believe the standard explicitly allows for systems to use sign+magnitude format, in which case the cast would subtract the value from ((UINT_MAX+1)/2). Unfortunately, I don't know of any guaranteed-portable way to add a possibly-negative value to an unsigned value when the sum may lie between INT_MAX and UINT_MAX.

supercat 2010-10-27 16:53:35

@supercat Beware, R.. seems really touchy on the subject of casting negative signed ints to unsigned. But he is right. It is defined by the standard: it adds UINT_MAX+1 until the number is in the correct range. It is the cast from unsigned to signed that is implementation-defined.

Pascal Cuoq 2010-10-27 18:04:45

Answer 7

+3 A:

There is a standard idiom for range-checking with a single comparison instruction. It goes like:

(unsigned)x - a <= (unsigned)b - a   /* a <= x <= b */
(unsigned)x - a < (unsigned)b - a    /* a <= x < b */

As a common example (this version if isdigit is guaranteed to be correct by the standard):

(unsigned)ch - '0' < 10

If your original type is larger than int (for instance long long) then you will need to use larger unsigned types (for instance unsigned long long). If a and b are constants or already have unsigned type, or if you know b-a will not overflow, you can omit the cast from b.

In order for this method to work, naturally you must have a<=b and the types/values must be such that the original expression (i.e. a <= x && x <= b or similar) behaves mathematically correctly. For instance if x were signed and b unsigned, x<=b could evaluate to false when x=-1 and b=UINT_MAX-1. As long as your original types are all signed or smaller than the unsigned type you cast to, this is not an issue.

As for how this "trick" works, it is purely determining, after reduction modulo UINT_MAX+1, whether x-a lies in the range 0 to b-a.

In your case, I think the following should work just fine:

(unsigned)i + threshold > 2U * threshold;

If threshold does not change between loop iterations, the compiler can probably keep both threshold and 2U*threshold in registers.

Speaking of optimizations, a good compiler should optimize your original range test to use unsigned arithmetic where it knows the constraints are met. I suspect many do so with a and b constant, but perhaps not with more complex expressions. Even if the compiler can optimize it, though, the (unsigned)x-a<b-a idiom is still extremely useful in macros where you want to ensure that x is evaluated exactly once.

R.. 2010-10-27 16:06:17

Answer 8

+2 A:

Oh, too bad the question has already been answered. To paraphrase Oli's answer, the code

#include <stdint.h>
int main()
{
    int32_t threshold_square = 100;
    int16_t i = 20;
    int16_t counter = 0;

    counter += ( (int32_t) i * i > threshold_square);

    return 0;
}

yields the following x86 assembler using GCC without optimizations

pushq   %rbp
movq    %rsp, %rbp
movl    $100, -8(%rbp)
movw    $20, -2(%rbp)
movw    $0, -4(%rbp)
movswl  -2(%rbp),%edx
movswl  -2(%rbp),%eax
imull   %edx, %eax
cmpl    -8(%rbp), %eax
setg    %al
movzbl  %al, %edx
movzwl  -4(%rbp), %eax
leal    (%rdx,%rax), %eax
movw    %ax, -4(%rbp)
movl    $0, %eax
leave
ret

which is four instructions less than using (i < -threshold) | (i > threshold).

Whether this is better or not is, of course, depending on the architecture.

(The use of stdint.h is for illustrative purposes, for strict C89 replace with whatever is relevant for the target system.)

Christoffer 2010-10-27 20:17:27

+1: I totally didn't think of this. Nice (and in hindsight, obvious) approach!

Oli Charlesworth 2010-10-28 09:22:54

As much as this is correct and more optimal than Oli's method, one advantage of his method (and its variants appearing in other answers) is that it is easy extend it to check for asymmetric range, while here the range is always symmetric.

ysap 2010-10-29 05:46:10

Answer 9

A:

What is wrong with the original code? Does it really need hand-optimising?

Any decent compiler should be able to optimise that very well. Any hand-optimising would probably only lead to obfuscation.

xlq 2010-10-27 20:55:17

Answer 10

+1 A:

This code have no branch an highly portable (however, implementation of abs may have one).

#include <stdlib.h>
counter += abs(i) > threshold;

That's simplest standard compliant expression.

If your compiler does not using optimized macro for abs() you may use your own macro/ inline function.

That are examples, that use nature of twos complement format used on most machines:

#define ABS(x) ((x)*(((x)>>15)|1))

#define ABS(x) ((x)-((x)>>15)^((x)>>15))

Also you may replace comparison operator with expression like this:

#define LESS(x, y) (-((x)-(y))>>15))

Resulting code:

counter -= ((threshold - abs(i)) >> 15);

All those macros rely on fact, that shift right to number of bits minus one of positive value or zero evaluates to zero, and of negative evaluates to minus one. But thats implementation defined.

Vovanium 2010-10-27 23:39:46

ansaurus

tags:

views:

answers:

How to optimize this line of C code (checking range)?

related questions