views:

1374

answers:

9

Write a branchless function that returns 0, 1, or 2 if the difference between two signed integers is zero, negative, or positive.

Here's a version with branching:

int Compare(int x, int y)
{
    int diff = x - y;
    if (diff == 0)
        return 0;
    else if (diff < 0)
        return 1;
    else
        return 2;
}

Here's a version that may be faster depending on compiler and processor:

int Compare(int x, int y)
{
    int diff = x - y;
    return diff == 0 ? 0 : (diff < 0 ? 1 : 2);
}

Can you come up with a faster one without branches?

SUMMARY

The 10 solutions I benchmarked had similar performance. The actual numbers and winner varied depending on compiler (icc/gcc), compiler options (e.g., -O3, -march=nocona, -fast, -xHost), and machine. Canon's solution performed well in many benchmark runs, but again the performance advantage was slight. I was surprised that in some cases some solutions were slower than the naive solution with branches.

+10  A: 
int Compare(int x, int y) {
     return (x < y) + (y < x) << 1;
}

Edit: Bitwise only? Guess < and > don't count, then?

int Compare(int x, int y) {
    int diff = x - y;
    return (!!diff) | (!!(diff & 0x80000000) << 1);
}

But there's that pesky -.

Edit: Shift the other way around.

Meh, just to try again:

int Compare(int x, int y) {
    int diff = y - x;
    return (!!diff) << ((diff >> 31) & 1);
}

But I'm guessing there's no standard ASM instruction for !!. Also, the << can be replaced with +, depending on which is faster...

Bit twiddling is fun!

Hmm, I just learned about setnz.

I haven't checked the assembler output (but I did test it a bit this time), and with a bit of luck it could save a whole instruction!:

IN THEORY. MY ASSEMBLER IS RUSTY

subl  %edi, %esi
setnz %eax
sarl  $31, %esi
andl  $1, %esi
sarl  %eax, %esi
mov   %esi, %eax
ret

Rambling is fun.

I need sleep.

Tordek
Doesn't work:Compare(5, 9) -> 3 should be 1Compare(3, 1) -> 1 should be 2Compare(4, 4) -> 0 (correct)Here's my version:int Compare(int x, int y){ unsigned int diff = x - y; return (!(diff >> 31) }Can you do better? :)
Marc Eaddy
Oops, sorry, I hadn't tested and didn't cross my mind `!!diff` would be 1 for a negative diff, for some reason.
Tordek
There's an ASM instruction for `!!` if there's an ASM instruction for `!= 0` (which I bet there is).
Chris Lutz
Yup, `setnz` does the trick.
Tordek
Why bitwise only? Generating a `bool` value from a comparison is a branchless operation these days.
AndreyT
+1 for the `setnz` trick (but I can't add twice...). Wonder whether it's an optimized instruction on recent x86 processors though.
Abel
@AndreyT: Beats me, I'm only sticking to OP's rules.
Tordek
+10  A: 

Assuming 2s complement, arithmetic right shift, and no overflow in the subtraction:

#define SHIFT (CHARBIT*sizeof(int) - 1)

int Compare(int x, int y)
{
    int diff = x - y;
    return -(diff >> SHIFT) - (((-diff) >> SHIFT) << 1);
}
Stephen Canon
Tordek
Abel
Stephen Canon
You're also assuming sizeof(int) == 4.
Richard Corden
`#include <limits.h>` and replace `31` with `CHAR_BIT * sizeof(int) - 1`
ephemient
If the explicit extraction of sign bit is the most efifcient way to check the sign of a number (as opposed to `test`-like operations and CPU state flags), then that's the code any self-respecting compiler will generate for language-level comparisons with 0.
AndreyT
+6  A: 

Two's complement:

#include <limits.h>
#define INT_BITS (CHAR_BITS * sizeof (int))

int Compare(int x, int y) {
    int d = y - x;
    int p = (d + INT_MAX) >> (INT_BITS - 1);
    d = d >> (INT_BITS - 2);
    return (d & 2) + (p & 1);
}

Assuming a sane compiler, this will not invoke the comparison hardware of your system, nor is it using a comparison in the language. To verify: if x == y then d and p will clearly be 0 so the final result will be zero. If (x - y) > 0 then ((x - y) + INT_MAX) will set the high bit of the integer otherwise it will be unset. So p will have its lowest bit set if and only if (x - y) > 0. If (x - y) < 0 then its high bit will be set and d will set its second to lowest bit.

Paul Hsieh
The Bit Twiddling Hacks Page guy (a page which I keep referring people to)? Wow!
dirkgently
Who me? Glad to be of service. Actually the only reason I am here is because I need the sense of affirmation for getting all the badges. :)
Paul Hsieh
Bug? Cmp(5,9) should be 1 and Cmp(9,5) should be 2
Marc Eaddy
Right so change the difference line to: int d = y - x;
Paul Hsieh
+3  A: 

I'm siding with Tordek's original answer:

int compare(int x, int y) {
    return (x < y) + 2*(y < x);
}

Compiling with gcc -O3 -march=pentium4 results in branch-free code that uses conditional instructions setg and setl (see this explanation of x86 instructions).

push   %ebp
mov    %esp,%ebp
mov    %eax,%ecx
xor    %eax,%eax
cmp    %edx,%ecx
setg   %al
add    %eax,%eax
cmp    %edx,%ecx
setl   %dl
movzbl %dl,%edx
add    %edx,%eax
pop    %ebp
ret
Tom
As I said in other comment, it is a pity that it still dosn't understand that a single `cmp %edx,%ecx` is sufficient.
AndreyT
A: 

Combining Stephen Canon and Tordek's answers:

int Compare(int x, int y)
{
    int diff = x - y; 
    return -(diff >> 31) + (2 & (-diff >> 30));
}

Yields: (g++ -O3)

subl     %esi,%edi 
movl     %edi,%eax
sarl     $31,%edi
negl     %eax
sarl     $30,%eax
andl     $2,%eax
subl     %edi,%eax 
ret

Tight! However, Paul Hsieh's version has even fewer instructions:

subl     %esi,%edi
leal     0x7fffffff(%rdi),%eax
sarl     $30,%edi
andl     $2,%edi
shrl     $31,%eax
leal     (%rdi,%rax,1),%eax
ret
Marc Eaddy
Note that fewer instructions != faster. Only profiling the examples being run a few million times will tell.
Chris Lutz
I'd say that the `2 * (x >y) + (x < y)` version looks cleaner in C code, produces few instructions and embeds no immediate constants in machine code. Unfortunately, neither GCC nor MSVC yet understand that it is enough to `cmp` x and y only once, even at the highest optimization levels.
AndreyT
+16  A: 

Branchless (at the language level) code that maps negative to -1, zero to 0 and positive to +1 looks as follows

int c = (n > 0) - (n < 0);

if you need a different mapping you can simply use an explicit map to remap it

const int MAP[] = { 1, 0, 2 };
int c = MAP[(n > 0) - (n < 0) + 1];

or, for the requested mapping, use some numerical trick like

int c = 2 * (n > 0) + (n < 0);

(It is obviously very easy to generate any mapping from this as long as 0 is mapped to 0. And the code is quite readable. If 0 is mapped to something else, it becomes more tricky and less readable.)

As an additinal note: comparing two integers by subtracting one from another at C language level is a flawed technique, since it is generally prone to overflow. The beauty of the above methods is that they can immedately be used for "subtractionless" comparisons, like

int c = 2 * (x > y) + (x < y);
AndreyT
+1 for the array example.
Richard Corden
I like the sheer conciseness and readability of what you've done here. The only flaw I see is that not all languages or environments will return consistent values (or numeric values) from a comparison (i.e. you can't use this in Java).
jprete
Funny. It was my first answer, and I discarded it on the bitwise argument.
Tordek
Yes, I saw that you submitted the same thing before me. I upvoted your answer and the one from Tom right away. Now I'm trying to downvote mine, but it doesn't let me :)
AndreyT
Depending on the instruction set, this isn't necessarily branchless.
peterchen
Yes, that's what I meant when I said that it is branchless *at the language level*.
AndreyT
If you are going for performance, that multiplication by 2 could be written with bit shifts, if x and y are integers
Tom
Any compiler worth anyone's time will do that for you.
GMan
@Gman: So true, so true.
Tom
@Tom: Multiplication by 2 and shift 1 bit left are operations that have exactly the same semantical meaning in C/C++. For this reason, they produce identical machine code. There's absolutely no point to replace one with another. It will absolutely have no effect on performance. The choice must be made from pure readability considerations. In this case I wanted to multiply something by 2, not shift anything anywhere, so that's exactly how I spelled it out in the code: with multiplication operator and with a `2`.
AndreyT
@AudreyT: Even though it's "branchless" at the language level, anytime you use comparison operators, you are potentially introducing branches at the machine compilation level.
Adisak
@Adisak: Yes, I understand that perfectly well, which is, obviously, the reason I included the "language level" remark in my original reply. So?
AndreyT
AudreyT: "So?" Well, the reason it's important is the Branchless code is almost always harder to read than Branching code... even in the simple case you present with `int c = 2 * (x > y) + (x < y);` It's mainly used as part of an advanced programmers toolkit as a micro-optimization in heavily used critical routines or inner loops. Since your "Branchless at the language level" example emits two branches at the machine code level, it's no longer a valid optimization. Your example will run slower than a naive easy-to-read branching version.
Adisak
Firstly, it doesn't necessarily emit branches. It only *might* emit branches, and with a more-or-less decent compiler there will only be one branch, not two. Or it might not emit any branches at all. I consider the latter as much more likely, BTW. Secondly, I don't see why it should run "slower" instead of running with the same speed as the explictly branched version.
AndreyT
Thirdly, code like `cmp = (x > y) - (x < y)` is a well-known and extremely widespread comparison idiom. For even a moderately experienced programmer this version is significantly more readable than the explicitly branched version, since it is much more compact.
AndreyT
A: 
int Compare(int x, int y)
{
    int diff = x - y;

    int absdiff = 0x7fffffff & diff; // diff with sign bit 0
    int absdiff_not_zero = (int) (0 != udiff);

    return
        (absdiff_not_zero << 1)      // 2 iff abs(diff) > 0
        -
        ((0x80000000 & diff) >> 31); // 1 iff diff < 0
}
Alex
This will not work correctly in cases where the computation of `diff = x-y` overflows. Also, some compilers will convert the `!=` you use to compute `absdiff_not_zero` to a branch.
Adisak
Ah. Quite right on the overflow.
Alex
+1  A: 

Good god, this has haunted me.

Whatever, I think I squeezed out a last drop of performance:

int compare(int a, int b) {
    return (a != b) << (a > b);
}

Although, compiling with -O3 in GCC will give (bear with me I'm doing it from memory)

xorl  %eax, %eax
cmpl  %esi, %edi
setne %al
cmpl  %esi, %edi
setgt %dl
sall  %dl, %eax
ret

But the second comparison seems (according to a tiny bit of testing; I suck at ASM) to be redundant, leaving the small and beautiful

xorl  %eax, %eax
cmpl  %esi, %edi
setne %al
setgt %dl
sall  %dl, %eax
ret

(Sall may totally not be an ASM instruction, but I don't remember exactly)

So... if you don't mind running your benchmark once more, I'd love to hear the results (mine gave a 3% improvement, but it may be wrong).

Tordek
Why wouldn't you just add al and dl to get the value of 1 or 2 instead of using SALL (shift arithmetic left long)? Add is simpler and faster than variable shift (which may be microcoded on some CPUs).
Adisak
It's a fair option; I hadn't though of it. Only benchmarks will tell.
Tordek
ICC 11 produces (icc -static -g -O3 -fast -xHost):`[mov $0x1,%edx;xor %ecx,%ecx;xor %eax,%eax;cmp %esi,%edi;cmovne %edx,%eax;cmovg %edx,%ecx;shl %cl,%eax;retq]My benchmark indicates it is very fast.
Marc Eaddy
+3  A: 

Unsigned Comparison that returns -1,0,1 (cmpu) is one of the cases that is tested for by the GNU SuperOptimizer.

cmpu: compare (unsigned)
int cmpu(unsigned_word v0, unsigned_word v1)
{
    return ( (v0 > v1) ? 1 : ( (v0 < v1) ? -1 : 0) );
}

A SuperOptimizer exhaustively searches the instruction space for the best possible combination of instructions that will implement a given function. It is suggested that compilers automagically replace the functions above by their superoptimized versions (although not all compilers do this). For example, in the PowerPC Compiler Writer's Guide (powerpc-cwg.pdf), the cmpu function is shown as this in Appendix D pg 204:

cmpu: compare (unsigned)
PowerPC SuperOptimized Version
subf  R5,R4,R3
subfc R6,R3,R4
subfe R7,R4,R3
subfe R8,R7,R5

That's pretty good isn't it... just four subtracts (and with carry and/or extended versions). Not to mention it is genuinely branchfree at the machine opcode level. There is probably a PC / Intel X86 equivalent sequence that is similarly short since the GNU Superoptimizer runs for X86 as well as PowerPC.

Note that Unsigned Comparison (cmpu) can be turned into Signed Comparison (cmps) on a 32-bit compare by adding 0x80000000 to both Signed inputs before passing it to cmpu.

cmps: compare (signed)
int cmps(signed_word v0, signed_word v1)
{
    signed_word offset=0x80000000;
    return ( (unsigned_word) (v0 + signed_word),
        (unsigned_word) (v1 + signed_word) );
}

This is just one option though... the SuperOptimizer may find a cmps that is shorter and does not have to add offsets and call cmpu.

To get the version that you requested that returns your values of {1,0,2} rather than {-1,0,1} use the following code which takes advantage of the SuperOptimized cmps function.

int Compare(int x, int y)
{
    static const int retvals[]={1,0,2};
    return (retvals[cmps(x,y)+1]);
}
Adisak