ansaurus

Question

Branchless code that maps zero, negative, and positive to 0, 1, 2

Answer 1

+10 A:

int Compare(int x, int y) {
     return (x < y) + (y < x) << 1;
}

Edit: Bitwise only? Guess < and > don't count, then?

int Compare(int x, int y) {
    int diff = x - y;
    return (!!diff) | (!!(diff & 0x80000000) << 1);
}

But there's that pesky -.

Edit: Shift the other way around.

Meh, just to try again:

int Compare(int x, int y) {
    int diff = y - x;
    return (!!diff) << ((diff >> 31) & 1);
}

But I'm guessing there's no standard ASM instruction for !!. Also, the << can be replaced with +, depending on which is faster...

Bit twiddling is fun!

Hmm, I just learned about setnz.

I haven't checked the assembler output (but I did test it a bit this time), and with a bit of luck it could save a whole instruction!:

IN THEORY. MY ASSEMBLER IS RUSTY

subl  %edi, %esi
setnz %eax
sarl  $31, %esi
andl  $1, %esi
sarl  %eax, %esi
mov   %esi, %eax
ret

Rambling is fun.

I need sleep.

Tordek 2009-10-23 00:44:37

Doesn't work:Compare(5, 9) -> 3 should be 1Compare(3, 1) -> 1 should be 2Compare(4, 4) -> 0 (correct)Here's my version:int Compare(int x, int y){ unsigned int diff = x - y; return (!(diff >> 31) }Can you do better? :)

Marc Eaddy 2009-10-23 05:31:01

Oops, sorry, I hadn't tested and didn't cross my mind `!!diff` would be 1 for a negative diff, for some reason.

Tordek 2009-10-23 07:31:28

There's an ASM instruction for `!!` if there's an ASM instruction for `!= 0` (which I bet there is).

Chris Lutz 2009-10-23 08:00:13

Yup, `setnz` does the trick.

Tordek 2009-10-23 08:12:39

Why bitwise only? Generating a `bool` value from a comparison is a branchless operation these days.

AndreyT 2009-10-23 08:12:50

+1 for the `setnz` trick (but I can't add twice...). Wonder whether it's an optimized instruction on recent x86 processors though.

Abel 2009-10-23 11:25:08

@AndreyT: Beats me, I'm only sticking to OP's rules.

Tordek 2009-10-23 16:39:24

Answer 2

+10 A:

Assuming 2s complement, arithmetic right shift, and no overflow in the subtraction:

#define SHIFT (CHARBIT*sizeof(int) - 1)

int Compare(int x, int y)
{
    int diff = x - y;
    return -(diff >> SHIFT) - (((-diff) >> SHIFT) << 1);
}

Stephen Canon 2009-10-23 00:49:21

Tordek 2009-10-23 00:54:48

Abel 2009-10-23 01:49:39

Stephen Canon 2009-10-23 04:13:42

You're also assuming sizeof(int) == 4.

Richard Corden 2009-10-23 09:11:12

`#include <limits.h>` and replace `31` with `CHAR_BIT * sizeof(int) - 1`

ephemient 2009-10-23 14:42:12

If the explicit extraction of sign bit is the most efifcient way to check the sign of a number (as opposed to `test`-like operations and CPU state flags), then that's the code any self-respecting compiler will generate for language-level comparisons with 0.

AndreyT 2009-10-23 15:58:43

Answer 3

+6 A:

Two's complement:

#include <limits.h>
#define INT_BITS (CHAR_BITS * sizeof (int))

int Compare(int x, int y) {
    int d = y - x;
    int p = (d + INT_MAX) >> (INT_BITS - 1);
    d = d >> (INT_BITS - 2);
    return (d & 2) + (p & 1);
}

Assuming a sane compiler, this will not invoke the comparison hardware of your system, nor is it using a comparison in the language. To verify: if x == y then d and p will clearly be 0 so the final result will be zero. If (x - y) > 0 then ((x - y) + INT_MAX) will set the high bit of the integer otherwise it will be unset. So p will have its lowest bit set if and only if (x - y) > 0. If (x - y) < 0 then its high bit will be set and d will set its second to lowest bit.

Paul Hsieh 2009-10-23 01:10:04

The Bit Twiddling Hacks Page guy (a page which I keep referring people to)? Wow!

dirkgently 2009-10-23 01:56:39

Who me? Glad to be of service. Actually the only reason I am here is because I need the sense of affirmation for getting all the badges. :)

Paul Hsieh 2009-10-23 02:10:51

Bug? Cmp(5,9) should be 1 and Cmp(9,5) should be 2

Marc Eaddy 2009-10-23 06:57:27

Right so change the difference line to: int d = y - x;

Paul Hsieh 2009-10-23 08:31:16

Answer 4

+3 A:

I'm siding with Tordek's original answer:

int compare(int x, int y) {
    return (x < y) + 2*(y < x);
}

Compiling with gcc -O3 -march=pentium4 results in branch-free code that uses conditional instructions setg and setl (see this explanation of x86 instructions).

push   %ebp
mov    %esp,%ebp
mov    %eax,%ecx
xor    %eax,%eax
cmp    %edx,%ecx
setg   %al
add    %eax,%eax
cmp    %edx,%ecx
setl   %dl
movzbl %dl,%edx
add    %edx,%eax
pop    %ebp
ret

Tom 2009-10-23 01:44:31

As I said in other comment, it is a pity that it still dosn't understand that a single `cmp %edx,%ecx` is sufficient.

AndreyT 2009-10-23 08:11:11

Answer 5

A:

Combining Stephen Canon and Tordek's answers:

int Compare(int x, int y)
{
    int diff = x - y; 
    return -(diff >> 31) + (2 & (-diff >> 30));
}

Yields: (g++ -O3)

subl     %esi,%edi 
movl     %edi,%eax
sarl     $31,%edi
negl     %eax
sarl     $30,%eax
andl     $2,%eax
subl     %edi,%eax 
ret

Tight! However, Paul Hsieh's version has even fewer instructions:

subl     %esi,%edi
leal     0x7fffffff(%rdi),%eax
sarl     $30,%edi
andl     $2,%edi
shrl     $31,%eax
leal     (%rdi,%rax,1),%eax
ret

Marc Eaddy 2009-10-23 06:05:20

Note that fewer instructions != faster. Only profiling the examples being run a few million times will tell.

Chris Lutz 2009-10-23 06:32:08

I'd say that the `2 * (x >y) + (x < y)` version looks cleaner in C code, produces few instructions and embeds no immediate constants in machine code. Unfortunately, neither GCC nor MSVC yet understand that it is enough to `cmp` x and y only once, even at the highest optimization levels.

AndreyT 2009-10-23 08:03:38

Answer 6

+16 A:

Branchless (at the language level) code that maps negative to -1, zero to 0 and positive to +1 looks as follows

int c = (n > 0) - (n < 0);

if you need a different mapping you can simply use an explicit map to remap it

const int MAP[] = { 1, 0, 2 };
int c = MAP[(n > 0) - (n < 0) + 1];

or, for the requested mapping, use some numerical trick like

int c = 2 * (n > 0) + (n < 0);

(It is obviously very easy to generate any mapping from this as long as 0 is mapped to 0. And the code is quite readable. If 0 is mapped to something else, it becomes more tricky and less readable.)

As an additinal note: comparing two integers by subtracting one from another at C language level is a flawed technique, since it is generally prone to overflow. The beauty of the above methods is that they can immedately be used for "subtractionless" comparisons, like

int c = 2 * (x > y) + (x < y);

AndreyT 2009-10-23 06:16:18

+1 for the array example.

Richard Corden 2009-10-23 09:13:15

I like the sheer conciseness and readability of what you've done here. The only flaw I see is that not all languages or environments will return consistent values (or numeric values) from a comparison (i.e. you can't use this in Java).

jprete 2009-10-23 15:28:23

Funny. It was my first answer, and I discarded it on the bitwise argument.

Tordek 2009-10-23 21:44:28

Yes, I saw that you submitted the same thing before me. I upvoted your answer and the one from Tom right away. Now I'm trying to downvote mine, but it doesn't let me :)

AndreyT 2009-10-23 21:47:42

Depending on the instruction set, this isn't necessarily branchless.

peterchen 2009-10-23 23:32:55

Yes, that's what I meant when I said that it is branchless *at the language level*.

AndreyT 2009-10-24 05:51:52

If you are going for performance, that multiplication by 2 could be written with bit shifts, if x and y are integers

Tom 2009-10-26 03:29:31

Any compiler worth anyone's time will do that for you.

GMan 2009-10-26 03:35:36

@Gman: So true, so true.

Tom 2009-10-26 03:45:06

@Tom: Multiplication by 2 and shift 1 bit left are operations that have exactly the same semantical meaning in C/C++. For this reason, they produce identical machine code. There's absolutely no point to replace one with another. It will absolutely have no effect on performance. The choice must be made from pure readability considerations. In this case I wanted to multiply something by 2, not shift anything anywhere, so that's exactly how I spelled it out in the code: with multiplication operator and with a `2`.

AndreyT 2009-10-26 05:20:46

@AudreyT: Even though it's "branchless" at the language level, anytime you use comparison operators, you are potentially introducing branches at the machine compilation level.

Adisak 2009-10-26 06:22:48

@Adisak: Yes, I understand that perfectly well, which is, obviously, the reason I included the "language level" remark in my original reply. So?

AndreyT 2009-10-26 06:35:44

AudreyT: "So?" Well, the reason it's important is the Branchless code is almost always harder to read than Branching code... even in the simple case you present with `int c = 2 * (x > y) + (x < y);` It's mainly used as part of an advanced programmers toolkit as a micro-optimization in heavily used critical routines or inner loops. Since your "Branchless at the language level" example emits two branches at the machine code level, it's no longer a valid optimization. Your example will run slower than a naive easy-to-read branching version.

Adisak 2009-10-30 02:59:47

Firstly, it doesn't necessarily emit branches. It only *might* emit branches, and with a more-or-less decent compiler there will only be one branch, not two. Or it might not emit any branches at all. I consider the latter as much more likely, BTW. Secondly, I don't see why it should run "slower" instead of running with the same speed as the explictly branched version.

AndreyT 2009-10-30 03:57:35

Thirdly, code like `cmp = (x > y) - (x < y)` is a well-known and extremely widespread comparison idiom. For even a moderately experienced programmer this version is significantly more readable than the explicitly branched version, since it is much more compact.

AndreyT 2009-10-30 03:59:27

Answer 7

A:

int Compare(int x, int y)
{
    int diff = x - y;

    int absdiff = 0x7fffffff & diff; // diff with sign bit 0
    int absdiff_not_zero = (int) (0 != udiff);

    return
        (absdiff_not_zero << 1)      // 2 iff abs(diff) > 0
        -
        ((0x80000000 & diff) >> 31); // 1 iff diff < 0
}

Alex 2009-10-23 14:30:49

This will not work correctly in cases where the computation of `diff = x-y` overflows. Also, some compilers will convert the `!=` you use to compute `absdiff_not_zero` to a branch.

Adisak 2009-10-26 07:22:06

Ah. Quite right on the overflow.

Alex 2009-10-26 13:24:43

Answer 8

+1 A:

Good god, this has haunted me.

Whatever, I think I squeezed out a last drop of performance:

int compare(int a, int b) {
    return (a != b) << (a > b);
}

Although, compiling with -O3 in GCC will give (bear with me I'm doing it from memory)

xorl  %eax, %eax
cmpl  %esi, %edi
setne %al
cmpl  %esi, %edi
setgt %dl
sall  %dl, %eax
ret

But the second comparison seems (according to a tiny bit of testing; I suck at ASM) to be redundant, leaving the small and beautiful

xorl  %eax, %eax
cmpl  %esi, %edi
setne %al
setgt %dl
sall  %dl, %eax
ret

(Sall may totally not be an ASM instruction, but I don't remember exactly)

So... if you don't mind running your benchmark once more, I'd love to hear the results (mine gave a 3% improvement, but it may be wrong).

Tordek 2009-10-26 03:23:37

Why wouldn't you just add al and dl to get the value of 1 or 2 instead of using SALL (shift arithmetic left long)? Add is simpler and faster than variable shift (which may be microcoded on some CPUs).

Adisak 2009-10-26 07:17:47

It's a fair option; I hadn't though of it. Only benchmarks will tell.

Tordek 2009-10-26 17:04:14

ICC 11 produces (icc -static -g -O3 -fast -xHost):`[mov $0x1,%edx;xor %ecx,%ecx;xor %eax,%eax;cmp %esi,%edi;cmovne %edx,%eax;cmovg %edx,%ecx;shl %cl,%eax;retq]My benchmark indicates it is very fast.

Marc Eaddy 2009-10-26 17:20:44

Answer 9

+3 A:

Unsigned Comparison that returns -1,0,1 (cmpu) is one of the cases that is tested for by the GNU SuperOptimizer.

cmpu: compare (unsigned)
int cmpu(unsigned_word v0, unsigned_word v1)
{
    return ( (v0 > v1) ? 1 : ( (v0 < v1) ? -1 : 0) );
}

A SuperOptimizer exhaustively searches the instruction space for the best possible combination of instructions that will implement a given function. It is suggested that compilers automagically replace the functions above by their superoptimized versions (although not all compilers do this). For example, in the PowerPC Compiler Writer's Guide (powerpc-cwg.pdf), the cmpu function is shown as this in Appendix D pg 204:

cmpu: compare (unsigned)
PowerPC SuperOptimized Version
subf  R5,R4,R3
subfc R6,R3,R4
subfe R7,R4,R3
subfe R8,R7,R5

That's pretty good isn't it... just four subtracts (and with carry and/or extended versions). Not to mention it is genuinely branchfree at the machine opcode level. There is probably a PC / Intel X86 equivalent sequence that is similarly short since the GNU Superoptimizer runs for X86 as well as PowerPC.

Note that Unsigned Comparison (cmpu) can be turned into Signed Comparison (cmps) on a 32-bit compare by adding 0x80000000 to both Signed inputs before passing it to cmpu.

cmps: compare (signed)
int cmps(signed_word v0, signed_word v1)
{
    signed_word offset=0x80000000;
    return ( (unsigned_word) (v0 + signed_word),
        (unsigned_word) (v1 + signed_word) );
}

This is just one option though... the SuperOptimizer may find a cmps that is shorter and does not have to add offsets and call cmpu.

To get the version that you requested that returns your values of {1,0,2} rather than {-1,0,1} use the following code which takes advantage of the SuperOptimized cmps function.

int Compare(int x, int y)
{
    static const int retvals[]={1,0,2};
    return (retvals[cmps(x,y)+1]);
}

Adisak 2009-10-26 06:52:44

ansaurus

tags:

views:

answers:

Branchless code that maps zero, negative, and positive to 0, 1, 2

IN THEORY. MY ASSEMBLER IS RUSTY

related questions