views: 331 · answers: 6

I need the most efficient way (in CPU cycles) to determine whether two numbers have the same or different signs. The catch: if either number is zero, I need to distinguish that case from the same-sign and different-sign cases (i.e. zero is treated as a "third" sign). The following code is similar to what I need, but the return values can be anything as long as there are only three distinct return values.

int foo(int x, int y) {
    if (x * y > 0) return 1;
    if (x * y < 0) return -1;
    return 0;
}

For my specific problem, the values are in the range [-6, 6] and x is guaranteed not to be 0. I found a solution for checking whether two numbers have the same sign, and altered it to get the following:

return y? (((x^y) >= 0)? 1 : -1) : 0;

There should be some combination of bit operations and comparisons that gives faster results than using multiplication and branching.

+7  A: 

Your example doesn't work because you didn't put parentheses around (x^y)

This is working:

return y? (((x^y) >= 0) ? 1 : -1) : 0;

I think you can't do much faster if you want to return exactly -1, 1 or 0. This is because -1 is 11111111 (all ones in two's complement) and is quite different from 0 and 1. A set of bit operations that would return 11111111, 0 or 1 would be complicated and certainly slower than the code above.

EDIT: if instead of -1 and 1 you can cope with any negative or positive number, then you can eliminate a branch:

return y ? ((x^y) | 1) : 0;
Tomaka17
The parentheses in your code aren't matched. Also, the code doesn't seem to return the correct result for x==0.
Martin B
OP said that x wasn't equal to 0, and my parentheses are matched
Tomaka17
With the second you get a negative value if the signs are different and a positive value if they are the same, but the exact value can vary; I proposed this since I don't really know the usage
Tomaka17
I don't necessarily need return values of -1,0,1 anything is fine as long as there are only three distinct return values.
Justin
"A set of bit operations that would return 11111111, 0 or 1 would be complicated and certainly slower than the code above." That's just not true. See drawnonward's solution. A mispredicted jump is more expensive than a boatload of bit operations, so I'll take the branch-free version any day (assuming we're talking cycle optimization).
roe
My intuition would tell me that drawnonward's solution is slower than mine (and Jukka Suomela's looks faster), but I agree that the best thing to do is to try all the answers to this question and profile them
Tomaka17
his has 8 trivial operations (no divisions or anything like that); yours has two operations and a branch that'll stall the pipeline in case of a misprediction which, depending on the data, will happen more or less often. Anyway, your function takes something like 2-20 cycles; his takes 8. So if more than one in every three calls causes a misprediction, his wins.
roe
This depends highly on the data: with a uniform distribution the branch will only fail 1 time in 13; but I must admit that I was seriously wrong with "I think you can't do much faster"
Tomaka17
Well, it'll go 'the other way' 1 in 13 times; there's no word as to which branch is the predicted one. It might be that it's actually only correct 1 in 13 times... :) Also, Jukka Suomela's solution below has only 6 operations, and no branches.
roe
Branch prediction is dynamic. After the first branch (which is 50/50), the processor stores the branch outcome and uses it as the prediction for the next time. Two false predictions in a row (a 0.6% chance with a uniform distribution) are required to change the prediction, but if that happens, performance will drop for 4 branches
Tomaka17
Modern CPUs have rather sophisticated branch predictors, and not necessarily only using the last result. Older CPUs, or cheaper CPUs, not so much. Some architectures have the initial prediction right there in the op-code (I know PPC does), and some always use that (early SPARCs for example) according to Wikipedia. Either way, as you say, it's dynamic, and if you want performance, dynamic is rarely what you want.
roe
+12  A: 

How about:

int foo(int x,int y)
{
    // As suggested by Luther Blissett in the comments below,
    // the array was increased to 16x16: this makes the index
    // computation a simple shift-and-add for the compiler.
    // Also use +8 rather than +6 in the hope that optimization
    // will be easier (you never know, there may be some fancy trick).
    static int sign[16][16] = {
                { 1, 1, 1, 1, 1, 1, 1, 1, 0, -1, -1, -1, -1, -1, -1, -1},
                { 1, 1, 1, 1, 1, 1, 1, 1, 0, -1, -1, -1, -1, -1, -1, -1},
                { 1, 1, 1, 1, 1, 1, 1, 1, 0, -1, -1, -1, -1, -1, -1, -1},
                { 1, 1, 1, 1, 1, 1, 1, 1, 0, -1, -1, -1, -1, -1, -1, -1},
                { 1, 1, 1, 1, 1, 1, 1, 1, 0, -1, -1, -1, -1, -1, -1, -1},
                { 1, 1, 1, 1, 1, 1, 1, 1, 0, -1, -1, -1, -1, -1, -1, -1},
                { 1, 1, 1, 1, 1, 1, 1, 1, 0, -1, -1, -1, -1, -1, -1, -1},
                { 1, 1, 1, 1, 1, 1, 1, 1, 0, -1, -1, -1, -1, -1, -1, -1},
                { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0},
                { -1, -1, -1, -1, -1, -1, -1, -1, 0, 1, 1, 1, 1, 1, 1, 1},
                { -1, -1, -1, -1, -1, -1, -1, -1, 0, 1, 1, 1, 1, 1, 1, 1},
                { -1, -1, -1, -1, -1, -1, -1, -1, 0, 1, 1, 1, 1, 1, 1, 1},
                { -1, -1, -1, -1, -1, -1, -1, -1, 0, 1, 1, 1, 1, 1, 1, 1},
                { -1, -1, -1, -1, -1, -1, -1, -1, 0, 1, 1, 1, 1, 1, 1, 1},
                { -1, -1, -1, -1, -1, -1, -1, -1, 0, 1, 1, 1, 1, 1, 1, 1},
                { -1, -1, -1, -1, -1, -1, -1, -1, 0, 1, 1, 1, 1, 1, 1, 1}
            };

    return sign[x+8][y+8];
}

This should be fast as there is no branching that will stall the processor.

Using g++ -O3 -S:

__Z3fooii:
  pushl   %ebp
  movl    %esp, %ebp
  movl    8(%ebp), %eax
  movl    12(%ebp), %edx
  popl    %ebp
  sall    $4, %eax
  addl    %edx, %eax
  movl    _ZZ3fooiiE4sign+544(,%eax,4), %eax
  ret
Martin York
As usual brute force works well for small domains :)
Matthieu M.
There is a possibility that computing the value would be faster than using the table, due to the table's frequent reloading into the processor cache. It depends on how the calling code is organized
Alsk
@Alsk: That may be true, but it's definitely something that would need to be measured to be confirmed. You also have to remember that if there is any kind of conditional (two in the OP's question), then a processor pipeline stall can happen, and that's not fast either (though faster than a fetch from main memory into the cache).
Martin York
I've thought of this, but I'm worried that the load instruction for the table index would be even more costly than multiplication.
Justin
@Justin: You could implement both and time them (or better profile them). From my experience, 'intuitive' detection of bottlenecks is not too reliable.
sum1stolemyname
@Justin: multiplication by constants is usually compiled to shifts and adds and thus very fast. Also, you can redeclare the sign array as sign[13][16], which means that only <<4 is required to compute the index.
Luther Blissett
@Luther Blissett : Done
Martin York
+1  A: 

You could do something like this (only with proper variable names, and made much less ugly!). Note that this ONLY works with two's complement numbers, and only if your values are limited to -6 to 6 as in your question.

Profile it to make sure it's faster than the clear way of doing it, and ONLY write code like this once you have determined that you can't meet your requirements with a much more obvious approach. With branch prediction etc., branches aren't always slow, on x86 for example. I would never write unportable code like this unless I had no choice to meet performance requirements.

Basically, extract the sign bits and exclusive-or them to get the result you want.

int foo(int x, int y)
{
    int s;

    if (x == 0 || y == 0) return 0;

    x = x >> 4; // Bit 0 of x will be the sign bit of x
    y = y >> 4; // Bit 0 of y will be the sign bit of y

    s = (x ^ y) & 1; // sign is 0 if they have the same sign, 1 otherwise

    return  1 - 2 * s;  // Make it 1 for the same sign, -1 otherwise
}

This compiles on my compiler to a couple of quick tests for zero, and what looks like quite an efficient bit of bit manipulation after that:

    test    ecx, ecx
    je  SHORT $LN1@foo
    test    edx, edx
    je  SHORT $LN1@foo
; Line 12
    xor ecx, edx
    mov eax, 1
    sar ecx, 4
    and ecx, 1
    add ecx, ecx
    sub eax, ecx
; Line 13
    ret 0
$LN1@foo:
; Line 5
    xor eax, eax
; Line 13
    ret 0
John Burton
OP said `x` can't be 0, so you can drop that check.
IVlad
Well, his first paragraph and example seemed to contradict that. If x can't be 0, then yes
John Burton
+6  A: 

Edit:

((x*y)>>7) | -(-(x*y)>>7)

The line above returns 1 if the two have the same sign, -1 if the signs differ, and 0 if either number is zero.
The lines below return 1 if both are positive and -1 if both are negative (0 otherwise).

Assuming signed 32-bit values. With |x|,|y| < 7 you could shift by 3 instead of 31.

  ((x&y)>>31)  // -1 or 0
-((-x&-y)>>31) //  1 or 0

((x&y)>>31) | -((-x&-y)>>31)

Assuming the result of < is 1 or 0 (which C guarantees):

-((x&y)<0)     // -1 or 0
((-x&-y)<0)    //  1 or 0

-((x&y)<0) | ((-x&-y)<0)

Either way looks like 8 operations.

drawnonward
Excellent bit-fiddling, +1. The middle one is my favorite, although I get the feeling it should be possible to shave another operation off of it.. just can't put my finger on it.
roe
+5  A: 

Here is another version (with ugly, non-portable bit manipulation tricks):

int foo(int x, int y) {
    return ((x^y) >> 4) - ((x^(-y)) >> 4);
}

Some explanations:

  • ((x^y) >> 4) is -1 if exactly one of x and y is negative, otherwise it is 0.
  • ((x^(-y)) >> 4) is -1 if exactly one of x and -y is negative, otherwise it is 0.
  • If x > 0 and y > 0, the result will be 0 - (-1) = 1.
  • If x < 0 and y < 0, the result will be 0 - (-1) = 1.
  • If x > 0 and y = 0, the result will be 0 - 0 = 0.
  • If x < 0 and y = 0, the result will be (-1) - (-1) = 0.
  • If x > 0 and y < 0, the result will be (-1) - 0 = -1.
  • If x < 0 and y > 0, the result will be (-1) - 0 = -1.

Assumes two's complement arithmetic and assumes that >> shifts with sign-extension.

Jukka Suomela
That's better than mine
John Burton
+1, very elegant. Is >> 4 cheaper than >> 31 in any situation, or why >> 4? (except that we may use it, given the value ranges)
roe
How is it non-portable, by the way? Apart from requiring signed shifts and two's complement arithmetic?
roe
I put >> 4 in mine because it was all that was necessary for the question, and it didn't require any assumptions about how big an int is on the platform.
John Burton
@roe: bit-shifting by any amount takes the same number of clock cycles on x86; I imagine it's the same on most other popular processors as well
BlueRaja - Danny Pflughoeft
@Martin - For every C or C++ compiler I've ever used, right shift zero-fills for an *unsigned* type, but fills with the sign bit for a *signed* type. I'm not 100% sure that's mandated by the standards, but I'd be surprised if it's not. That is, C shifts are arithmetic shifts, but an arithmetic shift on an unsigned type (where there is no sign bit to copy) is equivalent to a logical shift. Java invented >>> because Java doesn't have unsigned integer types. A bad solution IMO - I can never remember which one is meant to be the logical rather than the arithmetic shift.
Steve314
@BlueRaja - That's certainly true for current desktop chips, but barrel shifters requiring one or more clocks per bit are cheaper in silicon. They were used a lot back in ye olden days, and I wouldn't be surprised to find them on *really* minimal embedded processors etc even now.
Steve314
@Steve, @Martin: The C++ standard leaves the behavior implementation-defined if the value is signed and negative, but requires zero-fill for signed non-negative values. So in C++ at least, sign extension is only a possibility, not a guarantee.
Dennis Zickefoose
A: 

To express the sign of a number x as a "normalized" integer (i.e. -1, 0, +1), use

inline int sign(int x) { return (x > 0) - (x < 0); }

Deriving from the above, to compare x and y for sign equality use

inline bool same_sign(int x, int y) { 
  return sign(x) == sign(y);
}

for a boolean result.

Or, for a -1, 0, +1 result:

inline int compare_sign(int x, int y) { 
  return sign(x) * sign(y);
}

How efficient your final code will be depends, of course, on the quality of the compiler you are using.

AndreyT