views: 599
answers: 5
I have a binary search loop which gets hit many times in the execution path.

A profiler shows that the division part of the search (finding the middle index given the high and low indices of the search range) is actually the most costly part of the search, by a factor of about 4.

(I think) it is not critical for an efficient binary search to find the exact middle value, just a value near the middle that isn't biased toward either end.

Is there a bit-twiddling algorithm to replace mid = (low + high) / 2 with something much faster?

Edit: Language is C#, but the equivalent bit-operation is valid in any language (although it may be of no performance benefit), which is why I left the C# tag off.
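
For context, a minimal sketch of the kind of loop being described (the routine name and signature are illustrative, not the actual code that was profiled):

// Illustrative sketch only, not the profiled routine.
static int BinarySearch(int[] sorted, int target)
{
    int low = 0, high = sorted.Length - 1;
    while (low <= high)
    {
        int mid = (low + high) / 2;   // the division the profiler flags
        if (sorted[mid] == target) return mid;
        if (sorted[mid] < target) low = mid + 1;
        else high = mid - 1;
    }
    return -1; // not found
}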

+7  A: 
int mid = (low + high) >>> 1;

Be advised that using "(low + high) / 2" for midpoint calculations won't work correctly once low + high overflows the integer type (which can happen with very large collections).
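
C# has no >>> operator, so an equivalent (a sketch, assuming low and high are non-negative ints as in the question) is to do the shift on an unsigned value:

// The unsigned shift keeps the midpoint correct even when low + high
// would exceed int.MaxValue; casting before adding avoids signed overflow.
int mid = (int)(((uint)low + (uint)high) >> 1);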

Jeff Moser
+1 for the right answer, but I do have to comment, "a lot of binary searches are broken" is a bit sensationalist. More like "a lot of binary search implementations contain an overflow bug that occurs with large numbers of items."
Not Sure
Looks like I updated the text just before you made this comment. Is it better now? :)
Jeff Moser
Is this actually faster?
Jimmy
I'm with Jimmy on this one... I'd expect the compiler to be making this optimization already. I'd be very surprised if it's faster.
rmeador
Shouldn't be any faster with a good compiler, but the fastest way to tell is to try it.
UncleO
And isn't it only >> ?
UncleO
Question doesn't specify language. Not all languages are compiled. Though how anyone could talk about "faster" without even specifying language, I do not know... Personally, I would happily yell at any compiler-writer who didn't optimise division-by-static-constant-2 to right-shift-1, assuming *unsigned* ints. Right shift of negative values is tricksy in C, and in a binary chop I doubt you need it, so use an unsigned int in case it helps.
Steve Jessop
In case anyone's wondering: in C/C++, right shift of negative values might be a logical, not arithmetic shift, in which case it plain doesn't work for division. Furthermore, even with arithmetic shift, -3 >> 1 == -2, which isn't the answer you want. If your type is int, then the compiler doesn't know your values are always positive, so it can't just replace division with shift even if you know that it would work.
Steve Jessop
Language is C#. I'll benchmark on Monday to see if I gained anything with the change.
Nick
@uncleo, yeah Java has >>> but C# doesn't.
Nosredna
After benchmarking, this gives a significant (4x) performance gain over the division. I used ANTS Profiler.
Nick
+4  A: 

You can use bit shifting and also overcome a possible overflow issue:

low + ((high-low) >> 1)

However, I must admit I expect modern compilers and interpreters to compile division by 2 (or by any other constant power of 2) as a bit shift, so I'm not sure it will really help - try it out.
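
A quick illustrative check of the overflow behaviour (the values are artificial, chosen only to push low + high past int.MaxValue):

int low = int.MaxValue - 2, high = int.MaxValue;
int naive = (low + high) / 2;            // wraps in unchecked C# and yields -2
int safe = low + ((high - low) >> 1);    // 2147483646, as expected
Console.WriteLine(naive + " vs " + safe);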

Roee Adler
+1  A: 

I'm skeptical that the finding of the midpoint is the bottleneck here. An integer division by two should be compiled to a right shift. Are high, low, and mid declared as integers?

I'd love to see your whole binary search routine. I think we're missing something here.

Nosredna
It is certainly curious that there is that big of a skew - so I'd be interested to see the whole routine too.
Jonathan Leffler
@Jonathan Leffler--Yeah, it doesn't make sense to me. Either something's accidentally declared or cast to a float rather than an integer, or the profile is misleading, or there's just not much in the binary search and there's not much to squeeze out. I'd like to see the compiled output of the routine, too. If the implementation is iterative, the conditionals should be more expensive than the average. If it's recursive, the function call overhead should be more expensive. And doesn't C# have an Array.BinarySearch method already?
Nosredna
@Nosredna: I wonder if maybe the JIT compiler has done something clever, with the result that the arithmetic is being smushed into the compare-and-branch, and the profiler can't tell which source line it really is. I don't know enough about .NET to judge whether that's plausible.
Steve Jessop
@onebyone.livejournal.com--that's a good idea. I've certainly seen cases where profiles didn't make sense until I looked at the object code generated by the compiler.
Nosredna
A: 

If I recall correctly, there are some cases where using the exact middle of the array can actually be slower. The solution is to randomize the choice of the index where you bisect the array. The same is true of the algorithm for determining the median of an array. See the sketch below.

I can't recall the precise details, but I remember hearing it in lecture 6 of the MIT algorithms series on iTunes.
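
If anyone wants to experiment with the idea, a sketch might look like this (purely illustrative; no claim that it is faster, and rng is an assumed shared System.Random instance):

static readonly Random rng = new Random();

// Pick a bisection index anywhere in [low, high] instead of the exact middle.
static int RandomizedMid(int low, int high)
{
    return rng.Next(low, high + 1);   // Next's upper bound is exclusive
}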

duffymo
I'd worry about getting too tricky. You want to make sure that you hit them all at the end. Maybe just randomize if the distance between high and low is greater than a certain number.
Nosredna
I'd check out a good algorithms book to see the details.
duffymo
+4  A: 

Here is a bit-hack version of the average that does not suffer from the overflow problem:

unsigned int average (unsigned int x, unsigned int y)
{
  return (x&y)+((x^y)>>1);
}
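
Since the question is C#, the same trick ported there (a sketch, assuming the values are non-negative ints such as array indices):

// x & y keeps the bits the two values share;
// (x ^ y) >> 1 adds half of the bits where they differ.
static int Average(int x, int y)
{
    return (x & y) + ((x ^ y) >> 1);
}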
Nils Pipenbrinck
Wow. I LOVE that! Of course, I would expect the add, subtract, and shift of the usual solution to be at least as fast. But you've scored major coolness points there!
Nosredna
IMHO, on a modern PC the performance difference between the versions doesn't matter anymore.
Nils Pipenbrinck
@Nils: Yes indeed, on modern CPUs it is the unpredictable branches of a binary search that are the speed killers.
Zan Lynx