views:

120

answers:

3

With respect to an array containing a lot of duplicate elements, is there any operations to improve the performance of normal binary search?

+8  A: 

You can create two arrays. One for values, and the other for repetitions. Then, you can search the values array using binary search.

AraK
I racked my brains over this one for some minutes; clever answer, congrats! +1.
Carl Smotricz
+1  A: 

AraK's answer is 'the only' answer. In general you can't improve worst-case performance on any array with duplicates. Let's say your duplicates tend to concentrate on the end of the array. Now you're looking for an element towards the beginning: your search will have to run as usual.

If you know that duplicates tend to appear on one side more often, you can skew the splitting to improve average performance. If they appear towards the end, e.g., you can do the first split at 1/3 instead of 1/2.

Mau
+1  A: 

Similar to AraK's answer, you could use an array of size-2 tuples in languages that support them, with the first element of each tuple being the searched-for value and the second one being the number of repetitions, or you could use a two-dimensional array to serve a similar purpose.

Some languages even have features that allow you to take an array of elements and create a counted list. For example, Python 3.1+ has a Counter class that does exactly that, though unfortunately the data type it returns, a subclass of Python's dict builtin, is an unordered collection, though you could just use l = list(Counter([collection of values]).items()); l.sort() to create a sorted list of tuples like the one I first mentioned. I'm sure other languages have similar constructs.

EDIT: Note that, since the dict type is implemented as a hashtable/hashmap, it might actually be easier to check items (if you're, say, checking whether or not an item exists in the list) using the Counter object itself, as that would avoid the cost of the conversion and sort, and would only cost slightly more for each search/check. If you have a Counter (let's call it "c"), you could do c[value] > 0, which would return True if the value is in your original list and False if it isn't.

JAB