views:

8159

answers:

6

Is there a library function that performs binary search on a list/tuple and return the position of the item if found and 'False' (-1, None, etc.) if not?

I found the functions bisect_left/right in the bisect module, but they still return a position even if the item is not in the list. That's perfectly fine for their intended usage, but I just want to know if an item is in the list or not (don't want to insert anything).

I thought of using bisect_left and then checking if the item at that position is equal to what I'm searching, but that seems cumbersome (and I also need to do bounds checking if the number can be larger than the largest number in my list). If there is a nicer method I'd like to know about it.

Edit To clarify what I need this for: I'm aware that a dictionary would be very well suited for this, but I'm trying to keep the memory consumption as low as possible. My intended usage would be a sort of double-way look-up table. I have in the table a list of values and I need to be able to access the values based on their index. And also I want to be able to find the index of a particular value or None if the value is not in the list.

Using a dictionary for this would be the fastest way, but would (approximately) double the memory requirements.

I was asking this question thinking that I may have overlooked something in the Python libraries. It seems I'll have to write my own code, as Moe suggested.

+17  A: 

Why not look at the code for bisect_left/right and adapt it to suit your purpose.

like this:

def binary_search(a, x, lo=0, hi=None):
    if hi is None:
        hi = len(a)
    while lo < hi:
        mid = (lo+hi)//2
        midval = a[mid]
        if midval < x:
            lo = mid+1
        elif midval > x: 
            hi = mid
        else:
            return mid
    return -1
Moe
I originally +1'ed this, but now I've come to the conclusion this isn't a good thing. If this answer is followed, it'll cause a lot of code duplication, and as we all know, it's really simple to f*ck up binary search.
abyx
+1  A: 

If you just want to see if it's present, try turning the list into a dict:

# Generate a list
l = [n*n for n in range(1000)]

# Convert to dict - doesn't matter what you map values to
d = dict((x, 1) for x in l)

count = 0
for n in range(1000000):
    # Compare with "if n in l"
    if n in d:
        count += 1

On my machine, "if n in l" took 37 seconds, while "if n in d" took 0.4 seconds.

jrb
``set(l)`` could do the job.
J.F. Sebastian
That's not always a good option for a couple of reasons: 1) dicts/sets take up more memory. 2) if he doesn't have much in the list, a binary search may be faster. 3) converting the list to a dict is an O(n) operation while a binary search is O(log n).
Jason Baker
As an FYI, the "set" overhead in python compared to python lists, is very very low. And they are extremely fast for lookups. Where binary search really excels is for looking up ranges.
Gregg Lind
Converting the list may be O(n) but sorting the data in the list, which you'd have to do before binary searching it, is worse. Where's the data coming from, you can probably insert it into a dictionary as you go. I agree that the memory may be an issue.
Mark Baker
+14  A: 

This is a little off-topic (since Moe's answer seems complete to the OP's question), but it might be worth looking at the complexity for your whole procedure from end to end. If you're storing thing in a sorted lists (which is where a binary search would help), and then just checking for existence, you're incurring (worst-case, unless specified):

Sorted Lists

  • O( n log n) to initially create the list (if it's unsorted data, or O(n) )
  • O( log n) lookup
  • O( n ) insert / delete (might be O(1) or O(log n) average case, depending on your pattern)

Whereas with a set, you're incurring

  • O(n) to create
  • O(1) lookup
  • O(1) insert / delete

The thing a sorted list really gets you are "next", "previous", and "ranges" (including inserting or deleting ranges), which are O(1) or O(|range|), given a starting index. If you aren't using those sorts of operations often, then storing as sets, and sorting for display might be a better deal overall. set() incurs very little additional overhead in python.

Gregg Lind
There is one other thing a sorted list gets you. O(n) ordered traversal. With a set that's O(n log n) and you end up having to copy references to the data into a list.
Omnifarious
True enough! Thank you for expanding on what I meant by range search. Fwiw, a full traversal is the same a range query between min,max, which is O(k) where k = n :)
Gregg Lind
+1  A: 

Using a dict wouldn't like double your memory usage unless the objects you're storing are really tiny, since the values are only pointers to the actual objects:

>>> a = 'foo'
>>> b = [a]
>>> c = [a]
>>> b[0] is c[0]
True

In that example, 'foo' is only stored once. Does that make a difference for you? And exactly how many items are we talking about anyway?

Just Some Guy
It's about numbers and lots of them :) I'd like to use an array almost as big as the computer memory. I know the base of my problem could be wrong, but I was curious about the lack of a binary search method.
rslite
You can't have a key object small enough to qualify as "really tiny" here. An object would have a minimum cost of 3 words (type, refcount, payload), while a list adds 1 word, a set adds 1 word, and a dict adds 2 words.All three (list/set/dict) preallocate space in some fashion as well, which is another multiplier, but still not enough to matter.
Rhamphoryncus
+5  A: 

Simplest is to use bisect and check one position back to see if the item is there:

def binary_search(a,x,lo=0,hi=-1):
    i = bisect(a,x,lo,hi)
    if i == 0:
        return -1
    elif a[i-1] == x:
        return i-1
    else:
        return -1
Imran
Nice, but the code barfs if you do not pass in the 'hi' value. I'd write it like this: "def binary_search(a,x,lo=0,hi=None): from bisect import bisect i = bisect(a,x,lo,hi or len(a)) return (i-1 if a[i-1] == x else -1)" and test it like this: "for i in range(1, 20): a = list(range(i)) for aa in a: j = binary_search(a, aa) if j != aa: print i, aa, j"
hughdbrown
@hughdbrown, your answer is broken, too. Think about what happens when `i` turns out to be zero.
Dave Abrahams
+2  A: 
from bisect import bisect_left

def binary_search(a, x, lo=0, hi=None):   # can't use a to specify default for hi
    hi = hi if hi is not None else len(a) # hi defaults to len(a)   
    pos = bisect_left(a,x,lo,hi)          # find insertion position
    return (pos if pos != hi and a[pos] == x else -1) # don't walk off the end
Dave Abrahams