views:

588

answers:

3

I have a list of dicts, something like this:

test_data = [
    { 'offset':0, 'data':1500 },
    { 'offset':1270, 'data':120 },
    { 'offset':2117, 'data':30 },
    { 'offset':4055, 'data':30000 },
]

The dict items are sorted in the list according to the 'offset' data. The real data could be much longer.

What I want to do is lookup an item in the list given a particular offset value, which is not exactly one of those values, but in that range. So, a binary search is what I want to do.

I am now aware of the Python bisect module, which is a ready-made binary search—great, but not directly usable for this case. I'm just wondering what is the easiest way to adapt bisect to my needs. Here is what I came up with:

import bisect

class dict_list_index_get_member(object):
    def __init__(self, dict_list, member):
        self.dict_list = dict_list
        self.member = member
    def __getitem__(self, index):
        return self.dict_list[index][self.member]
    def __len__(self):
        return self.dict_list.__len__()

test_data_index_get_offset = dict_list_index_get_member(test_data, 'offset')
print bisect.bisect(test_data_index_get_offset, 1900)

It prints:

2

My question is, is this the best way to do what I want, or is there some other simpler, better way?

+1  A: 

The usual pattern here is similar to sorting by an attribute, decorate, operate, and undecorate. So in this case you'd just need to decorate and then call. However you'd want to avoid doing this since decorate would be O(n) whereas you want this to be O(logn). Therefore I'd consider your method best.

Alex Gaynor
+3  A: 

When you say the real data could be much longer, does that prevent you from keeping a list of offset values on hand?

offset_values = [i['offset'] for i in test_data]
bisect.bisect(offset_values, 1900)

Your method seems fine to me though.

sykora
+3  A: 

What you can do is this

class OffsetWithAttributes( object ):
    def __init__( self, offset, **kw ):
        self.offset= offset
        self.attributes= kw
    def __eq__( self, other ):
        return self.offset == other.offset
    def __lt__( self, other ):
        return self.offset < other.offset
    def __le__( self, other ):
        return self.offset <= other.offset
    def __gt__( self, other ):
        return self.offset > other.offset
    def __ge__( self, other ):
        return self.offset >= other.offset
    def __ne__( self, other ):
        return self.offset != other.offset

This should allow you to create a simple list of OffsetWithAttributes instances. The bisect algorithm should be perfectly happy to use the defined operators.

You can use your someOWA.attributes['data'].

Or

    def __getattr__( self, key ):
        return self.attributes[key]

That should make the OffsetWithAttributes more like a dict.

S.Lott