"Copy and paste coding" (getting bisect
's sources into your code) is not recommended as it carries all sorts of costs down the road (lot of extra source code for you to test and maintain, difficulties dealing with upgrades in the upstream code you've copied, etc, etc); the best way to reuse standard library modules is simply to import them and use them.
However, to do one pass transforming the dictionaries into meaningfully comparable entries is O(N), which (even though each step of the pass is simple) will eventually swamp the O(log N) time of the search proper. Since bisect
can't support a key=
key extractor like sort
does, what the solution to this dilemma -- how can you reuse bisect
by import and call, without a preliminary O(N) step...?
As quoted here, the solution is in David Wheeler's famous saying, "All problems in computer science can be solved by another level of indirection". Consider e.g....:
import bisect
listofdicts = [
{'dt': '2009-%2.2d-%2.2dT12:00:00' % (m,d) }
for m in range(4,9) for d in range(1,30)
]
class Indexer(object):
def __init__(self, lod, key):
self.lod = lod
self.key = key
def __len__(self):
return len(self.lod)
def __getitem__(self, idx):
return self.lod[idx][self.key]
lookfor = listofdicts[len(listofdicts)//2]['dt']
def mid(res=listofdicts, target=lookfor):
keys = [r['dt'] for r in res]
return res[bisect.bisect_left(keys, target)]
def midi(res=listofdicts, target=lookfor):
wrap = Indexer(res, 'dt')
return res[bisect.bisect_left(wrap, target)]
if __name__ == '__main__':
print '%d dicts on the list' % len(listofdicts)
print 'Looking for', lookfor
print mid(), midi()
assert mid() == midi()
The output (just running this indexer.py
as a check, then with timeit
, two ways):
$ python indexer.py
145 dicts on the list
Looking for 2009-06-15T12:00:00
{'dt': '2009-06-15T12:00:00'} {'dt': '2009-06-15T12:00:00'}
$ python -mtimeit -s'import indexer' 'indexer.mid()'
10000 loops, best of 3: 27.2 usec per loop
$ python -mtimeit -s'import indexer' 'indexer.midi()'
100000 loops, best of 3: 9.43 usec per loop
As you see, even in a modest task with 145 entries in the list, the indirection approach can have a performance that's three times better than the "key-extraction pass" approach. Since we're comparing O(N) vs O(log N), the advantage of the indirection approach grows without bounds as N increases. (For very small N, the higher multiplicative constants due to the indirection make the key-extraction approach faster, but this is soon surpassed by the big-O difference). Admittedly, the Indexer class is extra code -- however, it's reusable over ALL tasks of binary searching a list of dicts sorted by one entry in each dict, so having it in your "container-utilities back of tricks" offers good return on that investment.
So much for the main search loop. For the secondary task of converting two entries (the one just below and the one just above the target) and the target to a number of seconds, consider, again, a higher-reuse approach, namely:
import time
adt = '2009-09-10T12:00:00'
def dttosecs(dt=adt):
# string to seconds since the beginning
date,tim = dt.split('T')
y,m,d = date.split('-')
h,mn,s = tim.split(':')
y = int(y)
m = int(m)
d = int(d)
h = int(h)
mn = int(mn)
s = min(59,int(float(s)+0.5)) # round to neatest second
s = int(s)
secs = time.mktime((y,m,d,h,mn,s,0,0,-1))
return secs
def simpler(dt=adt):
return time.mktime(time.strptime(dt, '%Y-%m-%dT%H:%M:%S'))
if __name__ == '__main__':
print adt, dttosecs(), simpler()
assert dttosecs() == simpler()
Here, there is no performance advantage to the reuse approach (indeed, and on the contrary, dttosecs
is faster) -- but then, you only need to perform three conversions per search, no matter how many entries are on your list of dicts, so it's not clear whether that performance issue is germane. Meanwhile, with simpler
you only have to write, test and maintain one simple line of code, while dttosecs
is a dozen lines; given this ratio, in most situations (i.e., excluding absolute bottlenecks), I would prefer simpler
. The important thing is to be aware of both approaches and of the tradeoffs between them so as to ensure the choice is made wisely.