You're scanning through all n records. You could instead do a binary search, which would be O(log(n)) instead of O(n). You can use the bisect
module to do this.
Since apparently you don't care about the ending value of self.sorted_records
actually being sorted (you have values in order 1, 45, 20, 76 -- that's NOT sorted!-), AND you only appear to care about IDs in updated_records
that are also in self.sorted_data
, a listcomp (with side effects if you want to change the updated_record on the fly) would serve you well, i.e.:
self.sorted_data = [(updated_records.pop(recid, value), recid)
for (value, recid) in self.sorted_data]
the .pop
call removes from updated_records
the keys (and corresponding values) that are ending up in the new self.sorted_data
(and the "previous value for that recid
", value
, is supplied as the 2nd argument to pop to ensure no change where a recid is NOT in updated_record
); this leaves in updated_record
the "new" stuff so you can e.g append it to self.sorted_data
before re-sorting, i.e I suspect you want to continue with something like
self.sorted_data.extend(value, recid
for recid, value in updated_records.iteritems())
self.sorted_data.sort()
though this part DOES go beyond the question you're actually asking (and I'm giving it only because I've seen your previous questions;-).
Since you want to replace by dictionary key, but have the array sorted by dictionary value, you definitely need a linear search for the key. In that sense, your algorithm is the best you can hope for.
If you would preserve the old dictionary value, then you could use a binary search for the value, and then locate the key in the proximity of where the binary search lead you.
You'd probably be best served by some form of tree here (preserving sorted order while allowing O(log n) replacements.) There is no builtin balanaced tree type, but you can find many third party examples. Alternatively, you could either:
Use a binary search to find the node. The bisect module will do this, but it compares based on the normal python comparison order, whereas you seem to be sorted based on the second element of each tuple. You could reverse this, or just write your own binary search (or simply take the code from bisect_left and modify it)
Use both a dict and a list. The list contains the sorted keys only. You can wrap the dict class easily enough to ensure this is kept in sync. This allows you fast dict updating while maintaining sort order of the keys. This should prevent your problem of losing sort performance due to constant conversion between dict/list.
Here's a quick implementation of such a thing:
import bisect
class SortedDict(dict):
"""Dictionary which is iterable in sorted order.
O(n) sorted iteration
O(1) lookup
O(log n) replacement ( but O(n) insertion or new items)
"""
def __init__(self, *args, **kwargs):
dict.__init__(self, *args, **kwargs)
self._keys = sorted(dict.iterkeys(self))
def __setitem__(self, key, val):
if key not in self:
# New key - need to add to list of keys.
pos = bisect.bisect_left(self._keys, key)
self._keys.insert(pos, key)
dict.__setitem__(self, key, val)
def __delitem__(self, key):
if key in self:
pos = bisect.bisect_left(self._keys, key)
del self._keys[pos]
dict.__delitem__(self, key)
def __iter__(self):
for k in self._keys: yield k
iterkeys = __iter__
def iteritems(self):
for k in self._keys: yield (k, self[k])
def itervalues(self):
for k in self._keys: yield self[k]
def update(self, other):
dict.update(self, other)
self._keys = sorted(dict.iterkeys(self)) # Rebuild (faster if lots of changes made - may be slower if only minor changes to large dict)
def keys(self): return list(self.iterkeys())
def values(self): return list(self.itervalues())
def items(self): return list(self.iteritems())
def __repr__(self):
return "%s(%s)" % (self.__class__.__name__, ', '.join("%s=%r" % (k, self[k]) for k in self))
@Alex:
What I am trying to do is remove the existing entries in self.sorted_data with a key that exists in updated_records. updated_records will only contain records with ids that already exist in self.sorted_data. What I want to do (if it is the best approach) is remove all of the tuples (t) in self.sorted_data that have t[1] existing in updated_records. Then, I will just append all of the updated_records to the end of self.sorted_data and resort.
Your method above with the list comprehension assumes that updated_records contains both new and updated values and strips the ones that have an id already listed in self.sorted_data. This would be very handy if updated_records contained both, but it doesn't.
Given
self.sorted_records = [(1, 1234567890), (20, 1245678903),
(40, 1256789034), (70, 1278903456)]
I would pass in
updated_records = {1245678903:45, 1278903456:76}
new_records = {10000000:10, 10000001:20}
and end up with
self.sorted_records = [(1, 1234567890), (40, 1256789034), (45, 1245678903), (76, 1278903456), (10,10000000), (20, 10000001)]
before the resort and
self.sorted_records = [(1, 1234567890), (10,10000000), (20, 10000001), (40, 1256789034), (45, 1245678903), (76, 1278903456)]
after