tags:

views:

449

answers:

5
+2  A: 

You're scanning through all n records. You could instead do a binary search, which would be O(log(n)) instead of O(n). You can use the bisect module to do this.

Laurence Gonsalves
Isn't `bisect` only for inserting into already sorted arrays? How would one use it to do a search?
Evan Fosmark
bisect is for searching an array, with insertion being a common use case. It's just a binary search; it's a trickier algorithm to get right in all cases than many people realize, so it makes sense to have it in the standard library.
Glenn Maynard
Note thought that the list seems to be sorted on the **second** item. bisect uses normal comparison, which will give the wrong result in this case.
Brian
I can actually modify the code to have the current tuples (data, id) inserted as (id, data) and just change the key param to use itemgetter(1) when sorting.
sberry2A
A: 

Since apparently you don't care about the ending value of self.sorted_records actually being sorted (you have values in order 1, 45, 20, 76 -- that's NOT sorted!-), AND you only appear to care about IDs in updated_records that are also in self.sorted_data, a listcomp (with side effects if you want to change the updated_record on the fly) would serve you well, i.e.:

self.sorted_data = [(updated_records.pop(recid, value), recid) 
                    for (value, recid) in self.sorted_data]

the .pop call removes from updated_records the keys (and corresponding values) that are ending up in the new self.sorted_data (and the "previous value for that recid", value, is supplied as the 2nd argument to pop to ensure no change where a recid is NOT in updated_record); this leaves in updated_record the "new" stuff so you can e.g append it to self.sorted_data before re-sorting, i.e I suspect you want to continue with something like

self.sorted_data.extend(value, recid 
                        for recid, value in updated_records.iteritems())
self.sorted_data.sort()

though this part DOES go beyond the question you're actually asking (and I'm giving it only because I've seen your previous questions;-).

Alex Martelli
Correct you are Alex. I will be sorting the data after the updates, and the list IS sorted on the tuples index 1. I will also look into using the bisect module for any assistance it can provide.
sberry2A
A: 

Since you want to replace by dictionary key, but have the array sorted by dictionary value, you definitely need a linear search for the key. In that sense, your algorithm is the best you can hope for.

If you would preserve the old dictionary value, then you could use a binary search for the value, and then locate the key in the proximity of where the binary search lead you.

Martin v. Löwis
+1  A: 

You'd probably be best served by some form of tree here (preserving sorted order while allowing O(log n) replacements.) There is no builtin balanaced tree type, but you can find many third party examples. Alternatively, you could either:

  1. Use a binary search to find the node. The bisect module will do this, but it compares based on the normal python comparison order, whereas you seem to be sorted based on the second element of each tuple. You could reverse this, or just write your own binary search (or simply take the code from bisect_left and modify it)

  2. Use both a dict and a list. The list contains the sorted keys only. You can wrap the dict class easily enough to ensure this is kept in sync. This allows you fast dict updating while maintaining sort order of the keys. This should prevent your problem of losing sort performance due to constant conversion between dict/list.

Here's a quick implementation of such a thing:

import bisect

class SortedDict(dict):
    """Dictionary which is iterable in sorted order.

    O(n) sorted iteration
    O(1) lookup
    O(log n) replacement  ( but O(n) insertion or new items)
    """

    def __init__(self, *args, **kwargs):
        dict.__init__(self, *args, **kwargs)
        self._keys = sorted(dict.iterkeys(self))

    def __setitem__(self, key, val):
        if key not in self:
            # New key - need to add to list of keys.
            pos = bisect.bisect_left(self._keys, key)
            self._keys.insert(pos, key)
        dict.__setitem__(self, key, val)

    def __delitem__(self, key):
        if key in self:
            pos = bisect.bisect_left(self._keys, key)
            del self._keys[pos]
        dict.__delitem__(self, key)

    def __iter__(self):
        for k in self._keys: yield k
    iterkeys = __iter__

    def iteritems(self):
        for k in self._keys: yield (k, self[k])

    def itervalues(self):
        for k in self._keys: yield self[k]

    def update(self, other):
        dict.update(self, other)
        self._keys = sorted(dict.iterkeys(self)) # Rebuild (faster if lots of changes made - may be slower if only minor changes to large dict)

    def keys(self): return list(self.iterkeys())
    def values(self): return list(self.itervalues())
    def items(self): return list(self.iteritems())

    def __repr__(self):
        return "%s(%s)" % (self.__class__.__name__, ', '.join("%s=%r" % (k, self[k]) for k in self))
Brian
A: 

@Alex:

What I am trying to do is remove the existing entries in self.sorted_data with a key that exists in updated_records. updated_records will only contain records with ids that already exist in self.sorted_data. What I want to do (if it is the best approach) is remove all of the tuples (t) in self.sorted_data that have t[1] existing in updated_records. Then, I will just append all of the updated_records to the end of self.sorted_data and resort.

Your method above with the list comprehension assumes that updated_records contains both new and updated values and strips the ones that have an id already listed in self.sorted_data. This would be very handy if updated_records contained both, but it doesn't.

Given

self.sorted_records = [(1, 1234567890), (20, 1245678903), 
                       (40, 1256789034), (70, 1278903456)]

I would pass in

updated_records = {1245678903:45, 1278903456:76}
new_records = {10000000:10, 10000001:20}

and end up with

self.sorted_records = [(1, 1234567890), (40, 1256789034), (45, 1245678903), (76, 1278903456), (10,10000000), (20, 10000001)]

before the resort and

self.sorted_records = [(1, 1234567890), (10,10000000), (20, 10000001), (40, 1256789034), (45, 1245678903), (76, 1278903456)]

after

sberry2A