Hi

Here is my problem. I want to store 2-tuples (key, val) with the following properties and operations:

  • keys are strings and values are integers
  • multiple keys can have the same value
  • adding new tuples
  • updating any key with a new value (any new or updated value is greater than the previous one, like timestamps)
  • fetching all the keys with values less than or greater than a given value
  • deleting tuples.

A hash table seems to be the obvious choice for updating a key's value, but then lookups by value will take longer (O(n)). The other option is a balanced binary search tree with key and value switched. Lookups by value will then be fast (O(log n)), but updating a key will take O(n), because the key has to be found by scanning the tree. Is there any data structure that addresses both issues?

Thanks.

A: 

For a binary search tree, insert is an O(log n) operation on average and O(n) in the worst case. The same holds for the lookup operation. So I believe this should be your choice.
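One caveat for this particular use case: the O(n) worst case is exactly what a plain (unbalanced) tree hits when values arrive in increasing order, as timestamps do. The sketch below is only an illustration; `Node`, `insert`, `height` and `build` are hypothetical helper names, not part of any library.

```python
class Node:
    def __init__(self, value):
        self.value = value
        self.left = None
        self.right = None


def insert(root, value):
    """Insert into a plain, unbalanced BST and return the root."""
    if root is None:
        return Node(value)
    node = root
    while True:
        if value < node.value:
            if node.left is None:
                node.left = Node(value)
                return root
            node = node.left
        else:
            if node.right is None:
                node.right = Node(value)
                return root
            node = node.right


def height(root):
    """Number of nodes on the longest root-to-leaf path."""
    best = 0
    stack = [(root, 1)] if root is not None else []
    while stack:
        node, depth = stack.pop()
        best = max(best, depth)
        for child in (node.left, node.right):
            if child is not None:
                stack.append((child, depth + 1))
    return best


def build(values):
    root = None
    for v in values:
        root = insert(root, v)
    return root


# Increasing values: every insert goes right, so the tree is a linked list.
print(height(build(range(1, 101))))          # 100
# The same kind of data in a favourable order gives a shallow tree.
print(height(build([4, 2, 6, 1, 3, 5, 7])))  # 3
```

A self-balancing tree (red-black, AVL, etc.) avoids this degenerate shape.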

Hun1Ahpu
A: 

Dictionary or Map types tend to be based on one of two structures.

  • Balanced tree (guarantee O(log n) lookup).
  • Hash based (best case is O(1), but a poor hash function for the data could result in O(n) lookups).

Any book on algorithms should cover both in lots of detail.

To provide operations on both keys and values, there are also multi-index collections (with all the extra complexity), which maintain multiple structures (much like an RDBMS table can have multiple indexes). Unless you have a lot of lookups over a large collection, the extra overhead might cost more than a few linear lookups.
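For comparison, the "few linear lookups" baseline could look like the sketch below: a single dictionary plus an O(n) scan for value queries. The function name `keys_where` is made up for this illustration.

```python
def keys_where(table, predicate):
    """Return every key whose value satisfies predicate (linear scan)."""
    return [k for k, v in table.items() if predicate(v)]


table = {'a': 1, 'b': 2, 'c': 3, 'd': 3}
table['a'] = 5                              # update by key is O(1) on average
print(keys_where(table, lambda v: v > 2))   # ['a', 'c', 'd']
print(keys_where(table, lambda v: v < 3))   # ['b']
```

If value queries are rare or the collection is small, this may well be good enough.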

Richard
+1  A: 

I'd use two data structures: a hash table from keys to values, and a search tree ordered by value and then by key. When inserting, insert the pair into both structures. When deleting by key, look up the value in the hash table and then remove the pair from the tree. Updating is basically delete + insert. Insert, delete and update are O(log n). To fetch all the keys less than a value, look up the value in the search tree and iterate backwards. This is O(log n + k), where k is the number of keys returned.

The choices for good hash table and search tree implementations depend a lot on your particular distribution of data and operations. That said, a good general purpose implementation of both should be sufficient.
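A rough Python sketch of this two-structure idea follows. Python's standard library has no balanced tree, so this sketch substitutes a sorted list of `(value, key)` pairs maintained with `bisect`; searches are O(log n), but inserting into or deleting from the list shifts elements and is O(n), so a real balanced tree would be needed to get the bounds quoted above. The class and method names are hypothetical.

```python
import bisect


class ValueIndexedMap:
    def __init__(self):
        self._by_key = {}   # key -> value
        self._pairs = []    # sorted list of (value, key); stand-in for the tree

    def set(self, key, value):
        """Add a new pair, or update an existing key (delete + insert)."""
        if key in self._by_key:
            self._remove_pair(key)
        self._by_key[key] = value
        bisect.insort(self._pairs, (value, key))

    def delete(self, key):
        self._remove_pair(key)
        del self._by_key[key]

    def _remove_pair(self, key):
        # The hash table tells us the value, so the pair can be located exactly.
        pair = (self._by_key[key], key)
        del self._pairs[bisect.bisect_left(self._pairs, pair)]

    def keys_less_than(self, value):
        # (value,) sorts before every (value, key), so this finds the first
        # pair whose value is >= the given value.
        end = bisect.bisect_left(self._pairs, (value,))
        return [k for _, k in self._pairs[:end]]

    def keys_greater_than(self, value):
        # Assumes integer values, as in the question.
        start = bisect.bisect_left(self._pairs, (value + 1,))
        return [k for _, k in self._pairs[start:]]


m = ValueIndexedMap()
for k, v in [('a', 1), ('b', 2), ('c', 3), ('d', 3)]:
    m.set(k, v)
print(m.keys_less_than(3))      # ['a', 'b']
m.set('a', 5)                   # update: 'a' moves to the end of the ordering
print(m.keys_greater_than(2))   # ['c', 'd', 'a']
```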

Ants Aasma
A: 

You can create a custom data structure which holds two dictionaries.

That is, a hash table from keys to values and another hash table from values to lists of keys.

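class Foo:
    def __init__(self):
        self.keys = {}    # key -> value
        self.values = {}  # value -> list of keys holding that value

    def add_tuple(self, kd, vd):
        # When updating an existing key, remove it from its old value's list
        # so the two dictionaries stay consistent.
        if kd in self.keys:
            old = self.keys[kd]
            self.values[old].remove(kd)
            if not self.values[old]:
                del self.values[old]
        self.keys[kd] = vd
        self.values.setdefault(vd, []).append(kd)

f = Foo()
f.add_tuple('a', 1)
f.add_tuple('b', 2)
f.add_tuple('c', 3)
f.add_tuple('d', 3)

print(f.keys)
print(f.values)

print(f.keys['a'])
print(f.values[3])

# Range query: this scans every distinct value, so it is still linear
# in the number of distinct values.
print([f.values[v] for v in f.values if v > 1])

OUTPUT:

{'a': 1, 'b': 2, 'c': 3, 'd': 3}

{1: ['a'], 2: ['b'], 3: ['c', 'd']}

1

['c', 'd']

[['b'], ['c', 'd']]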
TheMachineCharmer