views:

437

answers:

6

I'd like to store some data in Python in a similar form to a dictionary: {1:'a', 2:'b'}. Every value will be unique, not just among other values, but among keys too.

Is there a simple data structure that I can use to get the corresponding object no matter if I ask using the 'key' or the 'value'? For example:

>>> a = {1:'a', 2:'b'}
>>> a[1]
'a'
>>> a['b']
2
>>> a[3]
KeyError

The 'keys' are standard python ints, an the values are short (<256char) strings.

My current solution is creating a reversed dictionary and searching it if I can't find a result in the original dictionary:

pointsreversed = dict((v, k) for k, v in points.iteritems())
def lookup(key):
    return points.get(key) or pointsreversed.key()

This uses twice as much space, which isn't great (my dictionaries can be up to a few hundred megs) and is 50% slower on average.

EDIT: as mentioned in a few answers, two dicts doesn't double memory usage, as it's only the dictionary, not the items within, that is duplication.

Is there a solution that improves on this?

+5  A: 

Related posts:

Python mapping inverse

Python 1:1 mappings

Of course, if all values and keys are unique, couldn't you just use a single dictionary, and insert both key:value and value:key initially?

John Pirie
Yeah, if all the keys and values are unique, you /could/ use a single dictionary. Hadn't thought of that. +1
Rick Copeland
Very clever idea, and thanks especially for the second link.
Alex Jurkiewicz
He could, depending on what else he wanted to do ... e.g. single_dict.items() and friends could cause problems and/or excessive use of isinstance()
John Machin
A: 

Insert reversed pair of (key, value) into same dict:

a = {1:'a', 2:'b'}
a.update(dict((v, k) for k, v in a.iteritems()))

Then you will be able to do both, as you required:

print a[1]
print a['a']
mtasic
+6  A: 

If your keys and values are non-overlapping, one obvious approach is to simply store them in the same dict. ie:

class BidirectionalDict(dict):
    def __setitem__(self, key, val):
        dict.__setitem__(self, key, val)
        dict.__setitem__(self, val, key)

    def __delitem__(self, key):
        dict.__delitem__(self, self[key])
        dict.__delitem__(self, key)

d = BidirectionalDict()
d['foo'] = 4
print d[4]   # Prints 'foo'

(You'll also probably want to implement things like the __init__, update and iter* methods to act like a real dict, depending on how much functionality you need).

This should only involve one lookup, though may not save you much in memory (you still have twice the number of dict entries after all). Note however that neither this nor your original will use up twice as much space: the dict only takes up space for the references (effectively pointers), plus an overallocation overhead. The space taken up by your data itself will not be repeated twice since the same objects are pointed to.

Brian
A: 

Here's another solution using a user defined class.

And the code...

# search a dictionary for key or value
# using named functions or a class
# tested with Python25 by Ene Uran 01/19/2008

def find_key(dic, val):
    """return the key of dictionary dic given the value"""
    return [k for k, v in symbol_dic.iteritems() if v == val][0]

def find_value(dic, key):
    """return the value of dictionary dic given the key"""
    return dic[key]

class Lookup(dict):
    """
    a dictionary which can lookup value by key, or keys by value
    """
    def __init__(self, items=[]):
        """items can be a list of pair_lists or a dictionary"""
        dict.__init__(self, items)

    def get_key(self, value):
        """find the key(s) as a list given a value"""
        return [item[0] for item in self.items() if item[1] == value]

    def get_value(self, key):
        """find the value given a key"""
        return self[key]
tgray
But in that case, you do not directly access to a value, since you need to look for it.. It reduces the interest of the dictionnary
ThibThib
+1  A: 

In The Art of Computer Programming, Vokume 3 Knuth has a section on lookups of secondary keys. For purposes of your question, the value could be considered the secondary key.

The first suggestion is to do what you have done: make an efficient index of the keys by value.

The second suggestion is to setup a large btree that is a composite index of the clustered data, where the branch nodes contain values and the leaves contain the key data and pointers to the larger record (if there is one.)

If the data is geometric (as yours appears to be) there are things called post-office trees. It can answer questions like, what is the nearest object to point x. A few examples are here: http://simsearch.yury.name/russir/01nncourse-hand.pdf Another simple option for this kind of query is the quadtree and the k-d tree. http://en.wikipedia.org/wiki/Quadtree

Another final option is combinatorial hashing, where you combine the key and value into a special kind of hash that lets you do efficient lookups on the hash, even when you don't have both values. I couldn't find a good combinatorial hash explanation online, but it is in TAoCP, Volume 3 Second Edition on page 573.

Granted, for some of these you may have to write your own code. But if memory or performance is really key, you might want to take the time.

Christopher
+1  A: 

It shouldn't use "twice the space". Dictionaries just store references to data, not the data itself. So, if you have a million strings taking up a billion bytes, then each dictionary takes maybe an extra 10-20 million bytes--a tiny fraction of the overall storage. Using two dictionaries is the right thing to do.