I have a problem which requires a reversible 1:1 mapping of keys to values.

That means sometimes I want to find the value given a key, but at other times I want to find the key given the value. Both keys and values are guaranteed unique.

x = D[y]
y == D.inverse[x]

The obvious solution is simply to invert the dictionary every time I want a reverse lookup. Inverting a dictionary is very easy (there's a recipe here), but for a large dictionary it can be very slow.
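For reference, a naive inversion is just a one-liner along these lines (a sketch, not necessarily the linked recipe), which costs O(n) work and builds a brand-new dict on every call:

inverse = dict((v, k) for k, v in D.items())  # rebuilds the whole reverse mapping each time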

The other alternative is to make a new class which unites two dictionaries, one for each kind of lookup. That would most likely be fast but would use up twice as much memory as a single dict.

So is there a better structure I can use?

  • My application requires that this should be very fast and use as little memory as possible.
  • The structure must be mutable, and it's strongly desirable that mutating the object should not slow it down (e.g. by forcing a complete re-index)
  • We can guarantee that either the key or the value (or both) will be an integer
  • It's likely that the structure will be needed to store thousands or possibly millions of items.
  • Keys & values are guaranteed to be unique, i.e. len(set(x)) == len(x) for x in [D.keys(), D.values()]
+4  A: 

The other alternative is to make a new class which unites two dictionaries, one for each kind of lookup. That would most likely use up twice as much memory as a single dict.

Not really, since they would just be holding two references to the same data. In my mind, this is not a bad solution.

Have you considered an in-memory database lookup? I am not sure how it will compare in speed, but lookups in relational databases can be very fast.

Shane C. Mason
The 2-dicts class is the best so far!
Salim Fadhley
A: 

Assuming that you have a key with which you look up a more complex mutable object, just make the key a property of that object. It does seem you might be better off thinking about the data model a bit.

Pete Kirkham
In this case I cannot - the objects on one side are numpy.int64s - the purpose of the application is to adapt a very austere, numeric graph-theory class to something that looks more naturally pythonic.
Salim Fadhley
In that case, a flyweight would do.
Pete Kirkham
+16  A: 

The other alternative is to make a new class which unites two dictionaries, one for each kind of lookup. That would most likely be fast but would use up twice as much memory as a single dict.

Not really. Have you measured that? Since both dictionaries would use references to the same objects as keys and values, the memory spent would be just the dictionary structure itself. That's a lot less than twice the memory, and it doesn't depend on how large the stored objects are.

What I mean is that the actual data wouldn't be copied. So you'd spend little extra memory.

Example:

a = "some really really big text spending a lot of memory"

number_to_text = {1: a}
text_to_number = {a: 1}

Only a single copy of the "really big" string exists, so you end up spending just a little more memory. That's generally affordable.

I can't imagine a solution that gives you key-lookup speed when looking up by value without spending at least enough memory to store a reverse-lookup hash table (which is exactly what your "unite two dicts" solution does).
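For concreteness, a minimal sketch of that "unite two dicts" class (the class and method names here are mine, just for illustration):

class TwoWayDict(object):
    """Sketch of the 'unite two dicts' approach: two hash tables,
    one per direction, both holding references to the same objects."""
    def __init__(self):
        self._forward = {}
        self._backward = {}

    def set(self, key, value):
        self._forward[key] = value
        self._backward[value] = key

    def get_by_key(self, key):
        return self._forward[key]

    def get_by_value(self, value):
        return self._backward[value]

    def delete(self, key):
        value = self._forward.pop(key)
        del self._backward[value]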

nosklo
I think this is a good solution. However, you would be doubling the overhead of maintaining a dict (memory and computation) because now there are two. I suspect this overhead would be small compared to the rest of the problem.
Doug
@Doug: you are trading the overhead of maintaining a second dict for the speed of near-O(1) lookups on it. I can't see another approach that wouldn't duplicate the effort.
nosklo
By the way, let me add that this is not an order-of-magnitude difference in space overhead, but it *is* an order-of-magnitude difference in time saved. The space of the two dicts is still O(n); the reverse lookup without a second dict is also O(n), but with one it improves to O(1). Now this is all theoretical and we probably care about the constant factors, but I'm just saying this seems like a fair and good tradeoff to me.
Tom
+1  A: 

"We can guarantee that either the key or the value (or both) will be an integer"

That's weirdly written -- "key or the value (or both)" doesn't feel right. Either they're all integers, or they're not all integers.

It sounds like they're all integers.

Or, it sounds like you're thinking of replacing the target object with an integer value so you only have one copy referenced by an integer. This is a false economy. Just keep the target object. All Python objects are -- in effect -- references. Very little actual copying gets done.

Let's pretend that you simply have two integers and can do a lookup on either one of the pair. One way to do this is to use heap queues or the bisect module to maintain ordered lists of integer key-value tuples.

See http://docs.python.org/library/heapq.html#module-heapq

See http://docs.python.org/library/bisect.html#module-bisect

You have one heapq of (key, value) tuples. Or, if your underlying object is more complex, of (key, object) tuples.

You have another heapq of (value, key) tuples. Or, if your underlying object is more complex, of (otherkey, object) tuples.

An "insert" becomes two inserts, one to each heapq-structured list.

A key lookup is in one queue; a value lookup is in the other queue. Do the lookups using bisect(list,item).
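A rough sketch of the sorted-lists variant of this idea using bisect (the class and method names are mine; it keeps plain sorted lists rather than heaps, since bisect lookups need sorted order, and it assumes keys and values are comparable). Note that insort is O(n) per insert because of list shifting, while the lookups are O(log n):

import bisect

class SortedPairs(object):
    """Two sorted lists of tuples: (key, value) and (value, key).
    Inserts use insort; lookups use bisect_left plus an equality check."""
    def __init__(self):
        self.by_key = []    # sorted by key
        self.by_value = []  # sorted by value

    def insert(self, key, value):
        bisect.insort(self.by_key, (key, value))
        bisect.insort(self.by_value, (value, key))

    def lookup_value(self, key):
        i = bisect.bisect_left(self.by_key, (key,))
        if i < len(self.by_key) and self.by_key[i][0] == key:
            return self.by_key[i][1]
        raise KeyError(key)

    def lookup_key(self, value):
        i = bisect.bisect_left(self.by_value, (value,))
        if i < len(self.by_value) and self.by_value[i][0] == value:
            return self.by_value[i][1]
        raise KeyError(value)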

S.Lott
It was a fairly clear statement: at least one item in each key/value pair will be an integer, and sometimes both will be integers.
Shane C. Mason
Why the round-about statement? Why not a positive list of what data types are involved? The logic may be clear, but it's useless for algorithm design. The "either-or" is usually an exclusive or, but the "or both" means it's an inclusive or, which means ANY combination of types (except two non-integers) would be valid - making it a hard thing to optimize.
S.Lott
A: 

It so happens that I find myself asking this question all the time (yesterday in particular). I agree with the approach of making two dictionaries. Do some benchmarking to see how much memory it's taking. I've never needed to make it mutable, but here's how I abstract it, if it's of any use:

class BiDict(list):
    def __init__(self, *pairs):
        super(BiDict, self).__init__()
        self._first_access = {}
        self._second_access = {}
        for pair in pairs:
            self._first_access[pair[0]] = pair[1]
            self._second_access[pair[1]] = pair[0]
            self.append(pair)

    def _get_by_first(self, key):
        return self._first_access[key]

    def _get_by_second(self, key):
        return self._second_access[key]

    # You'll have to do some overrides to make it mutable
    # Methods such as append, __add__, __del__, __iadd__
    # to name a few will have to maintain ._*_access

class Constants(BiDict):
    # An implementation expecting an integer and a string
    get_by_name = BiDict._get_by_second
    get_by_number = BiDict._get_by_first

t = Constants(
    (1, 'foo'),
    (5, 'bar'),
    (8, 'baz'),
)

>>> print t.get_by_number(5)
bar
>>> print t.get_by_name('baz')
8
>>> print t
[(1, 'foo'), (5, 'bar'), (8, 'baz')]
David Berger
+1  A: 

How about using sqlite? Just create a :memory: database with a two-column table. You can even add indexes, then query by either one. Wrap it in a class if it's something you're going to use a lot.
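For illustration, a minimal sketch using the standard-library sqlite3 module (the table and column names are my own choices):

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mapping (k INTEGER PRIMARY KEY, v UNIQUE)")
conn.execute("INSERT INTO mapping VALUES (?, ?)", (1, "foo"))

# Forward lookup: key -> value (the PRIMARY KEY is indexed)
value = conn.execute("SELECT v FROM mapping WHERE k = ?", (1,)).fetchone()[0]
# Reverse lookup: value -> key (the UNIQUE constraint gives v an index too)
key = conn.execute("SELECT k FROM mapping WHERE v = ?", ("foo",)).fetchone()[0]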

ShawnMilo
depending on the requirements, using a DB to do this lookup may cost more in terms of memory and cpu cycles than a double dict.
Chii
A: 
class TwoWay:
    def __init__(self):
        self.d = {}
    def add(self, k, v):
        # Stores both directions in one dict; assumes keys and values never collide.
        self.d[k] = v
        self.d[v] = k
    def remove(self, k):
        self.d.pop(self.d.pop(k))
    def get(self, k):
        return self.d[k]
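For illustration, usage of the class above would look like this (my example values; note that keys and values share a single namespace here):

>>> d = TwoWay()
>>> d.add(5, 'five')
>>> d.get(5)
'five'
>>> d.get('five')
5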
A: 

Here is my own solution to this problem: http://github.com/spenthil/pymathmap/blob/master/pymathmap.py

The goal is to make it as transparent to the user as possible. The only significant attribute introduced is partner.

OneToOneDict subclasses dict - I know that isn't generally recommended, but I think I have the common use cases covered. The backend is pretty simple: each OneToOneDict (dict1) keeps a weakref to a 'partner' OneToOneDict (dict2), which is its inverse. When dict1 is modified, dict2 is updated accordingly, and vice versa.

From the docstring:

>>> dict1 = OneToOneDict()
>>> dict2 = OneToOneDict()
>>> dict1.partner = dict2
>>> assert(dict1 is dict2.partner)
>>> assert(dict2 is dict1.partner)
>>> dict1['one'] = '1'
>>> dict2['2'] = '1'
>>> dict1['one'] = 'wow'
>>> assert(dict1 == dict((v,k) for k,v in dict2.items()))
>>> dict1['one'] = '1'
>>> assert(dict1 == dict((v,k) for k,v in dict2.items()))
>>> dict1.update({'three': '3', 'four': '4'})
>>> assert(dict1 == dict((v,k) for k,v in dict2.items()))
>>> dict3 = OneToOneDict({'4':'four'})
>>> assert(dict3.partner is None)
>>> assert(dict3 == {'4':'four'})
>>> dict1.partner = dict3
>>> assert(dict1.partner is not dict2)
>>> assert(dict2.partner is None)
>>> assert(dict1.partner is dict3)
>>> assert(dict3.partner is dict1)
>>> dict1.setdefault('five', '5')
>>> dict1['five']
'5'
>>> dict1.setdefault('five', '0')
>>> dict1['five']
'5'

When I get some free time, I intend to make a version that doesn't store things twice. No clue when that'll be though :)

spenthil