I have a problem which requires a reversible 1:1 mapping of keys to values.

That means sometimes I want to find the value given a key, but at other times I want to find the key given the value. Both keys and values are guaranteed unique.

x = D[y]
y == D.inverse[x]

The obvious solution is simply to invert the dictionary every time I want a reverse lookup. Inverting a dictionary is very easy (there's a recipe here), but for a large dictionary it can be very slow.
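For reference, a naive inversion is just a one-liner along these lines (a sketch, not necessarily the linked recipe), which costs O(n) work and builds a brand-new dict on every call:

inverse = dict((v, k) for k, v in D.items())  # rebuilds the whole reverse mapping each time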

The other alternative is to make a new class which unites two dictionaries, one for each kind of lookup. That would most likely be fast but would use up twice as much memory as a single dict.

So is there a better structure I can use?

  • My application requires that this should be very fast and use as little memory as possible.
  • The structure must be mutable, and it's strongly desirable that mutating the object should not slow it down (e.g. by forcing a complete re-index)
  • We can guarantee that either the key or the value (or both) will be an integer
  • It's likely that the structure will be needed to store thousands or possibly millions of items.
  • Keys & values are guaranteed to be unique, i.e. len(set(x)) == len(x) for x in [D.keys(), D.values()]
+4  A: 

The other alternative is to make a new class which unites two dictionaries, one for each kind of lookup. That would most likely use up twice as much memory as a single dict.

Not really, since they would just be holding two references to the same data. In my mind, this is not a bad solution.

Have you considered an in-memory database lookup? I am not sure how it will compare in speed, but lookups in relational databases can be very fast.

Shane C. Mason
The 2-dicts class is the best so far!
Salim Fadhley
A: 

Assuming that you have a key with which you look up a more complex mutable object, just make the key a property of that object. It does seem you might be better off thinking about the data model a bit.

Pete Kirkham
In this case I cannot - the objects on one side are numpy.int64s - the purpose of the application is to adapt a very austere, numeric graph-theory class to something that looks more naturally pythonic.
Salim Fadhley
In that case, a flyweight would do.
Pete Kirkham
+16  A: 

The other alternative is to make a new class which unites two dictionaries, one for each kind of lookup. That would most likely be fast but would use up twice as much memory as a single dict.

Not really. Have you measured that? Since both dictionaries would use references to the same objects as keys and values, the memory spent would be just the dictionary structure itself. That's a lot less than twice the memory, and it doesn't depend on how large the stored objects are.

What I mean is that the actual data wouldn't be copied. So you'd spend little extra memory.

Example:

a = "some really really big text spending a lot of memory"

number_to_text = {1: a}
text_to_number = {a: 1}

Only a single copy of the "really big" string exists, so you end up spending just a little more memory. That's generally affordable.

I can't imagine a solution that gives you key-lookup speed when looking up by value without spending at least enough memory to store a reverse-lookup hash table (which is exactly what your "unite two dicts" solution does).
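For concreteness, a minimal sketch of that "unite two dicts" class (the class and method names here are mine, just for illustration):

class TwoWayDict(object):
    """Sketch of the 'unite two dicts' approach: two hash tables,
    one per direction, both holding references to the same objects."""
    def __init__(self):
        self._forward = {}
        self._backward = {}

    def set(self, key, value):
        self._forward[key] = value
        self._backward[value] = key

    def get_by_key(self, key):
        return self._forward[key]

    def get_by_value(self, value):
        return self._backward[value]

    def delete(self, key):
        value = self._forward.pop(key)
        del self._backward[value]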

nosklo
I think this is a good solution. However, you would be doubling the overhead of maintaining a dict (memory and computation) because now there are two. I suspect this overhead would be small compared to the rest of the problem.
Doug
@Doug: you are trading the overhead of maintaining a second dict for the speed of near-O(1) lookups on it. I can't see another approach that wouldn't duplicate the effort.
nosklo
By the way, let me add that this is not an order-of-magnitude difference in space overhead, but it *is* an order-of-magnitude difference in time saved. The space of the two dicts is still O(n); the reverse lookup without a second dict is also O(n), but with one it improves to O(1). Now this is all theoretical and we probably care about the constant factors, but I'm just saying this seems like a fair and good tradeoff to me.
Tom
+1  A: 

"We can guarantee that either the key or the value (or both) will be an integer"

That's weirdly written -- "key or the value (or both)" doesn't feel right. Either they're all integers, or they're not all integers.

It sounds like they're all integers.

Or, it sounds like you're thinking of replacing the target object with an integer value so you only have one copy referenced by an integer. This is a false economy. Just keep the target object. All Python objects are -- in effect -- references. Very little actual copying gets done.

Let's pretend that you simply have two integers and can do a lookup on either one of the pair. One way to do this is to use heap queues or the bisect module to maintain ordered lists of integer key-value tuples.

See http://docs.python.org/library/heapq.html#module-heapq

See http://docs.python.org/library/bisect.html#module-bisect

You have one heapq of (key, value) tuples. Or, if your underlying object is more complex, of (key, object) tuples.

You have another heapq of (value, key) tuples. Or, if your underlying object is more complex, of (otherkey, object) tuples.

An "insert" becomes two inserts, one to each heapq-structured list.

A key lookup is in one queue; a value lookup is in the other queue. Do the lookups using bisect(list,item).
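A rough sketch of the sorted-lists variant of this idea using bisect (the class and method names are mine; it keeps plain sorted lists rather than heaps, since bisect lookups need sorted order, and it assumes keys and values are comparable). Note that insort is O(n) per insert because of list shifting, while the lookups are O(log n):

import bisect

class SortedPairs(object):
    """Two sorted lists of tuples: (key, value) and (value, key).
    Inserts use insort; lookups use bisect_left plus an equality check."""
    def __init__(self):
        self.by_key = []    # sorted by key
        self.by_value = []  # sorted by value

    def insert(self, key, value):
        bisect.insort(self.by_key, (key, value))
        bisect.insort(self.by_value, (value, key))

    def lookup_value(self, key):
        i = bisect.bisect_left(self.by_key, (key,))
        if i < len(self.by_key) and self.by_key[i][0] == key:
            return self.by_key[i][1]
        raise KeyError(key)

    def lookup_key(self, value):
        i = bisect.bisect_left(self.by_value, (value,))
        if i < len(self.by_value) and self.by_value[i][0] == value:
            return self.by_value[i][1]
        raise KeyError(value)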

S.Lott
It was a fairly clear statement: at least one item in each key/value pair will be an integer, and sometimes both will be integers.
Shane C. Mason
Why the round-about statement? Why not a positive list of what data types are involved? The logic may be clear, but it's useless for algorithm design. The "either-or" is usually an exclusive or, but the "or both" means it's an inclusive or, which means ANY combination of types (except two non-integers) would be valid - making it a hard thing to optimize.
S.Lott
A: 

It so happens that I find myself asking this question all the time (yesterday in particular). I agree with the approach of making two dictionaries. Do some benchmarking to see how much memory it's taking. I've never needed to make it mutable, but here's how I abstract it, if it's of any use:

class BiDict(list):
    def __init__(self, *pairs):
        super(BiDict, self).__init__()
        self._first_access = {}
        self._second_access = {}
        for pair in pairs:
            self._first_access[pair[0]] = pair[1]
            self._second_access[pair[1]] = pair[0]
            self.append(pair)

    def _get_by_first(self, key):
        return self._first_access[key]

    def _get_by_second(self, key):
        return self._second_access[key]

    # You'll have to do some overrides to make it mutable
    # Methods such as append, __add__, __del__, __iadd__
    # to name a few will have to maintain ._*_access

class Constants(BiDict):
    # An implementation expecting an integer and a string
    get_by_name = BiDict._get_by_second
    get_by_number = BiDict._get_by_first

t = Constants(
    (1, 'foo'),
    (5, 'bar'),
    (8, 'baz'),
)

>>> print t.get_by_number(5)
bar
>>> print t.get_by_name('baz')
8
>>> print t
[(1, 'foo'), (5, 'bar'), (8, 'baz')]
David Berger
+1  A: 

How about using sqlite? Just create a :memory: database with a two-column table. You can even add indexes, then query by either one. Wrap it in a class if it's something you're going to use a lot.
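For illustration, a minimal sketch using the standard-library sqlite3 module (the table and column names are my own choices):

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mapping (k INTEGER PRIMARY KEY, v UNIQUE)")
conn.execute("INSERT INTO mapping VALUES (?, ?)", (1, "foo"))

# Forward lookup: key -> value (the PRIMARY KEY is indexed)
value = conn.execute("SELECT v FROM mapping WHERE k = ?", (1,)).fetchone()[0]
# Reverse lookup: value -> key (the UNIQUE constraint gives v an index too)
key = conn.execute("SELECT k FROM mapping WHERE v = ?", ("foo",)).fetchone()[0]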

ShawnMilo
depending on the requirements, using a DB to do this lookup may cost more in terms of memory and cpu cycles than a double dict.
Chii
A: 
class TwoWay:
    def __init__(self):
        self.d = {}
    def add(self, k, v):
        # Stores both directions in one dict; assumes keys and values never collide.
        self.d[k] = v
        self.d[v] = k
    def remove(self, k):
        self.d.pop(self.d.pop(k))
    def get(self, k):
        return self.d[k]
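For illustration, usage of the class above would look like this (my example values; note that keys and values share a single namespace here):

>>> d = TwoWay()
>>> d.add(5, 'five')
>>> d.get(5)
'five'
>>> d.get('five')
5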
A: 

Here is my own solution to this problem: http://github.com/spenthil/pymathmap/blob/master/pymathmap.py

The goal is to make it as transparent to the user as possible. The only significant attribute introduced is partner.

OneToOneDict subclasses dict - I know that isn't generally recommended, but I think I have the common use cases covered. The backend is pretty simple: each OneToOneDict (dict1) keeps a weakref to a 'partner' OneToOneDict (dict2), which is its inverse. When dict1 is modified, dict2 is updated accordingly, and vice versa.

From the docstring:

>>> dict1 = OneToOneDict()
>>> dict2 = OneToOneDict()
>>> dict1.partner = dict2
>>> assert(dict1 is dict2.partner)
>>> assert(dict2 is dict1.partner)
>>> dict1['one'] = '1'
>>> dict2['2'] = '1'
>>> dict1['one'] = 'wow'
>>> assert(dict1 == dict((v,k) for k,v in dict2.items()))
>>> dict1['one'] = '1'
>>> assert(dict1 == dict((v,k) for k,v in dict2.items()))
>>> dict1.update({'three': '3', 'four': '4'})
>>> assert(dict1 == dict((v,k) for k,v in dict2.items()))
>>> dict3 = OneToOneDict({'4':'four'})
>>> assert(dict3.partner is None)
>>> assert(dict3 == {'4':'four'})
>>> dict1.partner = dict3
>>> assert(dict1.partner is not dict2)
>>> assert(dict2.partner is None)
>>> assert(dict1.partner is dict3)
>>> assert(dict3.partner is dict1)
>>> dict1.setdefault('five', '5')
>>> dict1['five']
'5'
>>> dict1.setdefault('five', '0')
>>> dict1['five']
'5'

When I get some free time, I intend to make a version that doesn't store things twice. No clue when that'll be though :)

spenthil