views: 2801
answers: 7
I have about 10 million values that I need to put in some type of lookup table, so I was wondering which would be more efficient: a list or a dict?

I know you can do something like this for both:

if something in dict_of_stuff:
    pass

and

if something in list_of_stuff:
    pass

My thought is the dict will be faster and more efficient.

Thanks for your help.

EDIT 1
A little more info on what I'm trying to do: Euler Problem 92. I'm making a lookup table to see if a value has already been calculated.

EDIT 2
Efficiency for lookup.

EDIT 3
There are no values associated with the value... so would a set be better?

+3  A: 

If the data are unique, set() will be the most efficient, but of the two - dict (which also requires uniqueness, oops :)

SilentGhost
not really... data must be unique for the dict too.
nosklo
I realised that when I saw my answer posted %)
SilentGhost
+8  A: 

A dict is a hash table, so finding keys is very fast; between a dict and a list, the dict will be faster. But if you don't have a value to associate with each key, a set is even better: it is a hash table, without the "table" part.


EDIT: for your new question, YES, a set would be better. Just create 2 sets, one for sequences that end in 1 and the other for sequences that end in 89. I have successfully solved this problem using sets.
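
A minimal sketch of that two-set approach might look like this (the helper names are illustrative, not taken from the answer):

ends_in_1 = {1}
ends_in_89 = {89}

def next_term(n):
    # sum of the squares of the digits of n
    return sum(int(d) ** 2 for d in str(n))

def arrives_at_89(n):
    # follow the chain until we hit a number whose fate is already cached
    seen = []
    while n not in ends_in_1 and n not in ends_in_89:
        seen.append(n)
        n = next_term(n)
    cache = ends_in_89 if n in ends_in_89 else ends_in_1
    cache.update(seen)  # remember every intermediate value for later lookups
    return cache is ends_in_89

print(sum(arrives_at_89(i) for i in range(1, 10000000)))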

nosklo
+1  A: 

You want a dict.

For (unsorted) lists in Python, the "in" operation requires O(n) time---not good when you have a large amount of data. A dict, on the other hand, is a hash table, so you can expect O(1) lookup time.
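
For a rough feel of the difference, a quick benchmark sketch with timeit (the sizes and names here are arbitrary):

import timeit

n = 10 ** 6
as_list = list(range(n))
as_dict = dict.fromkeys(as_list)

# worst case for the list: the element we look for is at the very end
print(timeit.timeit(lambda: (n - 1) in as_list, number=100))  # linear scan, O(n)
print(timeit.timeit(lambda: (n - 1) in as_dict, number=100))  # hash lookup, O(1)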

As others have noted, you might choose a set instead (essentially a dict that stores only keys, with no values) if you only have keys rather than key/value pairs.

Related:

  • Python wiki: information on the time complexity of Python container operations.
  • SO: Python container operation time and memory complexities
zweiterlinde
Even for sorted lists, "in" is O(n).
Roger Pate
For a linked list, yes. But "lists" in Python are what most people would call vectors: they provide indexed access in O(1), so a sorted list can be searched in O(log n) with binary search (e.g. via the bisect module), even though the "in" operator itself still scans linearly.
zweiterlinde
+16  A: 

Speed

Lookups in lists are O(n), lookups in dictionaries are amortized O(1), with regard to the number of items in the data structure. If you don't need to associate values, use sets.

Memory

Both dictionaries and sets use hashing, and they use considerably more memory than would be needed for the objects alone. According to A.M. Kuchling in Beautiful Code, the implementation tries to keep the hash table no more than 2/3 full, so you may waste a fair amount of memory.
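
A rough way to see that overhead (exact numbers vary by Python version and platform; getsizeof counts only the set's own hash table, not the integer objects it references):

import sys

keys = set(range(10 ** 6))
print(sys.getsizeof(keys))    # bytes used by the hash table for one million keys
print(sys.getsizeof(999999))  # bytes used by a single int object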

If you do not add new entries on the fly (which you do, based on your updated question), it might be worthwhile to sort the list and use binary search. Lookups are then O(log n), though this is likely to be slower for strings and impossible for objects that do not have a natural ordering.
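
A sketch of that sorted-list alternative, using the standard bisect module:

import bisect

def contains(sorted_values, x):
    # O(log n) membership test on an already sorted list
    i = bisect.bisect_left(sorted_values, x)
    return i < len(sorted_values) and sorted_values[i] == x

values = sorted([89, 1, 44, 32, 13, 10])  # sort once, O(n log n)
print(contains(values, 44))  # True
print(contains(values, 7))   # False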

Torsten Marek
list sorting is O(n log n)
SilentGhost
Yes, but it's a one-off operation if the contents never change. Binary search is O(log n).
Torsten Marek
OTOH, 10 million integers will take, what, 40 million bytes? If the hash is 2/3 full, that goes to 60 million, and there'll be overhead (anyone know how much?), but the whole thing should fit in a few hundred megs of memory. It might've been a problem ten years ago, but it's not really now.
John Fouhy
@John Fouhy: the ints are not stored in the hash table, only pointers, i.e. you have 40M for the ints (well, not really, since a lot of them are small and cached) and 60M for the hash table. I agree that it's not that much of a problem nowadays, but it's still worthwhile to keep in mind.
Torsten Marek
A: 

I am not an expert in Python, but if you need a key-value structure, a dict is a good option.

If you are looking for efficiency with this amount of data, you should try to develop your own mechanism for managing it in your context.

Sorry if this response has any grammatical mistakes; I'm learning English.

Matias
A: 

You don't actually need to store 10 million values in the table, so it's not a big deal either way.

Hint: think about how large your result can be after the first sum of squares operation. The largest possible result will be much smaller than 10 million...
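
A quick check of that hint: any number below 10 million has at most 7 digits, so after one step the result is at most 7 * 9**2.

print(sum(int(d) ** 2 for d in "9999999"))  # 567, far smaller than 10 million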

Kiv
A: 

set() is exactly what you want. O(1) lookups, and smaller than a dict.

recursive