views: 2801
answers: 7
I have about 10 million values that I need to put in some type of lookup table, so I was wondering which would be more efficient: a list or a dict?

I know you can do something like this for both:

if something in dict_of_stuff:
    pass

and

if something in list_of_stuff:
    pass

My thought is the dict will be faster and more efficient.

Thanks for your help.

EDIT 1
A little more info on what I'm trying to do: Euler Problem 92. I'm making a lookup table to see if a value has already been calculated.

EDIT 2
Efficiency for lookup.

EDIT 3
There are no values associated with the value... so would a set be better?

+3  A: 

If the data are unique, set() will be the most efficient, but of the two - dict (which also requires uniqueness, oops :)

SilentGhost
not really... data must be unique for the dict too.
nosklo
I realised that when I saw my answer posted %)
SilentGhost
+8  A: 

A dict is a hash table, so finding keys is very fast; between a dict and a list, the dict will be faster. But if you don't have a value to associate with each key, a set is even better: it is a hash table, without the "table" part.


EDIT: for your new question, YES, a set would be better. Just create 2 sets, one for sequences that end in 1 and the other for sequences that end in 89. I have successfully solved this problem using sets.
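
A minimal sketch of that two-set approach might look like this (the helper names are illustrative, not taken from the answer):

ends_in_1 = {1}
ends_in_89 = {89}

def next_term(n):
    # sum of the squares of the digits of n
    return sum(int(d) ** 2 for d in str(n))

def arrives_at_89(n):
    # follow the chain until we hit a number whose fate is already cached
    seen = []
    while n not in ends_in_1 and n not in ends_in_89:
        seen.append(n)
        n = next_term(n)
    cache = ends_in_89 if n in ends_in_89 else ends_in_1
    cache.update(seen)  # remember every intermediate value for later lookups
    return cache is ends_in_89

print(sum(arrives_at_89(i) for i in range(1, 10000000)))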

nosklo
+1  A: 

You want a dict.

For (unsorted) lists in Python, the "in" operation requires O(n) time---not good when you have a large amount of data. A dict, on the other hand, is a hash table, so you can expect O(1) lookup time.
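
For a rough feel of the difference, a quick benchmark sketch with timeit (the sizes and names here are arbitrary):

import timeit

n = 10 ** 6
as_list = list(range(n))
as_dict = dict.fromkeys(as_list)

# worst case for the list: the element we look for is at the very end
print(timeit.timeit(lambda: (n - 1) in as_list, number=100))  # linear scan, O(n)
print(timeit.timeit(lambda: (n - 1) in as_dict, number=100))  # hash lookup, O(1)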

As others have noted, you might choose a set instead (essentially a dict that stores only keys, with no values) if you only have keys rather than key/value pairs.

Related:

  • Python wiki: information on the time complexity of Python container operations.
  • SO: Python container operation time and memory complexities
zweiterlinde
Even for sorted lists, "in" is O(n).
Roger Pate
For a linked list, yes. But "lists" in Python are what most people would call vectors: they provide indexed access in O(1), so a sorted list can be searched in O(log n) with binary search (e.g. via the bisect module), even though the "in" operator itself still scans linearly.
zweiterlinde
+16  A: 

Speed

Lookups in lists are O(n), lookups in dictionaries are amortized O(1), with regard to the number of items in the data structure. If you don't need to associate values, use sets.

Memory

Both dictionaries and sets use hashing, and they use considerably more memory than would be needed for the objects alone. According to A.M. Kuchling in Beautiful Code, the implementation tries to keep the hash table no more than 2/3 full, so you may waste a fair amount of memory.
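
A rough way to see that overhead (exact numbers vary by Python version and platform; getsizeof counts only the set's own hash table, not the integer objects it references):

import sys

keys = set(range(10 ** 6))
print(sys.getsizeof(keys))    # bytes used by the hash table for one million keys
print(sys.getsizeof(999999))  # bytes used by a single int object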

If you do not add new entries on the fly (which you do, based on your updated question), it might be worthwhile to sort the list and use binary search. Lookups are then O(log n), though this is likely to be slower for strings and impossible for objects that do not have a natural ordering.
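
A sketch of that sorted-list alternative, using the standard bisect module:

import bisect

def contains(sorted_values, x):
    # O(log n) membership test on an already sorted list
    i = bisect.bisect_left(sorted_values, x)
    return i < len(sorted_values) and sorted_values[i] == x

values = sorted([89, 1, 44, 32, 13, 10])  # sort once, O(n log n)
print(contains(values, 44))  # True
print(contains(values, 7))   # False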

Torsten Marek
list sorting is O(n log n)
SilentGhost
Yes, but it's a one-off operation if the contents never change. Binary search is O(log n).
Torsten Marek
OTOH, 10 million integers will take, what, 40 million bytes? If the hash is 2/3 full, that goes to 60 million, and there'll be overhead (anyone know how much?), but the whole thing should fit in a few hundred megs of memory. It might've been a problem ten years ago, but it's not really now.
John Fouhy
@John Fouhy: the ints are not stored in the hash table, only pointers, i.e. you have 40M for the ints (well, not really, since a lot of them are small and cached) and 60M for the hash table. I agree that it's not that much of a problem nowadays, but it's still worthwhile to keep in mind.
Torsten Marek
A: 

I am not an expert in Python, but if you need a key-value structure, a dict is a good option.

If you are looking for efficiency with this amount of data, you should try to develop your own mechanism for managing it in your context.

Sorry if this response has any grammatical mistakes; I'm learning English.

Matias
A: 

You don't actually need to store 10 million values in the table, so it's not a big deal either way.

Hint: think about how large your result can be after the first sum of squares operation. The largest possible result will be much smaller than 10 million...
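
A quick check of that hint: any number below 10 million has at most 7 digits, so after one step the result is at most 7 * 9**2.

print(sum(int(d) ** 2 for d in "9999999"))  # 567, far smaller than 10 million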

Kiv
A: 

set() is exactly what you want. O(1) lookups, and smaller than a dict.

recursive