views: 506
answers: 6
I'm putting around 4 million different keys into a Python dictionary. Creating this dictionary takes about 15 minutes and consumes about 4 GB of memory on my machine. After the dictionary is fully created, querying it is fast.

I suspect that dictionary creation is so resource-consuming because the dictionary is very often rehashed (as it grows enormously). Is it possible to create a dictionary in Python with some initial size or bucket count?

My dictionary points from a number to an object.

class MyObject(object):
    def __init__(self):
        # some fields...
        pass

d = {}
d[i] = MyObject()  # executed ~4M times, each time with a different key
+3  A: 

You can try to separate key hashing from the content filling with the dict.fromkeys classmethod. It creates a dict of a known size with all values defaulting to either None or a value of your choice. After that you can iterate over it to fill in the values, which lets you time the actual hashing of all keys separately. Not sure if you'd be able to significantly increase the speed, though.
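A minimal sketch of that two-phase idea, reusing MyObject from the question (the xrange of 4M keys is a stand-in for the real key set):

keys = xrange(4000000)   # stand-in for the real 4M keys
d = dict.fromkeys(keys)  # phase 1: hash every key once; all values are None
for k in d:              # phase 2: fill in values; the key set never changes,
    d[k] = MyObject()    # so no rehashing happens here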

SilentGhost
+2  A: 

If your data needs to (or can) be stored on disk, perhaps you can store it in a BSDDB database, or use cPickle to load and store your dictionary.
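A minimal cPickle sketch of that idea (the 'mydict.pkl' filename is illustrative): build the dict once, dump it, and on later runs reload it instead of rebuilding.

import cPickle as pickle  # Python 2; plain pickle in Python 3

# One-time build, then dump to disk with the fastest protocol.
with open('mydict.pkl', 'wb') as f:
    pickle.dump(d, f, pickle.HIGHEST_PROTOCOL)

# Later runs: reload instead of rebuilding.
with open('mydict.pkl', 'rb') as f:
    d = pickle.load(f)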

chub
+1  A: 

If you know C, you can take a look at dictobject.c and the Notes on Optimizing Dictionaries. There you'll notice the parameter PyDict_MINSIZE:

PyDict_MINSIZE. Currently set to 8.

This parameter is defined in dictobject.h, so you could change it when compiling Python, but that is probably a bad idea.

+5  A: 

I tried:

a = dict.fromkeys(range(4000000))

It creates a dictionary with 4,000,000 entries in about 3 seconds. After that, setting values is really fast. So I guess dict.fromkeys is definitely the way to go.

e-satis
+1 for mentioning dict.fromkeys(). However, using range() to specify the keys means that you end up with a dict of sequential keys. If that's what is required, why not just use a list? a = [None]*4000000
lsc
That was not a direct solution, just a demonstration that you can use fromkeys to pre-generate the dict in a very short time.
e-satis
+8  A: 

With performance issues it's always best to measure. Here are some timings:

 import itertools

 d = {}
 for i in xrange(4000000):
     d[i] = None
 # 722ms

 d = dict(itertools.izip(xrange(4000000), itertools.repeat(None)))
 # 634ms

 d = dict.fromkeys(xrange(4000000))
 # 558ms

 s = set(xrange(4000000))
 d = dict.fromkeys(s)
 # 353ms, not including set construction

The last option doesn't do any resizing: it just copies the hashes from the set and increments references. As you can see, the resizing isn't taking a lot of time. It's probably your object creation that is slow.
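To check that hypothesis, here is a minimal sketch comparing inserting None values against inserting freshly created objects (the 100000-key sample size and the trivial MyObject body are illustrative assumptions):

import timeit

setup = """
class MyObject(object):
    def __init__(self):
        self.x = 0  # stand-in for real fields
"""

# Same number of insertions; only the value construction differs.
t_none = timeit.timeit('dict.fromkeys(range(100000))', number=10)
t_objs = timeit.timeit('dict((i, MyObject()) for i in range(100000))',
                       setup=setup, number=10)
print(t_none)
print(t_objs)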

Ants Aasma
It does not matter how I initialize the dictionary; filling it with data always takes a lot of time. Looks like all the time is indeed spent on object creation. Thanks!
tkokoszka
A: 

Do you initialize all keys with new "empty" instances of the same type? Is it not possible to use a defaultdict or something similar that creates the object only when it is accessed?
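A minimal sketch of that lazy-creation idea with collections.defaultdict, assuming every key should map to a fresh MyObject:

from collections import defaultdict

class MyObject(object):
    def __init__(self):
        pass  # some fields...

d = defaultdict(MyObject)
obj = d[42]  # the MyObject is constructed here, on first access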

kaizer.se