I'm putting around 4 million different keys into a Python dictionary. Creating this dictionary takes about 15 minutes and consumes about 4 GB of memory on my machine. Once the dictionary is fully created, querying it is fast.
I suspect that building the dictionary is so resource-consuming because it is rehashed very often as it grows. Is it possible to create a dictionary in Python with an initial size or bucket count?
My dictionary maps a number to an object.
class MyObject(object):
    def __init__(self):
        # some fields...
        pass

d = {}
for i in range(4000000):
    d[i] = MyObject()  # 4M inserts, each with a different key...
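For what it's worth, here is a rough sketch of how I could check whether the dictionary growth (rehashing) itself is the bottleneck: time the insertions once with dummy values and once with the real objects. The use of time.perf_counter and the two-pass comparison are just my own assumptions for the measurement, not something I have profiled carefully yet; it reuses the MyObject class from above.

import time

N = 4000000

# Time only the dictionary growth/rehashing, with trivial values.
start = time.perf_counter()
d = {}
for i in range(N):
    d[i] = None
print("dict growth only: %.1f s" % (time.perf_counter() - start))

# Time the same insertions with real object creation included.
start = time.perf_counter()
d = {}
for i in range(N):
    d[i] = MyObject()  # MyObject as defined above
print("dict growth + object creation: %.1f s" % (time.perf_counter() - start))

If the first number turns out to be small compared to the second, most of the time is presumably going into creating the objects rather than into rehashing the dictionary.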