Hi, I have a table in a MySQL database that I want to load into a dictionary in Python. The table columns are as follows:

id,url,tag,tagCount

tagCount is the number of times a tag has been repeated for a given url. Since each url has several tags, each with its own tagCount, I need a nested dictionary (a dictionary of dictionaries) to load this table. The code I used is below (the whole table is about 22,000 records):

from collections import defaultdict

cursor.execute('''SELECT url, tag, tagCount
                  FROM wtp''')

urlTagCount = cursor.fetchall()

d = defaultdict(defaultdict)

for url,tag,tagCount in urlTagCount:
    d[url][tag]=tagCount

print d

First of all, I want to know whether this is correct, and if it is, why does it take so much time? Is there a faster solution? I am loading this table into memory for fast access, to avoid slow database operations, but at this speed it has become a bottleneck itself; it is even much slower than the DB access. Can anyone help? Thanks.

+1  A: 

Maybe you could try a normal dict with tuple keys, like:

d = dict()

for url,tag,tagCount in urlTagCount:
    d[(url, tag)] = tagCount

In any case, did you try:

d = defaultdict(dict)

instead of

d = defaultdict(defaultdict)
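
To make the trade-off between the two layouts concrete, here is a minimal sketch. The `rows` list and its URLs are made-up sample data standing in for `cursor.fetchall()`, not the real table. Both layouts answer a single (url, tag) lookup equally directly; the difference shows up when you want every tag for one url.

```python
from collections import defaultdict

# Hypothetical sample rows standing in for cursor.fetchall()
rows = [
    ("http://a.example", "python", 3),
    ("http://a.example", "mysql", 1),
    ("http://b.example", "python", 7),
]

# Flat layout: one dict keyed by (url, tag) tuples
flat = {}
for url, tag, tag_count in rows:
    flat[(url, tag)] = tag_count

# Nested layout: url -> {tag: count}
nested = defaultdict(dict)
for url, tag, tag_count in rows:
    nested[url][tag] = tag_count

# A single (url, tag) lookup is one step in either layout
flat_count = flat[("http://a.example", "python")]
nested_count = nested["http://a.example"]["python"]

# All tags for one url: direct with the nested layout,
# a scan over every key with the flat layout
tags_nested = sorted(nested["http://a.example"])
tags_flat = sorted(t for (u, t) in flat if u == "http://a.example")
```

So the tuple-key dict is probably fine if you only ever look up one (url, tag) pair at a time, while the nested form seems the better fit if you often need all tags of a url.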
joaquin
A: 

I managed to verify the code, and it works perfectly. For amateurs like me, I suggest never trying to "print" a very large nested dictionary. The "print d" on the last line of the code was what made it slow. If you remove it, or access the dictionary with actual keys instead, it is very fast.
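
As a rough sketch of what checking the dictionary without printing all of it can look like: the rows below are synthetic (generated names, not the real wtp table), sized to roughly match the 22,000 records mentioned in the question.

```python
from collections import defaultdict
from itertools import islice

# Synthetic stand-in for the ~22,000 (url, tag, tagCount) rows
rows = [("http://site%d.example" % (i // 10), "tag%d" % (i % 10), i)
        for i in range(22000)]

d = defaultdict(dict)
for url, tag, tag_count in rows:
    d[url][tag] = tag_count

# Check sizes and a small sample instead of dumping the whole structure
num_urls = len(d)
num_pairs = sum(len(tags) for tags in d.values())
sample = list(islice(d.items(), 3))  # only the first three urls

# Direct access with actual keys
count = d["http://site5.example"]["tag3"]
```

Printing `num_urls`, `num_pairs`, or `sample` gives a sanity check on the load while only ever writing a few lines to the console.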

Hossein
+1  A: 

You need to ensure that the dictionary (and each of the nested dictionaries) exists before you assign a key and value to it. setdefault is helpful for this purpose. You end up with something like this:

d = {}
for url, tag, tagCount in urlTagCount:
    d.setdefault(url, {})[tag] = tagCount
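
For comparison, a small sketch showing that the setdefault loop and a defaultdict(dict) build the same mapping. The `rows` list is invented sample data, and the "missing" url is hypothetical. The one behavioural difference I am sure of is what happens when you read a url that was never stored.

```python
from collections import defaultdict

# Hypothetical sample rows standing in for cursor.fetchall()
rows = [
    ("http://a.example", "python", 3),
    ("http://a.example", "mysql", 1),
    ("http://b.example", "python", 7),
]

# Plain dict with setdefault (works on old Python versions too)
plain = {}
for url, tag, tag_count in rows:
    plain.setdefault(url, {})[tag] = tag_count

# defaultdict(dict) creates the inner dict automatically
auto = defaultdict(dict)
for url, tag, tag_count in rows:
    auto[url][tag] = tag_count

same = (plain == auto)  # identical contents

# Reading an unknown url: the plain dict raises KeyError...
try:
    plain["http://missing.example"]
    plain_raised = False
except KeyError:
    plain_raised = True

# ...while the defaultdict silently inserts an empty inner dict
auto["http://missing.example"]
auto_grew = "http://missing.example" in auto
```

If accidental entries from lookups would be a problem, the plain dict (or `auto.get(url)`) avoids them.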
Pierce
I typically use Python 2.4 or even Python 2.3, so defaultdict was new to me. What I gave will work too, but the version given in the question seems clearer to me.
Pierce