I am trying to serialize a list of dictionaries to a CSV text file using Python's csv module. My list has about 13,000 elements; each is a dictionary with ~100 keys whose values are simple strings and numbers. My function dictlist2file simply wraps csv.DictWriter to serialize this, but I am getting out-of-memory errors.
My function is:
import csv
import time

def dictlist2file(dictrows, filename, fieldnames, delimiter='\t',
                  lineterminator='\n', extrasaction='ignore'):
    out_f = open(filename, 'w')
    # Write out header
    if fieldnames is not None:
        header = delimiter.join(fieldnames) + lineterminator
    else:
        fieldnames = sorted(dictrows[0].keys())
        header = delimiter.join(fieldnames) + lineterminator
    out_f.write(header)
    print "dictlist2file: serializing %d entries to %s" \
        % (len(dictrows), filename)
    t1 = time.time()
    # Write out dictionary rows
    data = csv.DictWriter(out_f, fieldnames,
                          delimiter=delimiter,
                          lineterminator=lineterminator,
                          extrasaction=extrasaction)
    data.writerows(dictrows)
    out_f.close()
    t2 = time.time()
    print "dictlist2file: took %.2f seconds" % (t2 - t1)
When I run this on my list of dictionaries, I get the following output:
dictlist2file: serializing 13537 entries to myoutput_file.txt
Python(6310) malloc: *** mmap(size=45862912) failed (error code=12)
*** error: can't allocate region
*** set a breakpoint in malloc_error_break to debug
Traceback (most recent call last):
...
File "/Library/Frameworks/Python.framework/Versions/6.2/lib/python2.6/csv.py", line 149, in writerows
rows.append(self._dict_to_list(rowdict))
File "/Library/Frameworks/Python.framework/Versions/6.2/lib/python2.6/csv.py", line 141, in _dict_to_list
return [rowdict.get(key, self.restval) for key in self.fieldnames]
MemoryError
Any idea what could be causing this? The list has only 13,000 elements, and the dictionaries themselves are small (~100 keys of short strings and numbers), so I don't see why this should exhaust memory or be so inefficient. It runs for several minutes before hitting the MemoryError.
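For reference, the traceback shows writerows appending every converted row to an internal list before writing anything, so the obvious workaround I'm considering is calling writerow once per dictionary instead. A sketch of that variant (dictlist2file_streaming is my own hypothetical name, same arguments as above, header writing simplified for brevity):

```python
import csv

def dictlist2file_streaming(dictrows, filename, fieldnames, delimiter='\t',
                            lineterminator='\n', extrasaction='ignore'):
    # Same output as dictlist2file, but writes one row at a time so only
    # a single converted row is held in memory, instead of all of them.
    out_f = open(filename, 'w')
    out_f.write(delimiter.join(fieldnames) + lineterminator)
    writer = csv.DictWriter(out_f, fieldnames,
                            delimiter=delimiter,
                            lineterminator=lineterminator,
                            extrasaction=extrasaction)
    for row in dictrows:
        writer.writerow(row)
    out_f.close()
```

Even if this avoids the crash, I'd still like to understand why buffering ~13,000 small rows blows up in the first place.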
Thanks for your help.