The following function parses a CSV file into a list of dictionaries, one dictionary per row, with the values keyed by the header fields (assumed to be the first line of the file).
The function is very slow, taking ~6 seconds for a relatively small file (fewer than 30,000 lines).
How can I speed it up?
def csv2dictlist_raw(filename, delimiter='\t'):
    f = open(filename)
    header_line = f.readline().strip()
    header_fields = header_line.split(delimiter)
    dictlist = []
    # convert data to list of dictionaries
    for line in f:
        values = map(tryEval, line.strip().split(delimiter))
        dictline = dict(zip(header_fields, values))
        dictlist.append(dictline)
    return (dictlist, header_fields)
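(tryEval is a small helper that tries to cast each field to int or float and falls back to the original string; roughly something like this, though the exact implementation isn't the point:)

# rough sketch of tryEval: cast a string to int or float when possible,
# otherwise return it unchanged (the real helper may differ in details)
def tryEval(s):
    try:
        return int(s)
    except ValueError:
        pass
    try:
        return float(s)
    except ValueError:
        return s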
In response to comments:
I know there's a csv module and I can use it like this:
data = csv.DictReader(my_csvfile, delimiter=delimiter)
This is much faster. However, it doesn't automatically cast fields that are obviously floats or integers to numeric values; it leaves everything as strings. How can I fix this?
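To illustrate what I mean (the column names and data here are made up), DictReader gives me strings even when a field is clearly numeric:

import csv
from StringIO import StringIO

# made-up example data to show the problem
sample = StringIO("name\tage\tscore\nbob\t7\t3.5\n")
row = next(iter(csv.DictReader(sample, delimiter='\t')))
print row['age'], type(row['age'])      # 7 <type 'str'>   -- still a string
print row['score'], type(row['score'])  # 3.5 <type 'str'> -- still a string
# what I want: row['age'] == 7 (int) and row['score'] == 3.5 (float)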
Using the "Sniffer" class does not work for me. When I try it on my files, I get the error:
File "/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/csv.py", line 180, in sniff
raise Error, "Could not determine delimiter"
Error: Could not determine delimiter
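For reference, this is roughly how I'm calling it (the filename is just a placeholder for one of my tab-delimited files):

import csv

# placeholder filename; my real files are tab-delimited text
sample = open("myfile.txt").read(1024)
dialect = csv.Sniffer().sniff(sample)   # this is the line that raises the error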
How can I make DictReader cast fields to their types when the type is obvious?
Thanks.