views:

648

answers:

1

I have a text file that is encoded in UTF-8. I'm reading it in to analyze and plot some data. I would like the file to be read in as ascii. Would it be best to use the codecs module or use the builtin string decode method? Also, the file is divided up as a csv, so could the csv module also be a valid solution?

Thanks for your help.

+5  A: 

Do you mean that your file is encoded in UTF-8? ("Unicode" is not an encoding... Required reading: http://www.joelonsoftware.com/articles/Unicode.html) I'm not 100% sure but I think you should be able to read a UTF-8 encoded file with the csv module, and you can convert the strings which contain special characters to Python's unicode strings (edit: if you need to) after reading.

There are a few examples of using csv with UTF-8 encoded data at http://docs.python.org./library/csv.html#csv-examples; it might help you to look at them.

David Zaslavsky
thanks for your help and I did mean UTF-8....i changed it in the question. thanks again
Casey