ansaurus

Question

Answer 1

+6 A:

Try using the codecs package:

import codecs
import sys
# Decode the file as UTF-16 while reading, so every line is a unicode string.
buildLog = codecs.open(sys.argv[1], "r", "utf-16").readlines()

Also, you may run into trouble with your print statement, since it may try to convert the strings to your console encoding. If you're only printing for your own review, you could use:

print repr(line)
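
For the search itself, something along these lines might work. It is only a sketch: `find_lines` is a made-up helper name, the demo file and its contents are invented for illustration, and it uses `io.open` (which decodes while reading, much like `codecs.open`) so it runs on both Python 2.6+ and Python 3.

```python
import io
import os
import tempfile


def find_lines(path, needle, encoding="utf-16"):
    """Return (line_number, line) pairs for lines that contain needle."""
    matches = []
    # io.open decodes on the fly, so each line is already a unicode string.
    with io.open(path, "r", encoding=encoding) as handle:
        for number, line in enumerate(handle, 1):
            if needle in line:
                matches.append((number, line.rstrip(u"\r\n")))
    return matches


# Demo on a throwaway UTF-16 file (the log text is invented).
fd, demo_path = tempfile.mkstemp(suffix=".log")
os.close(fd)
with io.open(demo_path, "w", encoding="utf-16") as handle:
    handle.write(u"compiling\nerror C2065: undeclared identifier\ndone\n")

for number, line in find_lines(demo_path, u"error"):
    # repr() sidesteps any conversion to the console encoding.
    print(number, repr(line))

os.remove(demo_path)
```

Pass the search string as a unicode string as well, so the comparison happens between decoded text on both sides.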

Alexander Ljungberg 2009-08-05 20:42:51

Thanks, this is exactly what I needed. Say I run across a UTF-8 or an ASCII file, will this break?

Andrew Keeton 2009-08-06 14:21:29

@Andrew Keeton: Of course it will break if you don't change the encoding from "utf-16" to "utf-8" or "ascii" (or "cp1252") as appropriate. See `http://www.amk.ca/python/howto/unicode` and `http://www.joelonsoftware.com/articles/Unicode.html`

John Machin 2009-08-06 15:54:53
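
If the encoding really can vary from file to file, one option is to look for a byte order mark before choosing a codec. The following is a rough sketch, not a general detector: `guess_encoding` and `read_lines` are hypothetical helper names, and falling back to "utf-8" when no BOM is present is an assumption (it covers plain ASCII, but a cp1252 file with accented characters would still fail to decode).

```python
import codecs
import io

# Checked in order. The UTF-32 LE mark starts with the same two bytes as the
# UTF-16 LE mark, so the UTF-32 entries have to come first.
_BOMS = [
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def guess_encoding(path, default="utf-8"):
    """Pick a codec from the file's byte order mark, or return default."""
    with io.open(path, "rb") as handle:
        head = handle.read(4)
    for bom, name in _BOMS:
        if head.startswith(bom):
            return name
    return default


def read_lines(path, default="utf-8"):
    """Read all lines as unicode using the guessed encoding."""
    with io.open(path, "r", encoding=guess_encoding(path, default)) as handle:
        return handle.readlines()
```

The "utf-16", "utf-32" and "utf-8-sig" codecs consume the mark themselves, so it does not show up in the first line.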

Answer 2

A:

Tried this? When I saved a parsing script that contained non-ASCII characters, the interpreter suggested adding an encoding declaration at the top of the file:

Non-ASCII found, yet no encoding declared.  Add a line like:
# -*- coding: cp1252 -*-

Adding that as the first line of the script fixed the problem for me. Not sure if this is what's causing your error, though.

Sean O'Hollaren 2009-08-05 20:55:11

That's certainly not causing his error. Yours is a COMPILE-time problem -- a source file encoded in cp1252 is not so declared. His is a RUN-time problem caused by trying to read a utf16-encoded file as though it were ascii.

John Machin 2009-08-06 00:05:41
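
The run-time failure can be reproduced without any file at all. This is a small illustration with an invented sample string: the bytes that a UTF-16 file would contain are decoded once as ASCII and once as UTF-16.

```python
text = u"build succeeded"
raw = text.encode("utf-16")  # the bytes as they would sit in a UTF-16 file

try:
    raw.decode("ascii")  # what happens when the file is treated as ASCII
    outcome = "decoded"
except UnicodeDecodeError:
    # The byte order mark alone (0xFF 0xFE or 0xFE 0xFF) is outside ASCII.
    outcome = "UnicodeDecodeError"

print(outcome)

# Decoding with the right codec round-trips cleanly.
assert raw.decode("utf-16") == text
```

A coding declaration in the script has no bearing on this; it only tells the interpreter how to read the source file.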

Searching a Unicode file using Python
