ansaurus

Question

Answer 1

A:

Every piece of textual data is encoded in some way. It's hard to tell what the problem is without seeing any code, so the only advice I can give for now is to try decoding the response before parsing it:

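resp = do_request()  # placeholder: returns the raw response body
# Check whether the NYT site documents the encoding it uses, and use that instead of utf-8.
decoded = resp.decode('utf-8')
parsed = parse(decoded)  # placeholder: whatever parser you are using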
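
Rather than hard-coding the encoding, one option is to read the charset from the response's Content-Type header. The sketch below is written for Python 3 and assumes you already have the raw body bytes and the header value in hand. `charset_from_content_type` and `decode_body` are hypothetical helper names, not part of any NYT client, and the utf-8 fallback is an assumption.

```python
import json
from email.message import Message


def charset_from_content_type(content_type, default="utf-8"):
    """Return the charset named in a Content-Type header value, or the default."""
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset() or default


def decode_body(body, content_type):
    """Decode raw response bytes using the declared charset, then parse the JSON."""
    text = body.decode(charset_from_content_type(content_type))
    return json.loads(text)


# Sample data standing in for a real response (made up, not actual API output).
raw = '{"title": "It\u2019s a test"}'.encode("utf-8")
parsed = decode_body(raw, "application/json; charset=UTF-8")
```

If the header carries no charset at all, the helper falls back to utf-8, which may or may not match what the server actually sent.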

THC4k 2010-02-21 21:45:47

Answer 2

A:

It appears to be trying to decode '\/', which appears wherever the response contains a slash. This can be avoided by using the str function.

str('http:\/\/www.nytimes.com\/2010\/02\/17\/business\/global\/17barclays.html')
'http:\\/\\/www.nytimes.com\\/2010\\/02\\/17\\/business\\/global\\/17barclays.html'

From there you can use replace to remove the backslashes:

str('http:\/\/www.nytimes.com\/2010\/02\/17\/business\/global\/17barclays.html').replace('\\', "")
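
For context, `\/` is a legal escape for a forward slash in JSON, so a JSON parser should already turn it back into `/` without any manual replace. A minimal Python 3 sketch, assuming the URL arrives inside a JSON document (the `url` key is made up for illustration):

```python
import json

# Raw string literal so the backslashes reach the JSON parser unchanged.
payload = r'{"url": "http:\/\/www.nytimes.com\/2010\/02\/17\/business\/global\/17barclays.html"}'

data = json.loads(payload)
url = data["url"]  # the parser has already unescaped every \/
```

If the slashes are still escaped after parsing, the text was probably never run through a JSON parser in the first place.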

what 2010-02-22 03:04:49

Answer 3

+1 A:

Check the HTTP response headers to see which character encoding they use when returning the results. My bet is that everything is encoded as utf-8 and that, when you write to CSV, the output is implicitly being encoded as ascii.

The apostrophe they are using (probably a typographic one, ’) is not in the ascii character set. You can catch the UnicodeError exception.

Follow the golden rules of encodings:

Decode early into unicode (data.decode('utf-8', 'ignore')).
Use unicode internally.
Encode late, during output (data.encode('ascii', 'ignore')).
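
The three rules above might look like this in Python 3, where `bytes` plays the role of the encoded data and `str` the role of unicode. The sample bytes are made up, and utf-8 is assumed as the incoming encoding.

```python
# Bytes as they might arrive from the network (sample data, utf-8 assumed).
raw = b"Barclays\xe2\x80\x99 profit"

# 1. Decode early: turn bytes into text as soon as they arrive.
text = raw.decode("utf-8")

# 2. Use text internally: string operations see characters, not bytes.
headline = text.upper()

# 3. Encode late: pick an encoding only at the point of output.
try:
    headline.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    # The typographic apostrophe (U+2019) has no ascii representation.
    ascii_ok = False

lossy = headline.encode("ascii", "ignore")  # silently drops the apostrophe
lossless = headline.encode("utf-8")         # keeps every character
```

Note that 'ignore' avoids the exception at the cost of discarding characters, so encoding the output as utf-8 is usually the safer choice when the destination allows it.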

You can probably set your CSV writer to use utf-8 encodings when writing.
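
A minimal sketch of that, assuming Python 3, where the encoding is chosen when the file is opened rather than on the writer itself. The file name and rows are made up for illustration. (In Python 2 the csv module did not accept unicode directly, so each field would need to be encoded to utf-8 before being written.)

```python
import csv
import os
import tempfile

# Sample rows (made up); the title contains a non-ascii apostrophe.
rows = [
    ["title", "url"],
    ["Barclays\u2019 profit",
     "http://www.nytimes.com/2010/02/17/business/global/17barclays.html"],
]

path = os.path.join(tempfile.mkdtemp(), "articles.csv")

# newline="" lets the csv module control line endings itself.
with open(path, "w", newline="", encoding="utf-8") as f:
    csv.writer(f).writerows(rows)

# Read the file back with the same encoding to confirm nothing was lost.
with open(path, newline="", encoding="utf-8") as f:
    round_trip = list(csv.reader(f))
```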

Note: You should really check which encoding they are sending you before blindly assuming utf-8 for everything.

rox0r 2010-03-02 18:22:55


Python JSON New York Times API
