I'm using Google App Engine to build a website and I'm having problems with special characters. I think I've reduced the problem to these two code samples:

request = urlfetch.fetch(
    url=self.WWW_INFO,
    payload=urllib.urlencode(inputs),
    method=urlfetch.POST,
    headers={'Content-Type': 'application/x-www-form-urlencoded'})
print request.content

The code above displays the content just fine, including the special characters. However, the correct way to output something with the framework is:

request = urlfetch.fetch(
    url=self.WWW_INFO,
    payload=urllib.urlencode(inputs),
    method=urlfetch.POST,
    headers={'Content-Type': 'application/x-www-form-urlencoded'})
self.response.out.write(request.content)

This version doesn't display the special characters and instead prints �. What should I do so they display correctly?

I know I'm missing something, but I can't seem to grasp what it is. The website I'm fetching sets <meta http-equiv="content-type" content="text/html; charset=ISO-8859-1">, and I've also tried charset=UTF-8, with no success.

I'd appreciate any advice that can point me in the right direction.

+1  A: 

You need to get the charset from the content-type header of the fetch's result and use it to decode the bytes into Unicode. Then, on the response, set the content-type header with your favorite encoding (I suggest utf-8; there's no good reason to do otherwise) and write out the Unicode text encoded with that codec. The pass through Unicode is not strictly needed: when you're doing nothing at all with the contents and just bouncing them right back to the response, you could instead reuse the exact content-type and charset you received. It's still recommended on general grounds (use encoded byte strings only on input/output, and always keep all text "within" your app as Unicode).

IOW, your problem seems to be mostly that you're not setting headers correctly on the response.
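The decode-then-re-encode step described above could be sketched roughly as follows. This is only an illustration, not App Engine API: the helper names `charset_from_content_type` and `transcode_to_utf8` are made up for the example, and the ISO-8859-1 fallback is an assumption that happens to match the site in the question.

```python
import codecs


def charset_from_content_type(content_type, default='iso-8859-1'):
    """Return the charset parameter of a Content-Type value, or `default`.

    Hypothetical helper; `default` is an assumption for this example.
    """
    for part in content_type.split(';')[1:]:
        name, _, value = part.strip().partition('=')
        if name.strip().lower() == 'charset' and value:
            return value.strip().strip('"\'')
    return default


def transcode_to_utf8(raw_bytes, content_type):
    """Decode fetched bytes with their declared charset, re-encode as UTF-8."""
    charset = charset_from_content_type(content_type)
    try:
        codecs.lookup(charset)
    except LookupError:
        # Unknown charset name: fall back to the assumed default.
        charset = 'iso-8859-1'
    text = raw_bytes.decode(charset)   # bytes -> Unicode
    return text.encode('utf-8')        # Unicode -> UTF-8 bytes
```

In the handler from the question, this would presumably be used by setting `self.response.headers['Content-Type'] = 'text/html; charset=utf-8'` and then writing `transcode_to_utf8(request.content, request.headers.get('content-type', ''))` to `self.response.out`, assuming the fetched site actually sends a content-type header.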

Alex Martelli
I've checked with firebug, and it seems like app engine automatically sets the content-type header to utf-8.
Eduardo Grajeda
That is the default, yes - but you can set it to something else if you wish.
Nick Johnson
Fixed it. What I was receiving was an ISO-8859-1 encoded string, so I had to do unicode(request.content, 'iso-8859-1') to correctly convert it to Unicode before printing it.
Eduardo Grajeda
@Eduardo, glad you solved it. My suggestion, specifically `request.content.decode('iso-8859-1')`, is equivalent to the `unicode` call you're now performing.
Alex Martelli
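For readers wondering where the � came from, here is a small sketch of the likely cause, assuming the page contained a character such as é: its ISO-8859-1 byte is not valid UTF-8, so a client that treats the response as UTF-8 substitutes the replacement character. The sample bytes are invented for illustration.

```python
# 'café' encoded as ISO-8859-1: the é is the single byte 0xE9.
raw = b'caf\xe9'

# Interpreted as UTF-8, 0xE9 starts a multi-byte sequence that never
# completes, so it is replaced with U+FFFD (the � character).
as_utf8 = raw.decode('utf-8', 'replace')

# Interpreted with the charset the site actually declared, it decodes cleanly.
as_latin1 = raw.decode('iso-8859-1')

assert as_utf8 == u'caf\ufffd'
assert as_latin1 == u'caf\xe9'
```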