from google.appengine.api import urlfetch
from google.appengine.ext import webapp

class sss(webapp.RequestHandler):
    def get(self):
        url = "http://www.google.com/"
        result = urlfetch.fetch(url)
        if result.status_code == 200:
            self.response.out.write(result.content)

This is what the page shows:

[screenshot of the resulting page]

When I change the code to this:

if result.status_code == 200:
    self.response.out.write(result.content.decode('utf-8').encode('gb2312'))

it shows:

[screenshot of the resulting page]

So what should I do?

Thanks.

Update:

When I use this:

self.response.out.write(result.content.decode('big5'))

the page looks like this:

[screenshot of the resulting page]

It is different from what I see when I visit google.com in my browser:

[screenshot of google.com in a browser]

How can I get the google.com page to look the way it does in my browser?

Thanks.

A:

Google is probably serving you ISO-8859-1. At least, that is what they serve me for the User-Agent "AppEngine-Google; (+http://code.google.com/appengine)" (which urlfetch uses). The Content-Type header value is:

text/html; charset=ISO-8859-1

So you would use:

result.content.decode('ISO-8859-1')

If you read the charset from result.headers["Content-Type"] instead of hard-coding it, your code can adapt to changes on the other end. You can generally pass that charset (ISO-8859-1 in this case) directly to Python's decode method.
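A minimal sketch of reading the charset from the Content-Type header and decoding with it, rather than hard-coding a codec. The helper names charset_from_content_type and decode_body are hypothetical; the header parsing relies on the standard library's email.message.Message.get_content_charset, and the ISO-8859-1 fallback for a missing or unknown charset is an assumption, not something the answer prescribes.

```python
from email.message import Message


def charset_from_content_type(content_type, default="ISO-8859-1"):
    """Return the charset parameter of a Content-Type value, or `default`.

    Hypothetical helper: it borrows the email package's MIME header parser
    instead of splitting the header string by hand.
    """
    msg = Message()
    msg["Content-Type"] = content_type
    # get_content_charset() lowercases the charset and returns `default`
    # when the header has no charset parameter.
    return msg.get_content_charset(default)


def decode_body(content, content_type, default="ISO-8859-1"):
    """Decode raw response bytes using the charset the server declared."""
    charset = charset_from_content_type(content_type, default)
    try:
        return content.decode(charset)
    except LookupError:
        # The server named a codec Python does not know: use the fallback.
        return content.decode(default)
```

In the handler this would be called as something like decode_body(result.content, result.headers.get("Content-Type", "")), assuming result.headers behaves like a dict, as it does for urlfetch results.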

Matthew Flaschen
A: 

How can I get the google.com page to look the way it does in my browser?

The page probably uses relative URLs for images, JavaScript, CSS, etc., and you're not rewriting them into absolute URLs that point at Google's site. To confirm this, check your logs: they should show 404 ("page not found") errors as the browser, which receives "just the HTML" from you, tries to load the relative-addressed resources that your app doesn't serve.
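A rough sketch of turning relative src/href values into absolute URLs on Google's site with urljoin. The function name absolutize_urls is hypothetical, and the regular expression is a heuristic rather than a real HTML parser: it only handles quoted src and href attributes and will miss URLs built inside JavaScript or referenced from CSS.

```python
import re

try:
    from urllib.parse import urljoin  # Python 3
except ImportError:
    from urlparse import urljoin  # Python 2 (the original App Engine runtime)

# Matches src="..." or href='...' attributes (quoted values only).
# Heuristic: not a substitute for parsing the HTML properly.
_ATTR_RE = re.compile(r'''(\b(?:src|href)\s*=\s*)(["'])(.*?)\2''', re.IGNORECASE)


def absolutize_urls(html, base_url):
    """Rewrite relative src/href values in `html` so they resolve against `base_url`."""
    def _rewrite(match):
        prefix, quote, url = match.group(1), match.group(2), match.group(3)
        # urljoin leaves already-absolute URLs (and schemes like javascript:) unchanged.
        return prefix + quote + urljoin(base_url, url) + quote
    return _ATTR_RE.sub(_rewrite, html)
```

In the handler you would write absolutize_urls(html, url) instead of the raw HTML, where html is the decoded page. A lighter alternative, also an assumption rather than part of the answer, is to insert a <base href="http://www.google.com/"> element into the page's <head> so the browser resolves relative URLs against Google itself.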

Alex Martelli