I am trying to fetch URLs using Google App Engine's urlfetch service in order to implement a proxy site. Sites like Twitter and Facebook appear disfigured, as if they are missing their stylesheets, and even Google is missing its logo, but Yahoo opens fine. I can't understand why.

+1  A: 

When you use urlfetch, it fetches only the HTML of the page, and none of the images, CSS, JavaScript, or other resources that the page references.

Yahoo looks fine presumably because they specify their images and CSS using absolute URLs (e.g., http://www.yahoo.com/image.png), so when your urlfetch'd page displays, the browser loads those resources straight from yahoo.com. Keep in mind that if a visitor can't reach yahoo.com directly, those images won't appear on your proxied page either.

edit: It looks like Yahoo inlines their CSS into the HTML page itself, which would explain why it works in your fetched copy.

Google appears without CSS/images because their CSS/images are specified as relative URLs (e.g., /image.png). The browser resolves those against your proxy's address, and your proxy doesn't serve anything at /image.png.

You'll have to parse the urlfetch'd page content to find the images and CSS that need to be fetched and proxied as well. Just be sure to handle relative URLs like /resource.png as well as absolute URLs like http://www.foo.com/resource.png.
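As a rough sketch of that parsing step, the snippet below collects image, script, and stylesheet references from a fetched page and resolves each one against the page's own URL. It uses only Python's standard library rather than anything App Engine specific, and the names `ResourceCollector` and `find_resources` are made up for illustration. It only looks at `<img>`, `<script>`, and `<link rel="stylesheet">` tags, so treat it as a starting point.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ResourceCollector(HTMLParser):
    """Hypothetical helper: collects resource URLs referenced by a page."""

    def __init__(self, page_url):
        super().__init__()
        self.page_url = page_url
        self.resources = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag in ("img", "script") and attrs.get("src"):
            ref = attrs["src"]
        elif (tag == "link" and attrs.get("href")
              and "stylesheet" in (attrs.get("rel") or "").lower()):
            ref = attrs["href"]
        else:
            return
        # urljoin resolves relative references (/a.png, a.png) against the
        # page URL and leaves absolute URLs unchanged.
        self.resources.append(urljoin(self.page_url, ref))


def find_resources(page_url, html):
    """Return absolute URLs of the images, scripts, and stylesheets in html."""
    collector = ResourceCollector(page_url)
    collector.feed(html)
    collector.close()
    return collector.resources
```

Each URL in the returned list can then be fetched by the proxy in the same way as the page itself.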

Jason Hall
Yeah, I figured that out later. I guess I can fetch the CSS and place it inline where it isn't already, but how do I go about images?
Bunny Rabbit
When you find an image in the page, fetch the image too and store it in your proxy. Then rewrite the <img> tag's src attribute to point to your proxied image instead of the original. For simplicity I'd do this with CSS too, and any other resources.
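A minimal sketch of the rewriting step, assuming the proxy exposes a handler at `/fetch?url=` that serves a resource given its original URL (that endpoint and the function name are hypothetical). It uses a regular expression for brevity, which only catches quoted `src` attributes on `<img>` tags; a real HTML parser would be more robust.

```python
import re
from urllib.parse import quote, urljoin

# Hypothetical proxy endpoint that serves a resource by its original URL.
PROXY_ENDPOINT = "/fetch?url="

# Matches the quoted src attribute of an <img> tag.
_IMG_SRC = re.compile(
    r'(<img\b[^>]*?\bsrc\s*=\s*)(["\'])(.*?)\2',
    re.IGNORECASE | re.DOTALL,
)


def rewrite_img_sources(page_url, html):
    """Point every quoted <img src> at the proxy instead of the origin."""

    def replace(match):
        prefix, quote_char, ref = match.groups()
        # Resolve relative references first, then percent-encode the whole
        # URL so it can travel safely inside a query string.
        absolute = urljoin(page_url, ref)
        encoded = quote(absolute, safe="")
        return prefix + quote_char + PROXY_ENDPOINT + encoded + quote_char

    return _IMG_SRC.sub(replace, html)
```

The same idea extends to stylesheet and script references by matching their `href` and `src` attributes.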
Jason Hall
And I guess the URLs of images referenced from the CSS will have to be altered too. That's a heck of a lot of work!
Bunny Rabbit
You can actually fix this by adding a base href tag to the markup: http://www.w3schools.com/TAGS/tag_base.asp . More importantly, though, _why_ are you writing a proxy app?
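A minimal sketch of that approach, assuming the fetched page has an ordinary `<head>` element (the function name is made up for illustration). Note that with a base tag the browser loads relative resources directly from the original site, not through the proxy.

```python
import re
from html import escape

# Matches an opening <head> tag, but not <header>.
_HEAD_OPEN = re.compile(r"<head\b[^>]*>", re.IGNORECASE)


def add_base_href(page_url, html):
    """Insert <base href=...> so relative URLs resolve against page_url."""
    base_tag = '<base href="%s">' % escape(page_url, quote=True)
    # Put the tag right after <head> so it precedes every other URL in the page.
    new_html, count = _HEAD_OPEN.subn(
        lambda m: m.group(0) + base_tag, html, count=1
    )
    if count == 0:
        # No <head> found: fall back to prepending the tag.
        return base_tag + html
    return new_html
```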
Nick Johnson