I am trying to load the source of an arbitrary page into a textbox for a client-side-only HTML editor. I need the entire source of the web page, not just the body. This YQL query returns only the body:

http://query.yahooapis.com/v1/public/yql?format=xml&callback=editor.handleLoad&q=select+*+from+html+where+url%3D%22example.com%22
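For reference, the query URL above can be assembled from its parts rather than encoded by hand. This is only a minimal sketch: `buildYqlUrl` is a hypothetical helper name, not part of YQL, and it assumes the endpoint and the `format`, `callback`, and `q` parameters behave as in the URL shown in the question.

```typescript
// Hypothetical helper (not part of YQL): rebuilds the request URL from the question.
// Assumes the public YQL endpoint and its format/callback/q parameters as shown above.
function buildYqlUrl(pageUrl: string, callback: string): string {
  // The page URL is embedded in a double-quoted YQL string literal,
  // so a double quote inside it would break the query.
  if (pageUrl.indexOf('"') !== -1) {
    throw new Error("pageUrl must not contain double quotes");
  }
  const params = new URLSearchParams({
    format: "xml",
    callback: callback, // name of the JSONP callback, e.g. editor.handleLoad
    q: `select * from html where url="${pageUrl}"`,
  });
  // URLSearchParams form-encodes the values: spaces become "+", "=" becomes %3D.
  return `http://query.yahooapis.com/v1/public/yql?${params.toString()}`;
}
```

Calling `buildYqlUrl("example.com", "editor.handleLoad")` should reproduce the URL in the question; the result would then typically be used as the `src` of a dynamically inserted script tag.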

Is there any way to get the entire source, or are there any other free JSONP-X web services that can?

+1  A: 

I don't see an obvious way to do that with YQL, but here is a Yahoo Pipe that seems to work. It refuses to fetch sites that disallow it in their robots.txt, but it returns the entire source for other sites:

http://pipes.yahoo.com/pipes/pipe.info?_id=dCsGDO123hG6BNv70EypaA

The pipe defaults to www.example.com, which is denied because of that site's robots.txt. However, the pipe accepts the URL as a parameter. Here is an example usage of this pipe that gets the source of pipes.yahoo.com and returns the result wrapped in JSON:

http://pipes.yahoo.com/pipes/pipe.run?_id=dCsGDO123hG6BNv70EypaA&_render=json&url=http%3A%2F%2Fpipes.yahoo.com%2F
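The same run URL can be generated for any target page. The following is a rough sketch, not an official Pipes client: `buildPipeRunUrl` is a hypothetical name, and the only parameters used are the `_id`, `_render`, and `url` values that appear in the link above.

```typescript
// Hypothetical helper (not part of Yahoo Pipes): builds a pipe.run URL
// using only the _id, _render and url parameters seen in the example link.
function buildPipeRunUrl(pipeId: string, pageUrl: string): string {
  const params = new URLSearchParams({
    _id: pipeId,     // the pipe's identifier
    _render: "json", // ask for the result wrapped in JSON
    url: pageUrl,    // page whose source should be fetched
  });
  return `http://pipes.yahoo.com/pipes/pipe.run?${params.toString()}`;
}
```

For instance, `buildPipeRunUrl("dCsGDO123hG6BNv70EypaA", "http://pipes.yahoo.com/")` should yield the example link, with the target URL percent-encoded.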

Does this help?

Chris Nielsen
That is closer, but the pipe appears to filter out all meta and script tags. Are there any proxies or web services that will return the entire page?
Craig