An AJAX response is returning the full HTML page. I need to extract the fragment between the body (<body> and </body>) tags. This is required to be done on the client side using JavaScript. Any help will be appreciated.

A: 

If your HTML page is on the web, you can use YQL.

For example, if your page URL is http://xyz.com/page.html and you want everything inside the body element, use a query like this:

select * from html where url="http://xyz.com/page.html" and xpath='//body'

If you are new to YQL, read this: http://en.wikipedia.org/wiki/YQL_Page_Scraping

There is also a simple way to do it using the Chromyqlip extension: https://chrome.google.com/extensions/detail/bkmllkjbfbeephbldeflbnpclgfbjfmn

Hope this helps!

Markandey Singh
A: 

The simplest but kind-of worst way would be simple string hacking on the response text.

var bodyhtml= html.split('<body>').pop().split('</body>')[0];

This is unsatisfactory in the general case, but can be feasible if you know the exact format of the HTML being returned (e.g. that there are no attributes on the <body>, that the sequences <body> and </body> aren't used in a comment in the middle of the page, etc.).
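If the opening tag might carry attributes or differ in case, the same string hack can be made slightly more tolerant with a regular expression. This is only a sketch: extractBody is a made-up helper name, it is still plain text matching rather than parsing, and it can still be fooled by a commented-out <body> or by an attribute value containing ">".

```javascript
// Hypothetical helper (not from any library): returns the markup between
// the opening <body ...> tag and the LAST </body> tag, or null if not found.
function extractBody(html) {
    // <body, optional attributes, >, then a greedy capture up to the final </body>.
    var match = /<body(?:\s[^>]*)?>([\s\S]*)<\/body\s*>/i.exec(html);
    return match ? match[1] : null;
}
```

Usage would then be var bodyhtml= extractBody(html); with a null check for responses that contain no body tags at all.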

Another still-quite-bad way is to write the whole document to the innerHTML of a newly-created <div> and fish out the elements you want, not caring that writing <html> or <body> inside a <div> is broken. You'll be unable to reliably separate the child elements of <head> from those in <body> this way, but this is what jQuery, for example, does.

A more robust but more painful way would be to use a separate HTML document:

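// Create a hidden iframe to act as a separate HTML document.
var iframe= document.createElement('iframe');
iframe.style.display= 'none';
document.body.insertBefore(iframe, document.body.firstChild);
// Older IE has no contentDocument, so fall back to contentWindow.document.
var idoc= 'contentDocument' in iframe? iframe.contentDocument : iframe.contentWindow.document;
// Let the browser parse the full page (htmlpage holds the returned markup).
idoc.write(htmlpage);
idoc.close();
alert(idoc.body.innerHTML);
// Clean up once the body markup has been read.
document.body.removeChild(iframe);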

though this would also execute all scripts inside the document, potentially changing it, so that might not be satisfactory either.
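One partial workaround for the script problem, offered only as a rough sketch, is to cut <script> elements out of the response text before handing it to idoc.write. stripScripts is a made-up helper name, and this is more string hacking, not a sanitizer: inline handlers such as onload attributes would still run, so it only makes sense for markup you trust.

```javascript
// Hypothetical helper: removes <script ...>...</script> blocks from an HTML string.
// Text-level matching only; it does not touch inline event-handler attributes.
function stripScripts(html) {
    // Lazy match so each block ends at its own closing tag, not the last one.
    return html.replace(/<script\b[^>]*>[\s\S]*?<\/script\s*>/gi, '');
}
```

With that in place, the write call above would become idoc.write(stripScripts(htmlpage));.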

bobince
A: 

Thanks to all who answered. For now, we took the substring approach, since we know the exact format of the HTML being returned (we are generating it ourselves). I shall look up YQL as a general technology for this kind of requirement.

Diptendu Dutta