An AJAX response is returning the full HTML page. I need to extract the fragment between the body tags (<body> and </body>). This has to be done on the client side using JavaScript. Any help would be appreciated.
If your HTML page is publicly available on the web, you can use YQL.
For example, if your page URL is http://xyz.com/page.html and you want everything in the body element, use a query like this:
select * from html where url="http://xyz.com/page.html" and xpath='//body'
If you are new to YQL, read http://en.wikipedia.org/wiki/YQL_Page_Scraping
There is also a simple way to do it using the Chromyqlip extension: https://chrome.google.com/extensions/detail/bkmllkjbfbeephbldeflbnpclgfbjfmn
Hope this helps!
The simplest but kind-of worst way would be simple string hacking on the response text.
var bodyhtml= html.split('<body>').pop().split('</body>')[0];
This is unsatisfactory in the general case, but can be feasible if you know the exact format of the HTML being returned (e.g. that there are no attributes on the <body>, that the sequences <body> and </body> aren't used in a comment in the middle of the page, etc.).
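If the format is mostly known but the opening tag may carry attributes or differ in case, the string approach can be loosened a little. The following is only a sketch under those assumptions: the function name extractBodyHtml is made up for illustration, and it is still string matching, so a "<body" inside a comment or a ">" inside a quoted attribute value will confuse it.

```javascript
// Hypothetical helper: returns the markup between <body ...> and the last
// </body>, or null when no opening body tag is found.
function extractBodyHtml(html) {
  // "<body", optionally followed by whitespace and attributes, then ">".
  var open = /<body(?:\s[^>]*)?>/i.exec(html);
  if (!open) return null;
  var start = open.index + open[0].length;

  // Scan forward for closing tags and remember the last one, so a stray
  // "</body>" earlier in the content does not cut the result short.
  var closeRe = /<\/body\s*>/gi;
  closeRe.lastIndex = start;
  var end = html.length; // no closing tag: take everything after <body>
  var m;
  while ((m = closeRe.exec(html)) !== null) {
    end = m.index;
  }
  return html.slice(start, end);
}
```

With a response held in a variable such as html, the call would be var bodyhtml = extractBodyHtml(html); a null result means the response did not contain a body tag at all.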
Another still-quite-bad way is to write the whole document to the innerHTML of a newly-created <div> and fish out the elements you want, not caring that writing <html> or <body> inside a <div> is broken. You'll be unable to reliably separate the child elements of <head> from those in <body> this way, but this is what e.g. jQuery does.
A more robust but more painful way would be to use a separate HTML document:
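// Create a hidden iframe to act as a separate document.
var iframe= document.createElement('iframe');
iframe.style.display= 'none';
document.body.insertBefore(iframe, document.body.firstChild);
// Fall back to contentWindow.document where contentDocument is unavailable.
var idoc= 'contentDocument' in iframe? iframe.contentDocument : iframe.contentWindow.document;
// htmlpage holds the full HTML response text.
idoc.write(htmlpage);
idoc.close();
alert(idoc.body.innerHTML);
// Remove the temporary iframe again.
document.body.removeChild(iframe);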
though this would also execute all scripts inside the document, potentially changing it, so that might not be satisfactory either.
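Where script execution is the concern, one option not mentioned in the answers is the browser's DOMParser, in browsers that accept the 'text/html' type. It parses the response into a detached document whose scripts are not run, while still keeping <head> and <body> separate. This is a sketch, not something from the original thread: the name extractBodyViaParser and the injectable parser argument are assumptions made for illustration.

```javascript
// Hypothetical helper. `parser` is injectable so the function can be
// exercised outside a browser; in a browser the default DOMParser is used.
function extractBodyViaParser(html, parser = new DOMParser()) {
  // Parse into a separate, inert document: its scripts are not executed.
  const doc = parser.parseFromString(html, 'text/html');
  return doc.body ? doc.body.innerHTML : null;
}
```

In a page this would be called as extractBodyViaParser(htmlpage), with no iframe to create or remove afterwards.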
Thanks to all who answered. For now, we took the substring approach since we know the exact format of the HTML being returned (we are generating it ourselves). I'll look up YQL as a general technology for this kind of requirement.