I'm developing a Chromium extension, so I have cross-origin XMLHttpRequest permissions for the domains I request in the manifest.

I have used XMLHttpRequest to fetch an HTML page (text/html). I want to use XPath (document.evaluate) to extract the relevant bits from it. Unfortunately, I'm failing to construct a DOM object from the returned HTML string.

var xhr = new XMLHttpRequest();
var name = escape("Sticks N Stones Cap");
xhr.open("GET", "http://items.jellyneo.net/?go=show_items&name="+name+"&name_type=exact", true);
xhr.onreadystatechange = function () {
    if (xhr.readyState == 4) {
        var parser = new DOMParser();
        var xmlDoc = parser.parseFromString(xhr.responseText,"text/xml");
        console.log(xmlDoc);
    }
}

xhr.send();

console.log is there to print debug output to the Chromium JS console.

In that console, I get this:

Document
<html>
<body>
<parsererror style="display: block; white-space: pre; border: 2px solid #c77; padding: 0 1em 0 1em; margin: 1em; background-color: #fdd; color: black">
<h3>This page contains the following errors:</h3>
<div style="font-family:monospace;font-size:12px">error on line 1 at column 60: Space required after the Public Identifier
</div>
<h3>Below is a rendering of the page up to the first error.</h3>
</parsererror>
</body>
</html>

So how am I supposed to use XMLHttpRequest -> receive HTML -> convert to DOM -> use XPath to traverse it?

Should I be using the "hidden" iframe hack for loading / receiving DOM object?

A: 

If you are trying to access content from an external domain, I don't think browsers will let you do that, not even with an iframe, because of the same-origin security policy. There is a special header that the external website can send to allow you to use the content (Access-Control-Allow-Origin, part of CORS).

Ed.C
If he gets the content by XHR it means he's on the same domain.
Mic
This JavaScript is part of a Chromium extension, which is allowed to make cross-origin XHR requests to the domains it has requested permission for ;-) see http://code.google.com/chrome/extensions/xhr.html
Dima
A: 

The DOMParser is choking on the DOCTYPE declaration, because you asked it to parse the response as XML ("text/xml"). It would also fail on any other non-XHTML markup, such as a <link> without a closing /. Do you have control over the document being sent? If not, your best bet is to treat it as a string and use regular expressions to find what you are looking for.
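A rough sketch of the string approach, assuming you only need one small, predictable fragment. The helper name extractFirst and the sample markup are made up for illustration (this is not the real jellyneo page). Regular expressions are brittle against real-world HTML, so this is only reasonable for very simple patterns.

```javascript
// Hypothetical helper: returns the first capture group of `pattern`
// in `html`, or null when the pattern does not match.
function extractFirst(html, pattern) {
    var match = pattern.exec(html);
    return match ? match[1] : null;
}

// Invented sample markup, standing in for xhr.responseText.
var sampleHtml = '<!DOCTYPE html><html><body>' +
    '<span class="price">1,500 NP</span></body></html>';

var price = extractFirst(sampleHtml, /<span class="price">([^<]*)<\/span>/i);
console.log(price);
```

Returning null on a failed match keeps the caller from hitting a TypeError when the page layout changes.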

Edit: You can get the browser to parse the contents of the body for you by injecting it into a hidden div:

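// Create a hidden container so the injected markup is not displayed.
var hidden = document.body.appendChild(document.createElement("div"));
hidden.style.display = "none";
// Extract the body contents; call exec() explicitly, since a RegExp is not callable.
hidden.innerHTML = /<body[^>]*>([\s\S]+)<\/body>/i.exec(xhr.responseText)[1];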

Now search inside hidden to find what you're looking for:

// The HTML parser wraps rows in an implicit <tbody>, so "table > tr" would never match.
var myEl = hidden.querySelector("table.foo td.bar > span.fu");
var myVal = myEl.innerHTML;
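Since the question asks for XPath specifically, the same lookup can probably be done with document.evaluate, using the container as the context node. This is only a sketch: findFirstByXPath is a made-up helper name, and the table markup is an invented stand-in for the injected page.

```javascript
// Hypothetical helper: evaluates an XPath expression relative to
// `contextNode` and returns the first matching node, or null.
function findFirstByXPath(contextNode, expression) {
    var doc = contextNode.ownerDocument;
    var result = doc.evaluate(
        expression,
        contextNode,
        null, // no namespace resolver needed for plain HTML
        XPathResult.FIRST_ORDERED_NODE_TYPE,
        null
    );
    return result.singleNodeValue;
}

// Browser-only usage; skipped where there is no DOM.
if (typeof document !== "undefined") {
    var container = document.createElement("div");
    // Invented markup, standing in for the injected response body.
    container.innerHTML = '<table class="foo"><tr><td class="bar">' +
        '<span class="fu">42</span></td></tr></table>';
    var span = findFirstByXPath(container, ".//td[@class='bar']/span[@class='fu']");
    console.log(span ? span.textContent : "not found");
}
```

The leading ".//" keeps the search relative to the container instead of scanning the whole page.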
gilly3
No, I don't have control over the document being sent. And I'm a bit confused: when the browser loads the same page I get a `document` object, yet I can't get one when the page is passed to me as a string?
Dima
Until it is parsed by the browser, it is just a string. To get the browser to parse it, inject the HTML into a hidden div on the page, then search the div for whatever you are looking for.
gilly3
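For what it's worth, current browsers also accept "text/html" as the second argument to DOMParser.parseFromString. That runs the lenient HTML parser instead of the strict XML one, so it should not trip over the DOCTYPE. A minimal sketch, assuming a browser with that support; parseHtml and the sample string are made up for illustration:

```javascript
// Hypothetical helper: parses an HTML string into a detached Document
// using the HTML parser (not the strict XML parser).
function parseHtml(htmlString) {
    var parser = new DOMParser();
    return parser.parseFromString(htmlString, "text/html");
}

// Browser-only usage; skipped where DOMParser is unavailable.
if (typeof DOMParser !== "undefined") {
    // Invented sample, standing in for xhr.responseText.
    var doc = parseHtml('<!DOCTYPE html><html><body><p id="msg">hello</p></body></html>');
    var node = doc.evaluate("//p[@id='msg']", doc, null,
        XPathResult.FIRST_ORDERED_NODE_TYPE, null).singleNodeValue;
    console.log(node ? node.textContent : "not found");
}
```

In the question's handler you would pass xhr.responseText to parseHtml. The resulting document is not attached to the page, so nothing from it is rendered.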