I have a javascript variable containing the HTML source code of a page (not the source of the current page), I need to extract all links from this variable. Any clues as to what's the best way of doing this?

Is it possible to create a DOM for the HTML in the variable and then walk that?

+2  A: 

If you're using jQuery, I believe this is quite easy:

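var doc = $(rawHTML);
// $('a', doc) is the same as doc.find('a'), which only searches descendants,
// so also filter the top-level nodes in case an <a> sits at the top level.
var links = doc.find('a').add(doc.filter('a'));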

http://docs.jquery.com/Core/jQuery#htmlownerDocument

brianreavis
A: 

If you're running Firefox, YES YOU CAN! It's called DOMParser; check it out:

DOMParser is mainly useful for applications and extensions based on the Mozilla platform. While it's available to web pages, it's not part of any standard, and the level of support in other browsers is unknown.
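DOMParser has since been standardized and is available in all current major browsers, not just Firefox. Here is a minimal sketch, assuming a modern browser, of parsing the string with it and then walking the resulting tree, as the question asks. `collectLinks` and `linksFromHTML` are hypothetical names; `doc.querySelectorAll("a[href]")` would do the same job in one call, but the explicit walk shows how the traversal works.

```javascript
// Recursively walk a DOM tree and collect the raw href of every <a> element,
// in document order. Works on any Node (a Document, an Element, ...).
function collectLinks(root) {
  const hrefs = [];
  (function visit(node) {
    // nodeType 1 = element; nodeName is upper-case in HTML documents but
    // lower-case in XML ones, so normalise before comparing.
    if (node.nodeType === 1 && node.nodeName.toUpperCase() === "A") {
      const href = node.getAttribute("href");
      if (href !== null) hrefs.push(href); // skip <a> without href (named anchors)
    }
    for (const child of node.childNodes) visit(child);
  })(root);
  return hrefs;
}

// Browser-only: parse the string into a separate Document without touching
// the current page, then walk it. DOMParser is not built into plain Node.js.
function linksFromHTML(rawHTML) {
  const doc = new DOMParser().parseFromString(rawHTML, "text/html");
  return collectLinks(doc);
}
```

In a browser, `linksFromHTML('<a href="foo">bar</a><a href="narf">zort</a>')` should return `["foo", "narf"]`. The parsed document is separate from the page you are on, so nothing from the string is displayed.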
xxxxxxx
"YES YOU CAN!" What is this? Bob the Builder? :P
brianreavis
:) Yes, it's Bob the Builder
xxxxxxx
A: 

This is a dup of another SO question.

Chris
+1  A: 

I don't know if this is the recommended way, but it works (plain JavaScript, no library):

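var rawHTML = '<html><body><a href="foo">bar</a><a href="narf">zort</a></body></html>';

// Parse the string into a detached <html> element; it is never inserted
// into the current page, so nothing is displayed.
var doc = document.createElement("html");
doc.innerHTML = rawHTML;
var links = doc.getElementsByTagName("a");
var urls = [];

for (var i = 0; i < links.length; i++) {
    // getAttribute returns the href exactly as written in the source;
    // links[i].href would resolve it against the *current* page's URL instead.
    urls.push(links[i].getAttribute("href"));
}
alert(urls);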
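Because `getAttribute("href")` returns the values as written, a relative link like `foo` is relative to the page the HTML came from, not to the page running the script. A minimal sketch of resolving them, assuming you know the source page's URL (`resolveLinks` and `pageUrl` are hypothetical names). If the fetched HTML contains a `<base href>` element, that URL should be used as the base instead.

```javascript
// Resolve hrefs taken from someone else's HTML against that page's own URL.
// pageUrl is an assumption: you need to know where the HTML was fetched from.
function resolveLinks(hrefs, pageUrl) {
  const resolved = [];
  for (const href of hrefs) {
    try {
      // new URL(relative, base) applies the standard URL resolution rules.
      resolved.push(new URL(href, pageUrl).href);
    } catch (err) {
      // Malformed values such as "http://[" make the URL constructor throw; skip them.
    }
  }
  return resolved;
}

const urls = resolveLinks(["foo", "/about", "https://example.org/x"], "https://example.com/dir/page.html");
// urls is ["https://example.com/dir/foo", "https://example.com/about", "https://example.org/x"]
```

`URL` is a global in browsers and in Node.js, so this step can run wherever the hrefs were collected.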
andre-r
A: 

A regex approach is useful, especially if you need to replace links:

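// Lazy [\s\S]*? so each <a ...>...</a> is matched on its own; a greedy (.*)
// runs from the first <a> to the last </a> on the line.
var linkReg = /<a\s[^>]*>[\s\S]*?<\/a>/gi;

var linksInText = text.match(linkReg) || []; // match() returns null when nothing matches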
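To actually rewrite the links rather than just list them, the regex idea can be combined with `String.prototype.replace` and a callback. A minimal sketch, assuming reasonably simple markup: `rewriteHrefs` and the `/proxy?u=` prefix are hypothetical, the href value must be quoted, and a `>` or `href=` inside another attribute's value will confuse it. For arbitrary real-world HTML, parsing into a DOM is more robust.

```javascript
// Hypothetical helper: replace the href of every <a> tag with rewrite(href).
// Regex-based sketch: handles single- or double-quoted href values and is
// case-insensitive, but does not understand full HTML syntax.
function rewriteHrefs(html, rewrite) {
  // "<a", whitespace, optional attributes ending in whitespace (so "data-href"
  // is not mistaken for "href"), then href = "value" or 'value'.
  const hrefAttr = /(<a\s(?:[^>]*?\s)?href\s*=\s*)(["'])(.*?)\2/gi;
  return html.replace(hrefAttr, (match, prefix, quote, href) =>
    prefix + quote + rewrite(href) + quote);
}

const page = `<p><a href="foo">bar</a> <A HREF='narf'>zort</A></p>`;
console.log(rewriteHrefs(page, (href) => "/proxy?u=" + encodeURIComponent(href)));
// prints <p><a href="/proxy?u=foo">bar</a> <A HREF='/proxy?u=narf'>zort</A></p>
```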
Wosis