I'm trying to parse HTML in the browser. The browser receives two HTML files as strings, e.g. HTML1 and HTML2.

I now need to parse these "documents" just as one would parse the current document. This is why I was wondering if it is possible to create custom documents based on these HTML strings (these strings are provided by the server or user).

So that for example the following would be valid: $(html1Document).$("#someDivID")...

If anything is unclear, please ask me to clarify more.

Thanks.

+1  A: 

You can always append your HTML to a hidden div (through innerHTML or jQuery's .html(..)). It won't be treated exactly as a new document, but you will still be able to search its contents.

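It has a few side effects, though. For example, if your HTML contains script tags, jQuery's .html(..) will execute them (plain innerHTML does not run them, but images and other resources are still fetched). Also, the browser may (and probably will) remove html, body and similar tags.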
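A minimal sketch of the hidden-div approach, assuming a browser with querySelectorAll support. The function name queryViaHiddenDiv is made up for illustration and is not part of jQuery or the DOM API.

```javascript
// Sketch only: put the HTML string into a hidden div, query it, then clean up.
// Caveat: <html>, <head> and <body> tags in the string are dropped by the
// browser, and resources such as images may still be fetched.
function queryViaHiddenDiv(htmlString, selector) {
  var holder = document.createElement("div");
  holder.style.display = "none";   // keep the content invisible
  holder.innerHTML = htmlString;   // the browser parses the string here
  document.body.appendChild(holder);

  // Copy the NodeList into a plain array before removing the holder.
  var matches = Array.prototype.slice.call(holder.querySelectorAll(selector));

  document.body.removeChild(holder);
  return matches;
}
```

Something like queryViaHiddenDiv(html1, "#someDivID") would then return the matching elements as detached nodes.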

edit
If you specifically need the title and similar tags, you may try an iframe that loads the content from your server.

Nikita Rybak
How would this work with non-body elements such as titles and javascript? Would this not cause problems/conflicts?
Tom
@Tom I updated my answer about it. In particular, script tags are annoying: if you scraped the content from another server, their relative URLs will now point to your server.
Nikita Rybak
+3  A: 
var $docFragment = $(htmlString);

$docFragment.find("a"); // all anchors nested inside the HTML string (use .filter("a") for top-level ones)

Note that this ignores any document structure tags (<html>, <head> and <body>), but any contained tags will be available.
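If the document structure tags matter, one alternative (not part of the answer above) is the browser's DOMParser. This is a rough sketch that assumes a browser supporting the "text/html" type in parseFromString; the function name parseHtmlDocument is hypothetical.

```javascript
// Sketch only: parse an HTML string into its own Document object.
// The result keeps <html>, <head> and <body>, and scripts inside it
// are not executed because the document is never displayed.
function parseHtmlDocument(htmlString) {
  var parser = new DOMParser();
  return parser.parseFromString(htmlString, "text/html");
}
```

The returned document can be wrapped like the current one, for example $(parseHtmlDocument(html1)).find("#someDivID"), and its title is available as parseHtmlDocument(html1).title.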

Tomalak
Probably what I need. However, how would this handle script, html, head tags etc.? Would it not modify these?
Tom
@Tom: Define "handle". (Script tags will be returned but not evaluated/executed.)
Tomalak
Basically the side effects mentioned by Nikita Rybak. But I guess these won't apply here because I'm merely searching a string, not appending it to the document.
Tom
@Tom: Yes. You could, though, append a returned script tag to your document and have it executed, but it will execute in your document's context of course. As long as you don't append it, it is just *there* as a detached DOM node.
Tomalak
I see, thanks a lot.
Tom
I wonder why I didn't think of it when solving a similar problem. I should try it, thanks.
Nikita Rybak
+1  A: 

With jQuery you can do this:

$(your_document_string).someParsingMethod().another();
Epeli