
I'm retrieving an entire HTML document via AJAX - and that works fine. But I need to extract certain parts of that document and do things with them.

Using a framework (jQuery, MooTools, etc.) is not an option.

The only solution I can think of is to grab the body of the HTML document with a regex (yes, I know, terrible), i.e. `<body>(.*)</body>`, put that into the current page's DOM in a hidden element, and work with it from there.
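A minimal sketch of that regex step, assuming the AJAX response is available as a string and only the body's inner markup is needed. `extractBody` is a hypothetical name. It uses `[\s\S]` because `.` does not match newlines in JavaScript, and the greedy `*` makes the match run to the last `</body>` in the string.

```javascript
// Sketch of the regex approach: pattern matching, not real HTML parsing.
// [\s\S] matches any character including newlines (plain . does not), and the
// greedy * makes the match run to the LAST </body> in the string.
function extractBody(html) {
  var match = /<body[^>]*>([\s\S]*)<\/body>/i.exec(html);
  return match ? match[1] : null; // null when no <body>...</body> pair is found
}
```

The returned string could then be assigned to a hidden element's `innerHTML`. It is still a pattern match, so unusual markup can fool it; a lazy `([\s\S]*?)` would instead stop at the first `</body>`.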

Is there an easier/better way?

Update

I've done some testing, and inserting an entire HTML document into a created element behaves a bit differently across the browsers I've tested. For example:

  • FF3.5: keeps the contents of the HEAD and BODY tags
  • IE7 / Safari4: only includes what's between ...
  • Opera 10.10: keeps the HEAD element and everything inside it, and keeps the contents of BODY

The behavior of IE7 and Safari is ideal, but the browsers don't agree with each other. Since I'm loading a predetermined HTML document, I think I'm going to use a regex to grab what I want and insert it into a DOM element, unless someone has other suggestions.

A:

Elements can exist without being in the page itself. Just dump the HTML into a dummy div.

var wrapper = document.createElement('div');
wrapper.innerHTML = "<ul><li>foo</li><li>bar</li></ul>";
wrapper.getElementsByTagName('li').length; // 2

Given your edits, we run into a sticky situation, since you want getElementById, which is a method of documents rather than of ordinary elements. This would probably be easy if you could just create a new virtual document via document.implementation.createDocument, but IE doesn't support that at all.

Using a regex is a messy business: what if we see something like `<body><input value="</body>" /></body>`? You could make your regex greedy so that it runs on to the last instance of `</body>`, but if you do run into trouble, more thorough parsing may be necessary. Even if a full framework isn't an option, you might want to use something like Sizzle, the selector engine at the core of jQuery, to find the element you want. Or, if you're feeling purist, you could write the recursive search function yourself, but why take that hit if someone else has already taken it?
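A minimal sketch of the hand-rolled recursive search, assuming lookup by id is all you need. `findById` is a hypothetical helper. It relies only on each node having an `id` property and an array-like `children` list, which DOM elements expose, so it can be pointed at the dummy wrapper element in place of getElementById.

```javascript
// Hypothetical helper: depth-first search for the first node whose id matches.
// Only assumes an `id` property and an array-like `children` list, which DOM
// elements have; plain objects of the same shape work too.
function findById(node, id) {
  if (node.id === id) {
    return node;
  }
  var kids = node.children || []; // nodes without a children list are treated as leaves
  for (var i = 0; i < kids.length; i++) {
    var found = findById(kids[i], id);
    if (found) {
      return found;
    }
  }
  return null;
}

// In a browser, after dumping the markup into a dummy element:
//   var wrapper = document.createElement('div');
//   wrapper.innerHTML = bodyMarkup; // bodyMarkup is a placeholder for your HTML string
//   var foo = findById(wrapper, 'foo');
```

An index-based loop is used so the same code works on an HTMLCollection and on a plain array.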

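var response_el = document.createElement('html'), foo;
response_el.innerHTML = the_html_elements_content; // the markup *inside* <html>...</html>
foo = Sizzle('#foo', response_el); // Sizzle returns an array of matches; foo[0] is the element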
Matchu
I've tried this; it behaves rather oddly because I'm inserting an entire HTML document ... `<html> ... </html>`
Stomped
@Stomped - if you *know* you'll be inserting a full HTML document, could you just create a `<html>` element, rip off the first 6 and last 7 characters, and set *that* as the `innerHTML`?
Matchu
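A sketch of that slicing idea, under the assumptions the comment implies: the response is exactly `<html>...</html>`, with no doctype, no attributes on the `<html>` tag, and only whitespace outside it. `stripHtmlWrapper` is a hypothetical name; it throws rather than guessing when the input doesn't fit.

```javascript
// Assumes the response is exactly "<html>...</html>": no doctype, no attributes
// on <html>, and nothing but whitespace outside it.
function stripHtmlWrapper(doc) {
  var OPEN = '<html>';   // 6 characters
  var CLOSE = '</html>'; // 7 characters
  var trimmed = doc.replace(/^\s+|\s+$/g, ''); // String.prototype.trim is missing in IE7/8
  var lower = trimmed.toLowerCase();
  if (lower.slice(0, OPEN.length) !== OPEN || lower.slice(-CLOSE.length) !== CLOSE) {
    throw new Error('Expected a document wrapped in <html>...</html>');
  }
  // Drop the first 6 and last 7 characters, keeping everything in between.
  return trimmed.slice(OPEN.length, trimmed.length - CLOSE.length);
}
```

The result would then be assigned to the `innerHTML` of an element created with `document.createElement('html')`, as in the Sizzle example in the answer.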
@Stomped - curious, I went to the jQuery source to look up how jQuery creates an element from `$('<html>whatever</html>')` to see if they had any shortcuts; it turns out they just use a greedy regex to make sure it looks like `<tag>stuff</tag>`, then create a `<tag>` element and put `stuff` inside. You could therefore either use their more generic approach, or just count on receiving a full HTML document and use the approach in the comment above :)
Matchu
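A rough sketch of the more generic `<tag>stuff</tag>` approach. This is a hypothetical approximation written for illustration, not the regex from the jQuery source. The `\1` backreference requires the closing tag to match the opening one, and the greedy `[\s\S]*` runs to the final closing tag. A leading `<!DOCTYPE ...>` makes it return `null`, so that would need to be stripped first.

```javascript
// Hypothetical approximation of the "<tag>stuff</tag>" check; NOT jQuery's regex.
// (\w+)\b captures the outer tag name, [^>]* skips any attributes,
// ([\s\S]*) greedily captures the contents, and <\/\1> requires a matching close tag.
function splitOuterTag(html) {
  var m = /^\s*<(\w+)\b[^>]*>([\s\S]*)<\/\1>\s*$/i.exec(html);
  return m ? { tag: m[1].toLowerCase(), inner: m[2] } : null;
}
```

With a result in hand, `document.createElement(parts.tag)` followed by assigning `parts.inner` to its `innerHTML` reproduces the create-then-fill step described in the comment.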
Matchu: I just did the same thing (looked at the jQuery source); I wish I'd thought of doing so earlier. I also discovered that different browsers behave oddly when you insert an entire HTML doc (see my edit above).
Stomped
@Stomped - edited.
Matchu