Following a jQuery ajax call to retrieve an entire XHTML document, what is the best way to select specific elements from the resulting string? Perhaps there is a library or plugin that solves this issue?

jQuery can only select XHTML elements that exist in a string if they're normally allowed in a div in the W3C specification; therefore, I'm curious about selecting things like <title>, <script>, and <style>.

According to the jQuery documentation:

http://docs.jquery.com/Core/jQuery#htmlownerDocument

The HTML string cannot contain elements that are invalid within a div, such as html, head, body, or title elements.

Therefore, since we have established that jQuery does not provide a way to do this, how would I select these elements? As an example, if you can show me how to select the remote page's title, that would be perfect!

Thanks, Pete

A: 

If you wanted to find the value of specifically named fields (i.e. the inputs in a form) something like this would find them for you:

var fields = ["firstname","surname", ...."foo"];

function findFields(form, fields) {
  var form = $(form);
  fields.forEach(function(field) {
    var val = form.find("[name="+field+"]").val();
    ....
timbo
Thanks but that's not what I want to do.
slypete
I see you've re-edited your question. If you want to process an entire XHTML piece, then something like Resig's micro-templating could point you in the right direction; see http://ejohn.org/blog/javascript-micro-templating/
timbo
Not sure how either of your responses could even indirectly answer my question. I did not make a major edit to this question.
slypete
A: 

How about this: Load XML from string

Svante
+1  A: 

Shamelessly copied and adapted from another of my answers (http://stackoverflow.com/questions/1033460/simple-jquery-ajax-example-not-finding-elements-in-returned-html/1033756#1033756), this fetches the HTML of the remote page, then the parseHTML function creates a temporary div element for it and puts the lot inside, runs through it, and returns the requested element. jQuery then alerts the text() inside.

$(document).ready(function(){
  $('input').click(function(){
    $.ajax({
      type : "POST",
      url : 'ajaxtestload.html',
      dataType : "html",
      success: function(data) {
        alert( data ); // shows whole dom
        var gotcha = parseHTML(data, 'TITLE'); // nodeName property returns uppercase
        if (gotcha) {
          alert($(gotcha).text()); // text inside the matched element
        }else{
          alert('Tag not found.');
        }
      },
      error : function() {
        alert("Sorry, The requested property could not be found.");
      }
    });
  });
});

function parseHTML(html, tagName) {
  var root = document.createElement("div");
  root.innerHTML = html;
  // Get all child nodes of root div
  var allChilds = root.childNodes;
  for (var i = 0; i < allChilds.length; i++) {
    if (allChilds[i].nodeName == tagName) {
      return allChilds[i];
    }
  }
  return false;
}

To get several items out or a list of script tags, say, I think you'd have to improve the parseHTML function, but hey - proof of concept :-)
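
One way to extend parseHTML along those lines is a recursive walk that collects every matching element instead of returning the first direct child. This is a minimal sketch, not part of the tested code above: collectByTagName is a hypothetical helper name, and it assumes only that each node exposes nodeName and childNodes, as DOM nodes do.

```javascript
// Hypothetical helper: gather every node under `root` whose nodeName
// matches `tagName` (compared case-insensitively).
function collectByTagName(root, tagName) {
  var wanted = String(tagName).toUpperCase();
  var found = [];

  function walk(node) {
    var children = node.childNodes || [];
    for (var i = 0; i < children.length; i++) {
      var child = children[i];
      if (String(child.nodeName).toUpperCase() === wanted) {
        found.push(child);
      }
      walk(child); // descend so nested tags are found too
    }
  }

  walk(root);
  return found; // empty array when nothing matches
}
```

With the temporary div from parseHTML as root, collectByTagName(root, 'script') would return the matching script elements in document order.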

MSpreij
Surely applying a simple linear search is not best practice.
slypete
Surely not - but the example shows a way to select tags from external documents, including those outside the body, which jQuery couldn't. http://www.javascriptkit.com/dhtmltutors/treewalker.shtml might be useful?
MSpreij
Also, I'm pretty sure this would fall victim to the same issue that jQuery has with fetching <head><title><meta> etc tags. Read the link in the question about htmlOwnerDocument
gnarf
gnarf, not sure what you mean - I tested the code, it works? jQuery doesn't really fetch HTML, just 'data'. The non-jQuery function does the parsing of that data into tags, one of which is then returned to jQuery. If you do run into a tag that jQuery doesn't like, you can always parse the content of it with the same function.
MSpreij
+1  A: 

How about some quick tag renaming?

$.ajax({
 type : "GET",
 url : 'results.html',
 dataType : "html",
 success: function(data) {

  data = data.replace(/html/g, "xhtmlx");
  data = data.replace(/head/g, "xheadx");
  data = data.replace(/title/g, "xtitlex");
  data = data.replace(/body/g, "xbodyx");

  alert($(data).find("xtitlex").text());
 }

});
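
Note that the replacements above rewrite every occurrence of those words, including ones in text content, attribute values and the DOCTYPE. A more conservative variant renames only the tag names themselves. This is a minimal sketch under that assumption: renameStructuralTags is a hypothetical helper, and the x...x placeholder names simply follow the convention used above.

```javascript
// Hypothetical helper: rename only <html>, <head>, <title> and <body>
// tags (opening and closing), leaving text and other tags untouched.
function renameStructuralTags(markup) {
  return markup.replace(/<(\/?)(html|head|title|body)\b/gi, function (match, slash, name) {
    return "<" + slash + "x" + name.toLowerCase() + "x";
  });
}
```

The result can be handed to $(...) in the same way as data in the snippet above, for example $(renameStructuralTags(data)).find("xtitlex").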
Ben Koehler
@Ben Koehler: I'd like to use this code but, to my surprise, it doesn't work on IE8 (I didn't test IE7); it's fine on Chrome 2.x and FF 3.x.
Kris-I
+2  A: 

Just an idea - tested in FF/Safari - seems to work if you create an iframe to store the document temporarily. Of course, if you are doing this it might be smarter to just use the src property of the iframe to load the document and do whatever you want in the "onload" of it.

  $(function() {
    $.ajax({
      type: 'GET', 
      url: 'result.html',
      dataType: 'html',
      success: function(data) {
        var $frame = $("<iframe src='about:blank'/>").hide();
        $frame.appendTo('body');
        var doc = $frame.get(0).contentWindow.document;
        doc.open();
        doc.write(data);
        doc.close(); // finish parsing before querying the document
        var $title = $("title", doc);
        alert('Title: '+$title.text() );
        $frame.remove();
      }
    });
  });

I had to append the iframe to the body to get it to have a .contentWindow.

gnarf
That's a nifty answer
Dan F
Why didn't I think of iframes in the first place? This is great! The only possible problem I see is maybe iframe won't work if you have to get a document from another site. I haven't tried it, but I think the cross-domain script policy will forbid any interaction with the source.
Igor Zinov'yev
This works, but the question is looking for best practice not a hack.
slypete
@zinigor That's a moot point because you can't even make a cross-domain AJAX request.
Josh Stodola
@Josh Stodola Yes, but I wasn't talking about AJAX requests. You can simply edit the iframe's src attribute and make it point to another site, and it doesn't have to be on the same domain. And that's where the same origin policy restrictions will restrict all interaction.
Igor Zinov'yev
+1  A: 

This works. I just split up the building blocks for better readability.

Check the explanation and the inline comments to grasp the workings of this and why it has to be made like this.

Of course this can't be used to retrieve cross-domain content. For that you either have to proxy the calls through a script of your own or think about integrating something like flXHR (Cross-Domain Ajax with Flash).

call.html

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
  <head>
    <meta http-equiv="content-type" content="text/html; charset=utf-8" />
    <title>asd</title>
    <script src="jquery.js" type="text/javascript"></script>
    <script src="xmlDoc.js" type="text/javascript"></script>
    <script src="output.js" type="text/javascript"></script>
    <script src="ready.js" type="text/javascript"></script>
  </head>
  <body>
    <div>
      <input type="button" id="getit" value="GetIt" />
    </div>
  </body>
</html>

jquery.js is jQuery 1.3.2 (uncompressed); test.html is a valid XHTML document.

xmlDoc.js

// helper function to create XMLDocument out of a string
jQuery.createXMLDocument = function( s ) {
  var xmlDoc;
  // Is this IE?
  if ( window.ActiveXObject ) {
    xmlDoc = new ActiveXObject('Microsoft.XMLDOM');
    xmlDoc.async = false; // boolean false: load synchronously
    // prevent errors as IE tries to resolve the URL in the DOCTYPE
    xmlDoc.resolveExternals = false;
    xmlDoc.validateOnParse = false;
    xmlDoc.loadXML(s);
  } else {
    // non IE. give me DOMParser
    // theoretically this else branch should never be called
    // but just in case.
    xmlDoc = ( new DOMParser() ).parseFromString( s, "text/xml" );
  }
  return xmlDoc;
};

output.js

// Output the title of the loaded page
// And get the script-tags and output either the
// src attribute or code
function headerData(data) {
  // give me the head element
  var x = jQuery("head", data).eq(0);
  // output title
  alert(jQuery("title", x).eq(0).text());
  // for each script tag that includes a file, output its src
  jQuery("script[src]", x).each(function(index) {
    alert((index+1)+" "+jQuery.attr(this, 'src'));
  });
  // for each inline script tag, output its code
  jQuery("script:not([src])", x).each(function(index) {
    alert(this.text);
  });
}

ready.js

$(document).ready(function() {
  $('#getit').click(function() {
    $.ajax({
      type : "GET",
      url : 'test.html',
      dataType : "xml",
      // override the content-type returned by the server to ensure
      // the response gets treated as xml
      beforeSend: function(xhr) {
        // IE doesn't support this so check before using
        if (xhr.overrideMimeType) {
          xhr.overrideMimeType('text/xml');
        }
      },
      success: function(data) {
        headerData(data);
      },
      error : function(xhr, textStatus, errorThrown) {
        // if loading the response as xml failed try it manually
        // in theory this should only happen for IE
        // maybe some
        if (textStatus == 'parsererror') {
          var xmlDoc = jQuery.createXMLDocument(xhr.responseText);
          headerData(xmlDoc);
        } else {
          alert("Failed: " + textStatus + " " + errorThrown);
        }
      }
    });
  });
});

In Opera the whole thing works without the createXMLDocument and the beforeSend function.

The extra trickery is needed for Firefox (3.0.11) and IE6 (I can't test IE7, IE8, or other browsers), as they have a problem when the Content-Type returned by the server doesn't indicate that the response is XML. My webserver returned Content-Type: text/html; charset=UTF-8 for test.html. In those two browsers jQuery called the error callback with textStatus set to parsererror. The reason is line 3706 in jQuery.js:

data = xml ? xhr.responseXML : xhr.responseText;

Here data is set to null, because in FF and IE xhr.responseXML is null. This happens because they don't recognize that the returned data is XML (as Opera does), so only xhr.responseText is set, containing the whole XHTML code. Because data is null, line 3708

if ( xml && data.documentElement.tagName == "parsererror" )

throws an exception, which is caught in line 3584, and status is set to parsererror.
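
The failure path described above can be modelled in a few lines. This is a simplified sketch for illustration, not jQuery's actual source: httpDataModel and statusFor are hypothetical names, and the xhr values are plain objects standing in for a real XMLHttpRequest.

```javascript
// Simplified stand-in for the two quoted lines of jQuery.js.
function httpDataModel(xhr, type) {
  var xml = type === "xml";
  var data = xml ? xhr.responseXML : xhr.responseText;
  // When responseXML is null, this property access throws a TypeError.
  if (xml && data.documentElement.tagName == "parsererror") {
    throw "parsererror";
  }
  return data;
}

// Models the try/catch that turns any exception into a status string.
function statusFor(xhr, type) {
  try {
    httpDataModel(xhr, type);
    return "success";
  } catch (e) {
    return "parsererror";
  }
}

// A response served as text/html: responseXML stays null.
var servedAsHtml = { responseXML: null, responseText: "<html>...</html>" };
// A response the browser did parse as XML.
var servedAsXml = {
  responseXML: { documentElement: { tagName: "html" } },
  responseText: ""
};
```

In this model, statusFor(servedAsHtml, "xml") yields "parsererror" even though the markup itself is fine, while statusFor(servedAsXml, "xml") yields "success".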

In FF I can solve the problem by using the overrideMimeType() function before sending the request.

But IE doesn't support that function on the XMLHttpRequest-object so I have to generate the XMLDocument myself if the error-callback is run and the error is parsererror.

example for test.html

<?xml version="1.0" encoding="UTF-8" ?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
  <head>
    <meta http-equiv="content-type" content="text/html; charset=utf-8" />
    <title>Plugins | jQuery Plugins</title>
    <script type="text/javascript" src="jquery.js"></script>
    <script type="text/javascript">var imagePath = '/content/img/so/';</script>
  </head>
  <body>
  </body>
</html>
jitter
+6  A: 

Instead of hacking jQuery to do this, I'd suggest you drop out of jQuery for a minute and use raw XML DOM methods. Using XML DOM methods you can do this:

  window.onload = function(){ 
    $.ajax({
          type: 'GET', 
          url: 'text.html',
          dataType: 'html',
          success: function(data) {

            // cross-platform XML object creation, from w3schools
            var xmlDoc;
            try { // Internet Explorer
              xmlDoc = new ActiveXObject("Microsoft.XMLDOM");
              xmlDoc.async = false;
              xmlDoc.loadXML(data);
            } catch (e) {
              try { // Firefox, Mozilla, Opera, etc.
                var parser = new DOMParser();
                xmlDoc = parser.parseFromString(data, "text/xml");
              } catch (e2) {
                alert(e2.message);
                return;
              }
            }

            alert(xmlDoc.getElementsByTagName("title")[0].childNodes[0].nodeValue);
          }
    });
  }

No messing about with iframes etc.

David Burrows
I was torn between this and using regular expressions. Since I'm asking for best practice, I chose this answer.
slypete
A: 

After parsing the XML string into an XML DOM, I'd either use jQuery on it directly (you can do this by providing a context to the jQuery selector, such as $('title', xdoc.documentElement)) or use XPath (works in Firefox; there are supposedly libraries for IE but I haven't had good success with them).

Michael E
From the question: "since we have established that jQuery does not provide a way to do this" ...
slypete