I am trying to extract all the <input> tags from a <form> tag. I have written a regexp that matches the entire <form> tag and everything up to the closing </form>, but I cannot figure out how to match every <input[^>]+> within that match.

EDIT: The data is a string. I cannot use DOM functions because it is not part of the document. If I insert it into a hidden tag, it changes the layout of the page, because the string contains an entire HTML page, including links to external stylesheets.

+3  A: 

Regexes are fundamentally bad at parsing HTML (see "Can you provide some examples of why it is hard to parse XML and HTML with a regex?" for why). What you need is an HTML parser. See "Can you provide an example of parsing HTML with your favorite parser?" for examples using a variety of parsers.

Chas. Owens
I actually just read that. Know of any way, using JavaScript, to convert a string to a Document object I can use DOM functions on? The string is HTML, not XML, and I can neither insert it into the page nor use an iframe. (http://stackoverflow.com/questions/855404/can-javascript-access-the-dom-of-an-ajax-text-html-response)
Josh
@Josh No, I don't (JavaScript is really my thing), but that is an interesting question. If only there were a place to ask interesting questions...
Chas. Owens
@Chas. Thanks! Actually I am reading jQuery's source -- on that second link you provided someone claimed they were able to do this using jQuery.
Josh
That should have been "JavaScript is really NOT my thing".
Chas. Owens
@Chas. Owens -- I found the answer using DOM parsing. Which is what I wanted to do before but couldn't figure out how. Or rather, figured out how but must have had an error somewhere else and thought what I was trying to do wasn't possible. Thanks!
Josh
A: 

Why can't you just use the DOM?

var inputFields = document.getElementById('form_id').getElementsByTagName('input');
for (var i = 0, l = inputFields.length; i < l; i++) {
    // Do something with inputFields[i] ...
}

If you must use regex:

var formHTML = document.getElementById('form_id').innerHTML;
// match() returns null when nothing matches, so fall back to an empty array.
var inputs = formHTML.match(/<input.+?\/?>/g) || [];

Note that the above regular expression is not reliable and will not work in all situations, which is why you should use the DOM! :)
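
To make that caveat concrete, here is a small sketch of a few inputs that trip the pattern. The sample markup is made up for illustration; it is not taken from the question.

```javascript
// The pattern from the answer above.
var pattern = /<input.+?\/?>/g;

// Hypothetical form markup chosen to trip the pattern.
var formHTML = [
  '<input name="ok">',               // plain tag: matched correctly
  '<input name="cmp" value="a>b">',  // ">" inside an attribute value: match is cut short
  '<input',                          // tag split across lines: "." does not match "\n",
  '  name="multi">',                 //   so this tag is missed entirely
  '<!-- <input name="old"> -->'      // commented-out tag: matched anyway
].join('\n');

var matches = formHTML.match(pattern) || [];

for (var i = 0; i < matches.length; i++) {
  console.log(i + ': ' + matches[i]);
}
// 0: <input name="ok">
// 1: <input name="cmp" value="a>
// 2: <input name="old">
```

Only the first of the three results is a faithful copy of a live input tag, and the multi-line tag is not found at all.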

J-P
Wish I could use the DOM. The problem is that I have a string, not a Document object. Inserting the string into the document causes the document layout to change. Know of any way to parse an HTML string into a Document object without rendering it?
Josh
You could render it in a hidden iframe...
J-P
@J-P It is possible, using document.createElement() and innerHTML on that element. See http://stackoverflow.com/questions/855404/can-javascript-access-the-dom-of-an-ajax-text-html-response
Josh
A: 

You can use document.createElement to create some element and then (ab)use its innerHTML property to create a DOM from a string:

var html = document.createElement("div");
html.innerHTML = "<form><input/><input/><input/></form>";

// Now you can use DOM methods, e.g. getElementsByTagName
var inputs = html.getElementsByTagName("input");
var foo = inputs[0].value; // ...

You might have to manually remove your <html> tags beforehand, though, as IE has trouble parsing full documents (if I remember correctly).
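
One possible way to do that stripping, as a rough sketch: pull out whatever sits between <body ...> and </body> before assigning to innerHTML. The helper name extractBodyHTML and the sample page are made up for this example. The pattern is deliberately coarse (it assumes a single body element and no ">" inside the body tag's attributes), so treat it as a pre-processing step rather than a parser.

```javascript
// Hypothetical helper: returns the markup between <body ...> and </body>,
// or the input unchanged when no body element is found.
function extractBodyHTML(pageHTML) {
  var match = /<body[^>]*>([\s\S]*)<\/body>/i.exec(pageHTML);
  return match ? match[1] : pageHTML;
}

// Made-up full page, including an external stylesheet link in the head.
var page =
  '<html><head><link rel="stylesheet" href="x.css"></head>' +
  '<body class="main"><form><input name="a"><input name="b"></form></body></html>';

var fragment = extractBodyHTML(page);
console.log(fragment);
// <form><input name="a"><input name="b"></form>

// In a browser, the fragment can then go into a detached element, which is
// never attached to the page. The guard keeps this block harmless elsewhere.
if (typeof document !== "undefined") {
  var container = document.createElement("div");
  container.innerHTML = fragment;
  var inputs = container.getElementsByTagName("input");
  console.log(inputs.length);
}
```

Dropping the head this way also discards the stylesheet links before anything is handed to the browser.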

Josef