I have some XML from which I want to extract element names using a JavaScript regular expression. An example of the XML is shown below:

<rules><and><gt propName="Unit" value="5" type="System.Int32"/><or><startsWith propName="DeviceType"/></or></and></rules>

I’m having problems extracting just the element names “gt” and “startsWith”. For example, with the following expression

<(.+?)\s

I get:

“<rules><and><gt”

rather than just “gt”.

Can anyone supply the correct expression?

+4  A: 

Regex is a poor tool for parsing XML. You can easily parse the XML in JavaScript, and a library like jQuery makes this especially easy. For example:

var xml = '<rules><and><gt propName="Unit" value="5" type="System.Int32"/><or><startsWith propName="DeviceType"/></or></and></rules>';
var gt = $($.parseXML(xml)).find('gt'); // parse as XML rather than as HTML
var t = gt.attr('type'); // "System.Int32"
Kobi
+1  A: 

Don't use a regex for this kind of thing. Instead, use the DOM processing functions, such as

var gtElements = document.getElementsByTagName('gt');
var startsWithElements = document.getElementsByTagName('startsWith'); 
teukkam
Or loop through the gtElements and call gtElements[i].getElementsByTagName('startsWith') on each one
Alex
Exactly. I missed the fact that <gt> and <startsWith> are nested.
teukkam
+2  A: 

Well, \s matches whitespace. So you actually tell the regex engine to:

<(.+?)\s
^^    ^
||    \ until you find a whitespace
|\ lazily slurp in any characters ("<" and ">" included)
\ as long as it starts with an opening pointy bracket
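
To see this concretely, here is a minimal sketch that runs the pattern from the question against the sample XML. The variable names are only for illustration:

```javascript
// Sample XML from the question.
var xml = '<rules><and><gt propName="Unit" value="5" type="System.Int32"/>' +
    '<or><startsWith propName="DeviceType"/></or></and></rules>';

// "<", then lazily any characters, then a single whitespace character.
var match = /<(.+?)\s/.exec(xml);

var fullMatch = match[0]; // whole match, including the trailing space
var captured = match[1];  // contents of the capture group

console.log(JSON.stringify(fullMatch)); // "<rules><and><gt "
console.log(JSON.stringify(captured));  // "rules><and><gt"
```

The match starts at the very first "<" and, because "." happily matches ">" and "<" as well, it only stops at the first whitespace, which is the space after "gt".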

You could, for example, use:

<([^\s>]+)\s

but you should always consider this.
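
If you stick with a regex anyway, a rough sketch of collecting every match could look like the following. It assumes a greedy character class followed by \s, so only tags that are immediately followed by whitespace (in this sample, the ones with attributes) are picked up. tagNamesBeforeWhitespace is a made-up helper name, and the approach will still trip over comments, CDATA sections and similar constructs:

```javascript
// Sample XML from the question.
var xml = '<rules><and><gt propName="Unit" value="5" type="System.Int32"/>' +
    '<or><startsWith propName="DeviceType"/></or></and></rules>';

// Hypothetical helper: returns the names of all tags that are
// directly followed by a whitespace character.
function tagNamesBeforeWhitespace(xmlStr) {
    // "<", then one or more characters that are neither whitespace
    // nor ">", then one whitespace character. The "g" flag lets
    // exec() continue from the end of the previous match.
    var pattern = /<([^\s>]+)\s/g;
    var names = [];
    var m;
    while ((m = pattern.exec(xmlStr)) !== null) {
        names.push(m[1]);
    }
    return names;
}

var names = tagNamesBeforeWhitespace(xml);
console.log(names); // [ 'gt', 'startsWith' ]
```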

Boldewyn
+1 for the link to the answer :)
Dror
+1 Yes, that's one great link
seanizer
@Downvoter: Care to explain? You could give me a chance to improve the answer.
Boldewyn
I dropped the regex for an xml parser which now works fine. Thanks to all.
Retrocoder
+1  A: 

The most robust method would be to use the browser's built-in XML parser and standard DOM methods for extracting the elements you want:

var parseXml;

if (window.DOMParser) {
    parseXml = function(xmlStr) {
        return ( new window.DOMParser() ).parseFromString(xmlStr, "text/xml");
    };
} else if (typeof window.ActiveXObject != "undefined" &&
        new window.ActiveXObject("Microsoft.XMLDOM")) {
    parseXml = function(xmlStr) {
        var xmlDoc = new window.ActiveXObject("Microsoft.XMLDOM");
        xmlDoc.async = false;
        xmlDoc.loadXML(xmlStr);
        return xmlDoc;
    };
} else {
    parseXml = function() { return null; }
}

var xmlStr = '<rules><and>' +
    '<gt propName="Unit" value="5" type="System.Int32"/><or>' + 
    '<startsWith propName="DeviceType"/></or></and></rules>';

var xmlDoc = parseXml(xmlStr);
if (xmlDoc) {
    var gt = xmlDoc.getElementsByTagName("gt")[0];
    alert( gt.getAttribute("propName") );
}
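
If you want the element names themselves rather than one known element, a sketch along these lines walks the parsed tree and collects every element that carries at least one attribute. namesOfElementsWithAttributes and sampleXml are hypothetical names, not part of the DOM API, and the DOMParser part only runs where a global DOMParser exists (i.e. in a browser):

```javascript
// Hypothetical helper: recursively collects the nodeName of every
// element node that has at least one attribute.
function namesOfElementsWithAttributes(node, out) {
    out = out || [];
    // nodeType 1 is an element node.
    if (node.nodeType === 1 && node.attributes && node.attributes.length > 0) {
        out.push(node.nodeName);
    }
    var kids = node.childNodes || [];
    for (var i = 0; i < kids.length; i++) {
        namesOfElementsWithAttributes(kids[i], out);
    }
    return out;
}

var sampleXml = '<rules><and>' +
    '<gt propName="Unit" value="5" type="System.Int32"/><or>' +
    '<startsWith propName="DeviceType"/></or></and></rules>';

// Browser-only usage: skipped where DOMParser is not defined.
if (typeof DOMParser !== "undefined") {
    var sampleDoc = new DOMParser().parseFromString(sampleXml, "text/xml");
    console.log(namesOfElementsWithAttributes(sampleDoc.documentElement));
}
```

Unlike the regex approach, this does not care about whitespace, comments or attribute values, because the parser has already dealt with them.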
Tim Down