Hey, I'm trying to do something quite specific with regex in JavaScript and my regex-fu is shaky at best. I wondered if there were any pros out there who could point me in the right direction. So I have some text...

<item id="myid1">myitem1</item>
<item id="myid2">myitem2</item>

...etc

And I would like to extract it into an array that reads myid1, myitem1, myid2, myitem2, ... etc.

There will never be nested elements so there is no recursive nesting problem. Anyone able to bash this out quickly? Thanks for your help!

A: 

I always use this site to build my regexes:

http://www.pagecolumn.com/tool/regtest.htm

This is the regex I came up with:

(<[^>]+>)([^<]+)(<[^>]+>)

And this is the result that the page gives me for JavaScript

Using RegExp object:

var str = '<item id="myid1">myitem1</item><item id="myid2">myitem2</item><ssdad<sdasda><>dfsf';
var re = new RegExp("(<[^>]+>)([^<]+)(<[^>]+>)", "g");
var myArray = str.match(re);

Using literal:

var myArray = str.match(/(<[^>]+>)([^<]+)(<[^>]+>)/g);

if (myArray != null) {
    for (var i = 0; i < myArray.length; i++) {
        // Each entry is one whole match, e.g. <item id="myid1">myitem1</item>
        var result = "myArray[" + i + "] = " + myArray[i];
        console.log(result);
    }
}
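
One thing worth noting about the code above: when the g flag is set, match() returns only the whole matches, so the three capture groups are not available in myArray. A minimal sketch of reaching the groups by calling exec() in a loop instead is shown here. The sample string is the one from the question, and the names wholeMatches, innerTexts and m are made up for illustration.

```javascript
// Sample input from the question; variable names are illustrative only.
var str = '<item id="myid1">myitem1</item><item id="myid2">myitem2</item>';
var re = /(<[^>]+>)([^<]+)(<[^>]+>)/g;

// With the g flag, match() returns whole matches only; capture groups are dropped.
var wholeMatches = str.match(re);

// exec() returns one match per call, including its capture groups,
// and resumes from re.lastIndex on the next call.
var innerTexts = [];
var m;
while ((m = re.exec(str)) !== null) {
    innerTexts.push(m[2]); // group 2 is the text between the opening and closing tags
}
```

Note that group 1 here is the entire opening tag, not the id value, so this pattern alone does not yet give the id the question asks for.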
Sjuul Janssen
+1  A: 

This is an XML string. In my opinion, an XML parser is best suited to this kind of task. Do the following:

var items = document.getElementsByTagName("item"); // use the parent element if document is not the parent
var dataArray = [];

for (var n = 0; n < items.length; n++) {
    var id = items[n].getAttribute("id");
    // childNodes[0] is the text node; nodeValue is its string content
    var text = items[n].childNodes[0].nodeValue;

    dataArray.push(id, text);
}

If your problem is that you cannot convert the XML string to an XML object, you will have to use a DOM parser beforehand:

var xmlString = ""; // your XML string
var xmlDoc = null;  // not named "document", which would clash with the page's own document

if (window.ActiveXObject) { // for Internet Explorer
    xmlDoc = new ActiveXObject("Microsoft.XMLDOM");
    xmlDoc.async = false; // boolean false; the string "false" would be truthy
    xmlDoc.loadXML(xmlString);
} else { // for everything else
    var parser = new DOMParser();
    xmlDoc = parser.parseFromString(xmlString, "text/xml");
}

Then use the above script, calling getElementsByTagName("item") on xmlDoc instead of document.

FK82
Using an XML parser would be the ideal solution, but unfortunately I have no access to DOM manipulation and it's a bit of overkill for this problem. Thanks though!
Thomas
Well, in my case, coming up with a regex pattern would take longer. Also, since you have the XML string, you have DOM access through building a DOM object as described. Anyway, you're welcome!
FK82
+1  A: 

Here's a regex that will:

  • Match the starting and ending tag element names
  • Extract the value of the id attribute
  • Extract the inner html contents of the tag

Note: I am being lazy in matching the attribute value here. It needs to be enclosed in double quotes, and there must be no spaces between the attribute name and its value.

<([^\s]+).*?id="([^"]*?)".*?>(.+?)</\1>

Running the regex in javascript would be done like so:

var search = '<item id="item1">firstItem</item><item id="item2">secondItem</item>';
var regex = /<([^\s]+).*?id="([^"]*?)".*?>(.+?)<\/\1>/gi;
var results = {};
var parts;
// Call exec() on the full string; with the g flag each call resumes at regex.lastIndex
while ((parts = regex.exec(search)) !== null) {
    results[parts[2]] = parts[3];
}

At the end of this, results would be an object that looks like:

{
    "item1": "firstItem",
    "item2": "secondItem"
}

YMMV if the <item> elements contain nested HTML.

Chris
Great, thanks! Changed the re to be... /<item[^>]*id=["'](.*?)["']>(.*?)<\/item>/gi and seems to work spot on :-)
Thomas
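
For anyone landing here later, a minimal sketch of how the modified regex from the comment above could produce the flat array asked for in the question (myid1, myitem1, myid2, myitem2, ...). The sample text is the one from the question; the names text, itemRe, flat and match are assumptions made for illustration, not part of the original posts.

```javascript
// Sample input from the question; variable names are illustrative only.
var text = '<item id="myid1">myitem1</item>\n<item id="myid2">myitem2</item>';

// The regex from the comment: group 1 is the id value, group 2 is the inner text.
var itemRe = /<item[^>]*id=["'](.*?)["']>(.*?)<\/item>/gi;

var flat = [];
var match;
// exec() with the g flag walks through the string one <item> at a time.
while ((match = itemRe.exec(text)) !== null) {
    flat.push(match[1], match[2]);
}
// flat is now ["myid1", "myitem1", "myid2", "myitem2"]
```

Because the pattern expects the closing quote of the id to be followed directly by >, this sketch assumes id is the last attribute on each item, as it is in the question's sample.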