ansaurus

Question

regex to find tag id and content JavaScript

Answer 1

A:

I always use this site to build my regexes:

http://www.pagecolumn.com/tool/regtest.htm

This is the regex I came up with:

(<[^>]+>)([^<]+)(<[^>]+>)

And this is the code that the page generates for JavaScript:

Using RegExp object:

var str = '<item id="myid1">myitem1</item><item id="myid2">myitem2</item><ssdad<sdasda><>dfsf';
var re = new RegExp("(<[^>]+>)([^<]+)(<[^>]+>)", "g");
var myArray = str.match(re);

Using literal:

var myArray = str.match(/(<[^>]+>)([^<]+)(<[^>]+>)/g);

if (myArray != null) {
    for (var i = 0; i < myArray.length; i++) {
        // Each entry is a whole match; with the g flag, match() drops the capture groups.
        console.log("myArray[" + i + "] = " + myArray[i]);
    }
}
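
Since match() with the g flag returns only whole matches, reading the opening tag and the content separately needs the capture groups. A minimal sketch of one way to do that with an exec() loop, using the same pattern as above; the helper name extractTagPairs and the property names are made up for illustration:

```javascript
// extractTagPairs is a hypothetical helper name; the pattern is the one above.
function extractTagPairs(str) {
    var re = /(<[^>]+>)([^<]+)(<[^>]+>)/g;
    var pairs = [];
    var m;
    // With the g flag, each exec() call resumes after the previous match.
    while ((m = re.exec(str)) !== null) {
        pairs.push({ openTag: m[1], content: m[2], closeTag: m[3] });
    }
    return pairs;
}

var sample = '<item id="myid1">myitem1</item><item id="myid2">myitem2</item>';
var pairs = extractTagPairs(sample);
for (var i = 0; i < pairs.length; i++) {
    console.log(pairs[i].openTag + " -> " + pairs[i].content);
}
```

The id is still inside openTag at this point, so this pattern on its own only gets you halfway to "id and content".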

Sjuul Janssen 2010-07-17 10:33:24

Answer 2

+1 A:

This is an XML string, so an XML parser seems best suited to this kind of task, in my opinion. Do the following:

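var items = document.getElementsByTagName("item"); // or call this on the parent element instead of document
var dataArray = [];

for (var n = 0; n < items.length; n++) {
    var id = items[n].getAttribute("id");
    var first = items[n].childNodes[0];        // first child node, normally the text node
    var text = first ? first.nodeValue : "";   // its string value, or "" for an empty element

    dataArray.push(id, text); // stored flat: id1, text1, id2, text2, ...
}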

If your problem is that you cannot convert the XML string to an XML object, you will have to use a DOM parser beforehand:

var xmlString = ""; // your XML string
var xmlDoc = null;  // not named "document", which would clash with the page's own document

if (window.DOMParser) { // standards-based browsers

    var parser = new DOMParser();
    xmlDoc = parser.parseFromString(xmlString, "text/xml");

} else if (window.ActiveXObject) { // older Internet Explorer

    xmlDoc = new ActiveXObject("Microsoft.XMLDOM");
    xmlDoc.async = false;
    xmlDoc.loadXML(xmlString);

}

Then use the above script, calling getElementsByTagName on xmlDoc instead of document.
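
One caveat when parsing: a "text/xml" parse expects a single root element, so a bare run of sibling <item> elements like the sample strings in this thread would need a wrapper first. A rough sketch under that assumption; the helper name collectItems and the wrapper name root are made up here, and textContent assumes a standards DOM rather than the old ActiveX one:

```javascript
// collectItems is a hypothetical helper: it reads the id and text of every
// <item> in an already-parsed XML document (or any element inside one).
function collectItems(xmlDoc) {
    var items = xmlDoc.getElementsByTagName("item");
    var result = {};
    for (var n = 0; n < items.length; n++) {
        result[items[n].getAttribute("id")] = items[n].textContent;
    }
    return result;
}

// Browser-only usage; skipped where DOMParser is unavailable.
if (typeof DOMParser !== "undefined") {
    var xmlString = '<item id="item1">firstItem</item><item id="item2">secondItem</item>';
    // "root" is an arbitrary wrapper name, added so the string has one root element.
    var wrapped = "<root>" + xmlString + "</root>";
    var parsed = new DOMParser().parseFromString(wrapped, "text/xml");
    console.log(collectItems(parsed));
}
```

Returning an id-to-text object instead of a flat array is just one choice; it makes lookups by id direct.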

FK82 2010-07-17 11:01:41

Using an XML parser would be the ideal solution, but unfortunately I have no access to DOM manipulation, and it's a bit of overkill for this problem. Thanks though!

Thomas 2010-07-17 14:24:53

Well, in my case, coming up with a regex pattern would take longer. Also, since you have the XML string, you do have DOM access by building a DOM object as described. Anyway, you're welcome!

FK82 2010-07-17 16:20:14

Answer 3

+1 A:

Here's a regex that will:

Match the starting and ending tag element names
Extract the value of the id attribute
Extract the inner html contents of the tag

Note: I am being lazy in matching the attribute value here. The value must be enclosed in double quotes, and there must be no whitespace between the attribute name and its value.

<([^\s]+).*?id="([^"]*?)".*?>(.+?)</\1>

Running the regex in JavaScript would be done like so:

var search = '<item id="item1">firstItem</item><item id="item2">secondItem</item>';
var regex = /<([^\s]+).*?id="([^"]*?)".*?>(.+?)<\/\1>/gi;
var results = {};
var parts;
// Call exec() on the whole string: with the g flag it resumes from lastIndex on each call.
while ((parts = regex.exec(search)) !== null) {
    results[parts[2]] = parts[3]; // parts[2] is the id, parts[3] the inner content
}

At the end of this, results would be an object that looks like:

{
    "item1": "firstItem",
    "item2": "secondItem"
}

YMMV if the <item> elements contain nested HTML.
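
To make that caveat concrete, here is a small sketch of what the same pattern does with one <item> nested inside another. The input string is invented for illustration:

```javascript
var regex = /<([^\s]+).*?id="([^"]*?)".*?>(.+?)<\/\1>/gi;
// Made-up input: one <item> nested inside another.
var nested = '<item id="outer"><item id="inner">text</item></item>';
var results = {};
var parts;
while ((parts = regex.exec(nested)) !== null) {
    results[parts[2]] = parts[3];
}
console.log(results);
// The lazy (.+?) stops at the first closing tag, so the outer item's content
// is cut short and the inner item never gets its own entry:
// { outer: '<item id="inner">text' }
```

So for anything nested, a parser is the safer route; the regex is only dependable for flat, sibling elements.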

Chris 2010-07-17 11:11:08

Great, thanks! I changed the regex to /<item[^>]*id=["'](.*?)["']>(.*?)<\/item>/gi and it seems to work spot on :-)

Thomas 2010-07-17 14:26:41
