
I need to tokenize the following tag:

{TagName attrib1="value1" attrib2="value 3"}

I would like to write a regex to do it, but the trouble is that an attribute value can contain spaces, so I can't just split on spaces.
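To make the problem concrete: splitting on plain whitespace cuts "value 3" in two. A common Java workaround, shown here as a minimal sketch, only splits on whitespace that is followed by an even number of double quotes, i.e. whitespace that sits outside a quoted value. The class name QuoteAwareSplit and method tokenize are hypothetical; the sketch assumes straight double quotes, no escaped quotes inside values, and a tag wrapped in { }.

```java
import java.util.Arrays;

public class QuoteAwareSplit {
    // Split on runs of whitespace, but only where the rest of the string
    // contains an even number of double quotes (i.e. we are outside a quoted value).
    // Assumption: straight quotes only, no escaped quotes inside values.
    private static final String SPLIT_OUTSIDE_QUOTES =
            "\\s+(?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)";

    static String[] tokenize(String tag) {
        String body = tag.trim();
        if (!body.startsWith("{") || !body.endsWith("}")) {
            throw new IllegalArgumentException("not a {...} tag: " + tag);
        }
        // Drop the surrounding braces before splitting.
        body = body.substring(1, body.length() - 1).trim();
        return body.split(SPLIT_OUTSIDE_QUOTES);
    }

    public static void main(String[] args) {
        String tag = "{TagName attrib1=\"value1\" attrib2=\"value 3\"}";
        // prints [TagName, attrib1="value1", attrib2="value 3"]
        System.out.println(Arrays.toString(tokenize(tag)));
    }
}
```

The lookahead rescans the rest of the string at every whitespace run, so this is quadratic in the tag length; that is harmless for short tags like this one.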

A:

It can't be put more clearly than this:

http://www.codinghorror.com/blog/2009/11/parsing-html-the-cthulhu-way.html

Please explain why you need a regexp.

Also, you didn't say which language you prefer.

Assuming Perl:

use strict;
use warnings;

my $str = "{TagName attrib1=\"value1\" attrib2=\"value 3\"}";

# Matches exactly two attributes; the lazy (.*?) lets a value contain spaces.
if ($str =~ m/\{(\w+)\s+(\w+)="(.*?)"\s+(\w+)="(.*?)"/)
{
    print "tagname: $1\n";
    print "attrib: $2\n";
    print "value: $3\n";
    print "attrib: $4\n";
    print "value: $5\n";
}
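Since the asker prefers Java, here is a rough translation of the same regex idea, as a sketch rather than a definitive parser. Instead of hard-coding two attributes, it matches one name="value" pair at a time and loops with Matcher.find(), so any number of attributes works. The names TagRegex, tagName and attributes are hypothetical; like the Perl version, it assumes straight double quotes with no escaped quotes inside values, and it does not validate the overall tag structure.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TagRegex {
    // Tag name immediately after the opening brace.
    private static final Pattern TAG_NAME = Pattern.compile("^\\{(\\w+)");
    // One name="value" pair; [^"]* lets the value contain spaces but not quotes.
    private static final Pattern ATTRIBUTE = Pattern.compile("(\\w+)=\"([^\"]*)\"");

    static String tagName(String tag) {
        Matcher m = TAG_NAME.matcher(tag);
        if (!m.find()) {
            throw new IllegalArgumentException("no tag name in: " + tag);
        }
        return m.group(1);
    }

    static Map<String, String> attributes(String tag) {
        // LinkedHashMap keeps the attributes in source order.
        Map<String, String> attrs = new LinkedHashMap<>();
        Matcher m = ATTRIBUTE.matcher(tag);
        // Each find() resumes after the previous match, walking every pair.
        while (m.find()) {
            attrs.put(m.group(1), m.group(2));
        }
        return attrs;
    }

    public static void main(String[] args) {
        String tag = "{TagName attrib1=\"value1\" attrib2=\"value 3\"}";
        System.out.println("tagname: " + tagName(tag));
        for (Map.Entry<String, String> e : attributes(tag).entrySet()) {
            System.out.println("attrib: " + e.getKey());
            System.out.println("value: " + e.getValue());
        }
    }
}
```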

But again, don't use regexps for this!!
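In the spirit of that advice, a small hand-written scanner avoids regexes entirely: walk the characters, toggle an "inside quotes" flag on each double quote, and only treat whitespace as a separator when the flag is off. This is a minimal sketch; the class TagScanner and method scan are hypothetical, and it assumes straight double quotes with no escape sequences. It drops the quote characters themselves, so each attribute token comes out as name=value.

```java
import java.util.ArrayList;
import java.util.List;

public class TagScanner {
    // Splits the inside of a {...} tag on whitespace, treating "..." as one unit.
    static List<String> scan(String tag) {
        String s = tag.trim();
        if (s.length() < 2 || s.charAt(0) != '{' || s.charAt(s.length() - 1) != '}') {
            throw new IllegalArgumentException("not a {...} tag: " + tag);
        }
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        // Skip the surrounding braces.
        for (int i = 1; i < s.length() - 1; i++) {
            char c = s.charAt(i);
            if (c == '"') {
                inQuotes = !inQuotes;             // quote chars are not kept in the token
            } else if (Character.isWhitespace(c) && !inQuotes) {
                if (current.length() > 0) {       // whitespace outside quotes ends a token
                    tokens.add(current.toString());
                    current.setLength(0);
                }
            } else {
                current.append(c);
            }
        }
        if (inQuotes) {
            throw new IllegalArgumentException("unterminated quote in: " + tag);
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // prints [TagName, attrib1=value1, attrib2=value 3]
        System.out.println(scan("{TagName attrib1=\"value1\" attrib2=\"value 3\"}"));
    }
}
```

Each attribute token can then be split into name and value with token.split("=", 2); the limit of 2 keeps any "=" inside the value intact.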

Fredrik
The classic post: http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454
paracaudex
Preferred language is Java.
Dan