I would like to capture all "dev" tags and their respective content with PHP's preg_match_all(), but I can't get the nested ones.

data:

<dev>aaa</dev> <dev>bbb</dev> <dev> ccc <dev>ddd</dev> </dev>

my expression so far:

|<dev>(.*)</dev>|Uis

Thanks for your help, b.

A: 

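The * quantifier is greedy by default: it consumes as many characters as possible. The *? version is non-greedy and finds the smallest possible matches, but note that the U modifier in your pattern already inverts greediness, so your (.*) is already lazy. Neither variant can pair nested tags, so regexes may not be the best tool for this.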
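
To make the difference concrete, here is a small sketch that runs a lazy and a greedy version of the pattern against the data from the question. The # delimiter and the variable names $data, $lazy and $greedy are my own choices for illustration.

```php
<?php
// Sample data from the question.
$data = '<dev>aaa</dev> <dev>bbb</dev> <dev> ccc <dev>ddd</dev> </dev>';

// Lazy quantifier: each match stops at the first </dev> after its <dev>.
preg_match_all('#<dev>(.*?)</dev>#is', $data, $lazy);

// Greedy quantifier: the single match runs on to the last </dev> in the input.
preg_match_all('#<dev>(.*)</dev>#is', $data, $greedy);

print_r($lazy[1]);   // three captures; the third is cut off at the inner </dev>
print_r($greedy[1]); // one capture spanning almost the whole string
```

The lazy version pairs the outer opening tag with the inner closing tag, and the greedy version swallows everything in one match, so neither gives the nested structure.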

bandi
+5  A: 

Don’t use regular expressions for parsing. Use a real parser like DOMDocument or SimpleXML:

$xml = simplexml_load_string('<root>'.$str.'</root>');
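
A slightly fuller sketch of the same idea, assuming $str holds the data from the question and that the fragment is well-formed once wrapped in a root element. The XPath query and the $devs array are my additions for illustration, not part of the original answer.

```php
<?php
// Sample data from the question; $str is the variable name used above.
$str = '<dev>aaa</dev> <dev>bbb</dev> <dev> ccc <dev>ddd</dev> </dev>';

// Wrap the fragment in a single root element so it parses as XML.
$xml = simplexml_load_string('<root>' . $str . '</root>');
if ($xml === false) {
    throw new RuntimeException('Input is not well-formed XML');
}

// '//dev' selects every <dev> element at any depth, in document order.
$devs = [];
foreach ($xml->xpath('//dev') as $dev) {
    // asXML() serializes the element together with its nested markup.
    $devs[] = $dev->asXML();
}

print_r($devs);
```

This should list four elements, the nested one included, without any regular expression. It only works if the input really is well-formed; for sloppy HTML, DOMDocument::loadHTML() is the more forgiving route.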
Gumbo
Absolutely. There are several other examples at http://stackoverflow.com/questions/1417795/replacing-image-src-in-html-tags and http://stackoverflow.com/questions/1416425/preg-replace-preg-match-for-href-in-html-link
TrueWill
A: 

You need a recursive pattern:

/<dev>((?:(?!<\/?dev>).|(?R))*)<\/dev>/is

That matches each outermost element together with everything nested inside it, so if you then want to parse the nested elements, you will have to run the function again on $matches[1].
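
Running the function again on $matches[1] could look roughly like the sketch below. The helper name collectDev() is hypothetical, and the pattern is one way to write the recursion: it uses ~ as the delimiter so the slashes need no escaping, and a negative lookahead so the plain-character branch stops at any dev tag.

```php
<?php
// Hypothetical helper: collects the content of every <dev> element,
// nested ones included, by re-running the pattern on each capture.
function collectDev(string $input): array
{
    // (?R) re-enters the whole pattern when a nested <dev> starts;
    // (?!</?dev>). consumes one character that does not begin a dev tag.
    $pattern = '~<dev>((?:(?!</?dev>).|(?R))*)</dev>~is';
    $found = [];
    if (preg_match_all($pattern, $input, $matches)) {
        foreach ($matches[1] as $inner) {
            $found[] = $inner;
            // Descend into the captured content to pick up nested elements.
            $found = array_merge($found, collectDev($inner));
        }
    }
    return $found;
}

$data = '<dev>aaa</dev> <dev>bbb</dev> <dev> ccc <dev>ddd</dev> </dev>';
$result = collectDev($data);
print_r($result);
```

Each outer element is reported before the elements nested inside it. Deeply nested input means deep recursion both in PHP and inside PCRE, which is one more reason to prefer a real parser for anything beyond small inputs.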

Will Earp
You should use / or # as your preg delimiter. The | (pipe) character is the alternation operator, so using it as the delimiter makes alternation unavailable inside the pattern.
Will Earp