Is there a way to tell preg_match_all to use the third match it finds, skipping the first two? For example, I have the following HTML:

<div class="entry">
    <div class="text">BlaBlaBla</div>
    <div class="date">2009-10-31</div>
</div>

I need preg_match_all to get the contents of the outermost div, and not stop at the first </div> it encounters.

I have been banging my head against the wall for a while now :(

+5  A: 

This is the class of problem that regular expressions theoretically cannot handle: recursively defined structures. Extended REs might be able to sort of do it, but (to mix metaphors) it's better to punt and pick up a different tool.

Having said that, PCRE specifically has a recursive pattern feature; the typical demonstration is \((a*|(?R))*\), which can handle any combination of balanced parentheses and a's. So you can probably adapt that, but you are trying to do something that I wouldn't try to do with REs.
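A minimal PHP sketch of the recursive feature follows. It uses `[^()]++` (any run of characters other than parentheses, matched possessively) in place of the demo's `a*`; that substitution is an assumed adaptation so the pattern works on arbitrary text between the parentheses.

```php
<?php
// (?R) re-enters the whole pattern, so each nested "(...)" is matched by a
// recursive call; [^()]++ consumes any run of non-parenthesis characters.
// Replacing the demo's a* with [^()]++ is an assumption of this sketch.
$pattern = '/\((?:[^()]++|(?R))*\)/';

$subject = 'x (a (b) (c (d))) y';
if (preg_match($pattern, $subject, $m)) {
    echo $m[0], "\n"; // prints "(a (b) (c (d)))"
}
```

The match stops at the parenthesis that balances the first one, not at the last `)` in the string.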

Update: I'm not sure how useful this will be, but:

php > $t = "<div> how <div> now is the time </div>  now </div>";
php > preg_match('/<div>(.*|(?R))*<\/div>/',$t,$m); print_r($m);
Array
(
    [0] => <div> how <div> now is the time </div>  now </div>
    [1] => 
)
php >
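The greedy `.*` inside that pattern is what lets it run on to the last `</div>`. A possible adaptation is sketched below; the pattern is hypothetical, not from this answer, and assumes simple well-formed markup like the question's (no `<div` text inside comments, scripts or attribute values). Instead of `.*`, each step consumes a single character that does not begin `<div` or `</div`, so a nested div can only be consumed through `(?R)`.

```php
<?php
// Hypothetical pattern: one <div ...> element including nested <div>s.
//   (?!</?div\b).  a single character that does not start "<div" or "</div"
//   (?R)           a complete nested <div>...</div>, matched recursively
// Group 1 captures the inner HTML; the s modifier lets "." match newlines,
// and the possessive *+ stops the loop from backtracking.
$pattern = '~<div\b[^>]*>((?:(?!</?div\b).|(?R))*+)</div>~s';

$html = <<<HTML
<div class="entry">
    <div class="text">BlaBlaBla</div>
    <div class="date">2009-10-31</div>
</div>
<div class="entry">
    <div class="text">Second</div>
    <div class="date">2009-11-01</div>
</div>
HTML;

$count = preg_match_all($pattern, $html, $matches);
echo $count, "\n";          // 2: one match per outermost div
echo $matches[1][0], "\n";  // inner HTML of the first outer div
```

Attributes containing `>` or comments mentioning `<div` will confuse it, which is the argument for a real parser.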
DigitalRoss
So there is absolutely no way to say "match the third </div>"?
clops
About your update: almost :) It goes to the very last </div> in the whole string, whereas I need it to stop at the third one.
clops
Sigh, I'm not sure I can do this with REs. You really need something that groks XPath or the DOM, or some other kind of parser; there are zillions. Anyway, also look at the RE syntax `(pattern){m,n}`, which matches the pattern between m and n times, and `(pattern){m,n}?`, which is the non-greedy version.
DigitalRoss
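Following up on the counted-repetition suggestion, here is a minimal PHP sketch using the `{m}` form with a lazy `.*?`, so the match ends at the third `</div>` after the opening tag. It assumes the outer tag is literally `<div class="entry">` and that every entry holds exactly two inner divs; it breaks as soon as the nesting changes.

```php
<?php
// Hypothetical pattern: after the opening tag, lazily consume text up to a
// "</div>", exactly three times. The third "</div>" closes the outer div
// only if the entry contains exactly two inner divs (an assumption).
// The s modifier lets "." match newlines.
$pattern = '~<div class="entry">(?:.*?</div>){3}~s';

$html = <<<HTML
<div class="entry">
    <div class="text">BlaBlaBla</div>
    <div class="date">2009-10-31</div>
</div>
<div class="entry">
    <div class="text">Second</div>
    <div class="date">2009-11-01</div>
</div>
HTML;

$count = preg_match_all($pattern, $html, $matches);
echo $count, "\n";          // 2
echo $matches[0][0], "\n";  // the first complete entry
```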
+5  A: 

You would be much better served by something like an XML/HTML parser. See here.

Chris Kloberdanz
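For comparison, a minimal sketch with PHP's bundled DOM extension (`DOMDocument` plus `DOMXPath`). It assumes the class attribute is exactly `entry` and rebuilds each outer div's inner HTML by serializing its child nodes with `saveHTML()`.

```php
<?php
// Parse the markup, then let XPath find each outer div instead of
// matching tags with a regular expression.
$html = '<div class="entry">
    <div class="text">BlaBlaBla</div>
    <div class="date">2009-10-31</div>
</div>';

$doc = new DOMDocument();
libxml_use_internal_errors(true); // collect parser warnings instead of printing them
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
$contents = [];
foreach ($xpath->query('//div[@class="entry"]') as $entry) {
    // Serialize each child node to rebuild the div's inner HTML.
    $inner = '';
    foreach ($entry->childNodes as $child) {
        $inner .= $doc->saveHTML($child);
    }
    $contents[] = trim($inner);
}

print_r($contents);
```

The nesting depth no longer matters: the parser has already paired every `<div>` with its `</div>`.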
Python's minidom comes to mind.
Calyth
Yeah, you should definitely use a parser instead of a regular expression for this.
omouse
A: 

You can use XPath's "axis specifiers" and "node set functions".
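A small sketch of that idea with PHP's `DOMXPath`, assuming the question's markup: the `descendant::` and `child::` axes navigate to the nodes, and the XPath 1.0 node-set functions `count()` and `position()` count and pick out children.

```php
<?php
$html = '<div class="entry"><div class="text">BlaBlaBla</div>'
      . '<div class="date">2009-10-31</div></div>';

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

// descendant:: axis from the document root finds the outer div.
$entry = $xpath->query('/descendant::div[@class="entry"]')->item(0);

// count() is a node-set function; evaluate() returns it as a float.
$children = $xpath->evaluate('count(child::div)', $entry);

// position() on the child:: axis selects the second inner div (the date).
$date = $xpath->query('child::div[position() = 2]', $entry)->item(0);

echo $children, "\n";           // 2
echo $date->textContent, "\n";  // 2009-10-31
```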

Gökhan Ercan