On a given page there are a bunch of elements like this:

<div class="some class">
  <!-- anything can be here: other divs, even other divs with the same class -->
</div>

I need to match the closing tag that belongs to this particular opening tag.
A: 

The only robust solution is to parse the HTML; regexps can't solve this in all cases.

In fact, browsers are often very tolerant: they even cope with errors such as missing </p> tags. So dealing with arbitrary pages is actually quite tricky.
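A quick illustration of why a plain regex falls short, using a hypothetical sample in which one div is nested inside another of the same class (Python's re module is used here only for illustration): a non-greedy pattern stops at the first </div> it sees, and a greedy one runs to the last </div> in the input. Neither knows which closing tag pairs with which opening tag.

```python
import re

# Hypothetical sample: a "some class" div containing a nested div with the
# same class, followed by an unrelated sibling div.
sample = '<div class="some class">outer <div class="some class">inner</div> tail</div><div>next</div>'

# Non-greedy: stops at the FIRST </div>, which closes the nested div.
naive = re.search(r'<div class="some class">(.*?)</div>', sample, re.S)
print(naive.group(1))   # outer <div class="some class">inner

# Greedy: runs to the LAST </div>, swallowing the unrelated sibling too.
greedy = re.search(r'<div class="some class">(.*)</div>', sample, re.S)
print(greedy.group(1))  # outer <div class="some class">inner</div> tail</div><div>next
```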

If you are dealing with a page that you produce yourself, then perhaps you can code some special-case regexps. Otherwise, you may need to seek out a true parser such as this (I've never used it myself, but it may well be what you need).
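As one possible sketch of the "true parser" route, Python's standard-library html.parser.HTMLParser tokenizes tolerant HTML and reports each tag's position via getpos(). The class MatchingDivFinder and the function find_matching_div below are hypothetical names for illustration: the sketch finds the first div whose class attribute equals a given value and tracks div nesting to locate the closing tag that actually belongs to it. It assumes the whole document is passed in one string.

```python
from html.parser import HTMLParser


class MatchingDivFinder(HTMLParser):
    """Hypothetical helper: find the first <div> whose class attribute equals
    target_class, and the </div> that actually closes it."""

    def __init__(self, source, target_class):
        super().__init__(convert_charrefs=True)
        self.source = source
        self.target_class = target_class
        # Absolute index of the first character of each line, used to turn the
        # parser's (line, column) positions back into indices into source.
        self.line_starts = [0]
        for i, ch in enumerate(source):
            if ch == "\n":
                self.line_starts.append(i + 1)
        self.depth = 0      # how many divs are open, counting the target itself
        self.start = None   # index of the target's opening '<'
        self.end = None     # index just past its matching '</div>'

    def _abs_pos(self):
        line, col = self.getpos()  # line is 1-based, col is 0-based
        return self.line_starts[line - 1] + col

    def handle_starttag(self, tag, attrs):
        if tag != "div" or self.end is not None:
            return
        if self.start is None:
            if dict(attrs).get("class") == self.target_class:
                self.start = self._abs_pos()
                self.depth = 1
        else:
            self.depth += 1  # a div nested somewhere inside the target

    def handle_endtag(self, tag):
        if tag != "div" or self.start is None or self.end is not None:
            return
        self.depth -= 1
        if self.depth == 0:
            # getpos() points at the '<' of this end tag; skip past its '>'.
            self.end = self.source.index(">", self._abs_pos()) + 1


def find_matching_div(source, target_class):
    """Return the whole matched element as a string, or None if not found."""
    finder = MatchingDivFinder(source, target_class)
    finder.feed(source)
    finder.close()
    if finder.start is None or finder.end is None:
        return None
    return source[finder.start:finder.end]


sample = '<div class="some class">outer <div class="some class">inner</div> tail</div><div>next</div>'
print(find_matching_div(sample, "some class"))
```

Because the parser, not a pattern, decides where each tag begins and ends, nested divs with the same class no longer confuse the match.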

djna
A: 

Use a DOM parser such as DOMDocument.

Mark Baker
A: 

Regular expressions describe regular languages, and HTML is not a regular language. I'd be prepared to bet you could do it with a so-called "recursive regular expression", since those aren't really regular expressions and aren't limited to regular languages. I'd be prepared to bet even more that you'd be better off parsing it instead anyway.

The easiest approach (not the best, but the easiest to code in a few lines) is to keep a count of inner divs. Whenever you encounter an opening div tag, increase the count. Whenever you encounter a closing div tag, decrease the count if it's non-zero; if it's already zero, you've found the end of your element. If you reach the end of the file first, somebody hasn't closed their divs properly.
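The counting approach can be sketched in a few lines of Python. Here a regex is used only to find individual div tags, never to match the nesting; the counter does that. The function name outer_div is hypothetical, and the sketch assumes no '>' inside attribute values, no self-closing <div/> tags, no div tags hidden inside comments, scripts or CDATA, and that class is the first attribute of the target div, written with double quotes.

```python
import re

# Any opening <div ...> or closing </div> tag; group(1) is "/" for a closing tag.
DIV_TAG = re.compile(r"<(/?)div\b[^>]*>", re.IGNORECASE)


def outer_div(source, class_value):
    """Return the first <div class="class_value"> element including its nested
    content and its own closing tag, or None if there is no such opening tag."""
    # Assumption: class is the first attribute and uses double quotes.
    opening = re.search(r'<div\s+class="' + re.escape(class_value) + r'"[^>]*>',
                        source, re.IGNORECASE)
    if opening is None:
        return None
    inner = 0  # nested divs currently open inside our element
    for tag in DIV_TAG.finditer(source, opening.end()):
        if tag.group(1):            # a closing </div>
            if inner:
                inner -= 1          # it closes one of the nested divs
            else:
                return source[opening.start():tag.end()]  # our own closing tag
        else:                       # an opening <div ...>
            inner += 1
    raise ValueError("end of input reached: somebody hasn't closed their divs")


sample = '<div class="some class">outer <div class="some class">inner</div> tail</div><div>next</div>'
print(outer_div(sample, "some class"))
```

Raising an error at end of input corresponds to the "somebody hasn't closed their divs" case; a caller could catch it and fall back to a real parser.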

Using an XML parser is easier still, provided you can either depend on the markup being well-formed (if you can't, you've got two problems...) or are prepared to simply fail on input that isn't well-formed.
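A minimal sketch of the XML-parser route using Python's standard-library xml.etree.ElementTree, assuming the markup is well-formed XHTML-style XML (every tag closed, attributes quoted, no HTML-only entities such as &nbsp;). The function name find_div_xml and the synthetic <root> wrapper are assumptions for illustration; the wrapper lets a fragment with several top-level elements parse. Non-well-formed input raises ET.ParseError, which is the "just error" option.

```python
import xml.etree.ElementTree as ET


def find_div_xml(fragment, class_value):
    """Return the first <div> element whose class equals class_value (with all
    of its nested children attached), or None if there is none."""
    # Assumption: fragment is well-formed; wrap it so multiple top-level
    # elements still form a single XML document.
    root = ET.fromstring("<root>" + fragment + "</root>")
    for div in root.iter("div"):  # document order, outermost first
        if div.get("class") == class_value:
            return div
    return None


sample = '<div class="some class">outer <div class="some class">inner</div> tail</div><div>next</div>'
div = find_div_xml(sample, "some class")
print("".join(div.itertext()))  # outer inner tail

try:
    find_div_xml('<div class="x"><div></div>', "x")
except ET.ParseError as err:
    print("not well-formed:", err)
```

The parser does the nesting bookkeeping itself, so the element returned already contains exactly the children between its own opening and closing tags.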

Jon Hanna