PHP Regex: How to match anything except a pattern between two tags

Hello, I am attempting to match a string that is composed of HTML. Basically it is an image gallery, so there is a lot of repetition in the string. There are many <dl> tags in the string, but I am looking to match the last <dl>(.?)+</dl> combo that comes immediately before a </div>.

The way I've devised to do this is to make sure that there aren't any <dl tags inside the <dl></dl> pair I'm matching. I don't care what else is in there, including other tags and line breaks.

I decided I had to do it with regular expressions because I can't predict how long this substring will be or anything that's inside it.

Here is my current regex, which only returns an array with two NULL indices:

preg_match_all('/<dl((?!<dl).)+<\/dl>(?=<\/div>)/', $foo, $bar)

As you can see, I use a negative lookahead to check whether there is another <dl within this one. I've also tried a negative lookbehind here, with the same results, and I've tried +? instead of + to no avail. Keep in mind that the HTML never contains nesting like <dl><dl></dl>; the problem is that my regex either matches from the first <dl> to the last </dl> or matches nothing at all.
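
The reason +? does not help is that a regex engine reports the leftmost match: it tries each starting position in order and returns the first one that succeeds, and laziness only changes where a match ends, not where it begins. A minimal sketch, using a made-up two-item sample in $foo (not the actual gallery HTML), shows a lazy pattern still starting at the first <dl and swallowing both blocks:

```php
<?php
// Hypothetical sample markup: two gallery items inside one div, with line breaks.
$foo = "<div>\n<dl><dt>a</dt></dl>\n<dl><dt>b</dt></dl>\n</div>";

// Lazy quantifier plus the s modifier, so . also matches newlines.
// The engine still starts at the FIRST <dl, because that is the leftmost
// position where a match can succeed; +? only makes it stop at the earliest
// </dl> that is followed by </div>, which here is the last one.
preg_match('/<dl.+?<\/dl>(?=\s*<\/div>)/s', $foo, $m);

echo $m[0], "\n";
```

The printed match contains both <dl> blocks, which is the "nearly the whole string" symptom.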

Now, I realize . won't match line breaks, but I've tried everything I could think of there and it still gives me either the NULL indices or nearly the whole string (from the very first occurrence of <dl to </dl></div>, which includes several other occurrences of <dl>, exactly what I didn't want). I honestly don't know what I'm doing wrong.
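
Two details plausibly explain the empty result: without the s modifier, . stops at newlines, and (?=<\/div>) fails if there is any whitespace between </dl> and </div>. Below is a minimal sketch of the question's own idea with both adjusted, assuming gallery markup shaped like the hypothetical sample string; the capturing group is also made non-capturing, so $bar holds only the full matches:

```php
<?php
// Hypothetical sample markup with line breaks inside and between the blocks.
$foo = "<div>\n<dl><dt>a</dt>\n<dd>first</dd></dl>\n<dl><dt>b</dt>\n<dd>last</dd></dl>\n</div>";

// (?:(?!<dl).)+ is a "tempered" dot: each character is consumed only if the
// text at that position does not start another "<dl", so a single match can
// never span two <dl> blocks.
// The s modifier lets . match newlines inside a block, and \s* in the
// lookahead tolerates whitespace between </dl> and </div>.
$pattern = '/<dl(?:(?!<dl).)+<\/dl>(?=\s*<\/div>)/s';

preg_match_all($pattern, $foo, $bar);
print_r($bar[0]);
```

With this sample, $bar[0] holds a single element: the second <dl> block only.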

Thanks for your help! I've spent over an hour trying to straighten out this one problem, and it has nearly driven me to pull my hair out.

I've posted this answer so often, I wonder when Google will start providing a link to that answer when someone searches for 'pain' on their site.

soulmerge 2010-04-30 08:18:36

Thanks for your response; you must have that response as a template, because I've seen it in other places as well. I would certainly consider a parser, but I know exactly how the HTML is formatted, because I generate it myself in another file. Since I know the general form the HTML will take, I considered regex an acceptable solution. Also, I didn't want to slow down execution any more than necessary, since I already consider the load time of this particular page borderline.

Ryan 2010-04-30 16:45:26
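
For comparison with the parser suggestion, here is a sketch using PHP's built-in DOM extension (DOMDocument and DOMXPath), again on a hypothetical sample string. The XPath expression assumes the target <dl> is a direct child of the <div> and is that div's last element, which is one reading of "the last <dl> before a </div>":

```php
<?php
// Hypothetical sample markup, same shape as a regex test would use.
$foo = "<div>\n<dl><dt>a</dt></dl>\n<dl><dt>b</dt></dl>\n</div>";

// Collect libxml warnings internally instead of printing them
// (fragments without a doctype or <html> wrapper can trigger some).
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($foo);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
// A <dl> that is a child of a <div> and has no element siblings after it,
// i.e. the last element before that div's closing tag.
$nodes = $xpath->query('//div/dl[not(following-sibling::*)]');

foreach ($nodes as $dl) {
    // saveHTML() with a node argument serializes just that node.
    echo $doc->saveHTML($dl), "\n";
}
```

Whether this is measurably slower than the regex on the real page is something to measure rather than assume.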
