ansaurus

Question

PHP REGEX: Pls help me build the proper regex for this:

Answer 1

+4 A:

This is a pretty common question here ("How do I parse this XML/HTML with a regular expression?") and I'll give you the same answer: don't.

Regular expressions are notoriously bad at this kind of thing. HTML/XML is not "regular" in the regex sense: tags can nest to arbitrary depth, which a plain regular expression cannot keep track of.

PHP comes with at least 3 XML parsers (SimpleXML, DOMDocument and XMLReader spring to mind) that will do this reliably. Use one of those.

Take a look at Parse HTML With PHP And DOM as an example.
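
As a rough illustration of the parser approach, here is a minimal sketch using DOMDocument and DOMXPath. The sample markup in $body is made up, since the question's actual HTML isn't shown; only the "imageElement" class name comes from this thread. It assumes the class attribute contains exactly that one class.

```php
<?php
// Hypothetical sample markup; the real HTML from the question is not shown.
$body = '<div class="imageElement"><h3>First</h3></div>'
      . '<div class="other">skip me</div>'
      . '<div class="imageElement"><h3>Second</h3></div>';

// Collect libxml parse warnings internally instead of printing them,
// since real-world HTML is rarely perfectly valid.
libxml_use_internal_errors(true);

$doc = new DOMDocument();
$doc->loadHTML($body);
libxml_clear_errors();

// Select every div whose class attribute is exactly "imageElement".
$xpath = new DOMXPath($doc);
$matches = array();
foreach ($xpath->query('//div[@class="imageElement"]') as $div) {
    // textContent is the element's text with all tags stripped.
    $matches[] = trim($div->textContent);
}

print_r($matches);
```

Because the parser builds a tree, nested divs inside an imageElement div would not confuse it the way they confuse a regex.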

cletus 2009-09-11 03:49:19

:-) It's a fun wheel to reinvent if you want to sharpen your regex skills, but the answer is correct: whatever you build will fail on some case, and there is a reason people build parser libraries.

Devin Ceartas 2009-09-11 03:51:16

Answer 2

+1 A:

It sounds like the trouble you're having is that the * is greedy, i.e. it matches as much as possible, whereas you want it to match as little as possible.

If the data inside your divs does not contain "</div>" then you can keep the parsing pretty simple. If it can contain arbitrary HTML data (specifically nested divs), you'll need to parse it more.
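
To make the greedy/lazy difference concrete, here is a small sketch. The input string and both patterns are made up for illustration (the question's original regex isn't shown), and it assumes the target divs contain no nested divs.

```php
<?php
// Hypothetical sample input; assumes no nested divs inside the target divs.
$body = '<div class="imageElement">one</div><div class="imageElement">two</div>';

// Greedy: .* runs on to the LAST </div>, so both divs collapse into one match.
preg_match_all('#<div class="imageElement">(.*)</div>#s', $body, $greedy);

// Lazy: .*? stops at the FIRST </div> after each opening tag.
preg_match_all('#<div class="imageElement">(.*?)</div>#s', $body, $lazy);

print_r($greedy[1]); // a single element spanning both divs
print_r($lazy[1]);   // one element per div
```

Adding the ? after the * is the only change between the two patterns. The lazy version still breaks as soon as a div is nested inside another one.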

If it stays basic, you could do the whole thing without regex. It's a little hackish, but as long as your data stays simple and predictable, it should work really fast:

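// explode() takes the delimiter first, then the string to split.
$chunks = explode('<div class = "imageElement">', $body);
// Drop everything before the first opening tag.
array_shift($chunks);
$matches = array();
foreach ($chunks as $chunk) {
    // strpos() takes the haystack first, then the needle.
    $pos = strpos($chunk, '</div>');
    // Compare against false: position 0 is a valid hit (an empty div).
    if ($pos !== false) {
        $matches[] = substr($chunk, 0, $pos);
    }
}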

If you need something more flexible, use a real HTML parser.

JasonWoof 2009-09-11 03:53:29
