ansaurus

Question

Why does preg_match_all poop out after so many characters?

Answer 1

+1 A:

I ran into this once, and the only way I could solve it back then was by splitting the string. You could use explode() or preg_split().

Quoting literally from my source code:

    // regexps have failed miserably on very large tables...
    $parts = explode("<table",$html);

But this was two years ago.
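A minimal sketch of that splitting approach, assuming PHP 7 or later. The helper name `matchPerChunk`, the sample HTML, and the `<td>` pattern are all made up for illustration; only `explode()`, `preg_match_all()` and `preg_last_error()` are real PHP functions. It runs the pattern on each piece separately, so no single call has to chew through the whole document:

```php
<?php
// Hypothetical helper: apply $pattern to each "<table" chunk instead of the full string.
function matchPerChunk(string $pattern, string $html, string $delimiter = '<table'): array
{
    $found = [];
    foreach (explode($delimiter, $html) as $i => $part) {
        // explode() drops the delimiter, so put it back on every chunk except the first.
        $chunk = $i === 0 ? $part : $delimiter . $part;

        $count = preg_match_all($pattern, $chunk, $m);
        if ($count === false) {
            // preg_last_error() returns a code such as PREG_BACKTRACK_LIMIT_ERROR.
            throw new RuntimeException('PCRE error code ' . preg_last_error());
        }
        // $m[0] holds the full-pattern matches for this chunk.
        $found = array_merge($found, $m[0]);
    }
    return $found;
}

// Sample input (invented for this example).
$html  = '<p>intro</p><table><tr><td>a</td></tr></table><table><tr><td>b</td></tr></table>';
$cells = matchPerChunk('~<td>.*?</td>~s', $html);
```

Note that a match which would span two chunks is lost with this approach, so the delimiter has to be something the pattern never needs to cross.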

mvds 2010-07-16 02:09:50

Pretty good idea.

Stephen 2010-07-16 02:11:14

Answer 2

A:

It looks like you're working with HTML. You might want to consider working with one of the various parsers. For example, DOM has a specific class for comments, so we know it can work with them. Unfortunately the DOM is a bit awkward to work with.
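As a rough illustration of the DOM route, here is a sketch that collects every comment in a document with `DOMDocument` and `DOMXPath`. The sample HTML and the comment text are invented, since the question's actual markup isn't shown here:

```php
<?php
// Sample input (invented for this example).
$html = '<html><body><!-- start:block --><p>text</p><!-- end:block --></body></html>';

$doc = new DOMDocument();
// Real-world HTML is rarely clean; collect parser warnings instead of printing them.
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();

// The XPath node test comment() selects comment nodes (DOMComment instances).
$xpath    = new DOMXPath($doc);
$comments = [];
foreach ($xpath->query('//comment()') as $node) {
    $comments[] = trim($node->nodeValue);
}
```

Each `$node` is a regular DOM node, so its siblings and parent are reachable from there if the content between two comments is what you're after.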

Another option might be to use XMLReader, which reads XML as a stream and processes it as tokens along the way. It seems to understand what comments are. I've never used it myself, so I can't tell you how well it works. (You can use DOM's loadHTML and saveXML methods to convert your HTML into XML, assuming it's not too horribly formed.)
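A sketch of how the XMLReader route might look, under the same caveat that it assumes the HTML converts cleanly to XML. The sample markup is invented; the conversion uses `loadHTML` and `saveXML` as described above, and the reader then walks the result one node at a time:

```php
<?php
// Sample input (invented for this example).
$html = '<html><body><!-- first --><p>text</p><!-- second --></body></html>';

// Step 1: convert HTML to XML via DOM.
$doc = new DOMDocument();
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();
$xml = $doc->saveXML();

// Step 2: stream through the XML and keep only the comment nodes.
$reader = new XMLReader();
$reader->XML($xml);

$comments = [];
while ($reader->read()) {
    if ($reader->nodeType === XMLReader::COMMENT) {
        $comments[] = trim($reader->value);
    }
}
$reader->close();
```

Going through DOM first means the whole document is still loaded into memory once, so this only pays off if the input is already XML or the conversion is done separately.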

Finally, you might consider writing a tokenizer or parser for your custom comments. It shouldn't be too difficult, and may well be faster for you to hack together than learning either of the XML solutions I've pointed out.
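For a sense of how small such a tokenizer can be, here is a sketch that scans with `strpos()` rather than a regular expression. The function name `extractComments` and the `<!--` / `-->` delimiters are assumptions; swap in whatever markers your custom comments actually use:

```php
<?php
// Hypothetical helper: return the body of every $open ... $close pair, in order.
function extractComments(string $text, string $open = '<!--', string $close = '-->'): array
{
    $comments = [];
    $offset   = 0;

    while (($start = strpos($text, $open, $offset)) !== false) {
        $bodyStart = $start + strlen($open);
        $end       = strpos($text, $close, $bodyStart);
        if ($end === false) {
            break; // unterminated comment: stop scanning
        }
        $comments[] = substr($text, $bodyStart, $end - $bodyStart);
        $offset     = $end + strlen($close); // continue after the closing marker
    }
    return $comments;
}

$found = extractComments('a<!-- one -->b<!--two-->c<!-- unterminated');
```

Because it only moves forward through the string, there is no backtracking involved and the input length doesn't matter the way it can with a regex.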

Charles 2010-07-16 04:39:58
