views:

1571

answers:

4
+4  Q: 

PHP ereg vs. preg

I have noticed in the PHP regex library there is a choice between ereg and preg. What is the difference? Is one faster than the other and if so, why isn't the slower one deprecated?

Are there any situations where it is better to use one over the other?

+7  A: 

preg is the Perl Compatible Regex library
ereg is the POSIX compliant regex library

They have slightly different syntax, and preg is in some cases slightly faster. ereg is deprecated (and is being removed in PHP 6), so I wouldn't recommend using it.
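
To make the syntax difference concrete, here is a minimal sketch. The sample input and pattern are made up for illustration. The ereg() call appears only as a comment because it is deprecated. The visible difference is that PCRE patterns are wrapped in delimiters (here `/`), while POSIX patterns are passed bare.

```php
<?php
// POSIX (ereg) style, shown only as a comment because ereg() is deprecated:
//     ereg('^([0-9]{4})-([0-9]{2})$', $input, $regs);

// PCRE (preg) style: the same pattern wrapped in '/' delimiters.
$input = '2009-11'; // hypothetical sample value
$matched = preg_match('/^([0-9]{4})-([0-9]{2})$/', $input, $matches);

// $matches[0] holds the whole match, $matches[1..] the captured groups.
if ($matched === 1) {
    echo "year={$matches[1]} month={$matches[2]}\n";
}
```

If the pattern itself contains a `/`, either escape it or pick another delimiter such as `#`.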

Yacoby
Please clarify why the downvote of this answer
Yacoby
+10  A: 

Visiting php.net/ereg displays the following:

Warning

This function has been DEPRECATED as of PHP 5.3.0 and REMOVED as of PHP 6.0.0. Relying on this feature is highly discouraged.

Down the page just a bit further and we read this:

Note: preg_match(), which uses a Perl-compatible regular expression syntax, is often a faster alternative to ereg().

Note my emphasis.

Jonathan Sampson
ah ok thanks I didn't see that for whatever reason. Goodbye ereg I suppose! Accepted answer.
Evernoob
+1  A: 

Well, ereg and its derivative functions (eregi, ereg_replace, etc.) are deprecated as of PHP 5.3 and being removed in PHP 6, so you're probably best going with the preg family instead.

preg is for Perl-style regular expressions, while ereg is standard POSIX regex.
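
As a rough guide to moving from one family to the other, the sketch below pairs each old POSIX call (in a comment) with a likely preg equivalent. The sample strings are hypothetical, and real patterns may need more care than adding delimiters, so treat this as a starting point and not a complete migration recipe.

```php
<?php
// Hypothetical sample data, for illustration only.
$header = 'MIME-Version: 1.0';
$slug   = 'Hello, World! 2009';
$csv    = 'a, b,c';

// eregi('^mime-version', $header)  ->  add delimiters plus the "i" modifier
$isMime = preg_match('/^mime-version/i', $header) === 1;

// ereg_replace('[^a-zA-Z0-9-]+', '', $slug)  ->  preg_replace() with delimiters
$clean = preg_replace('/[^a-zA-Z0-9-]+/', '', $slug);

// split(', *', $csv)  ->  preg_split()
$parts = preg_split('/, */', $csv);

var_dump($isMime, $clean, $parts);
```

Case-insensitive matching moves from a separate function (eregi) to a modifier after the closing delimiter.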

Amber
+1  A: 

There is much discussion about which is faster and better.

If you plan on someday advancing to PHP6, your decision is made. Otherwise:

The general consensus is that PCRE is the better all-around solution, but if you have a specific page with a lot of traffic and you don't need PHP6, it may be worth some testing. For example, from the PHP manual comments:

Deprecating POSIX regex in PHP for Perl searching is like substituting wooden boards and brick for a house with pre-fabricated rooms and walls. Sure, you may be able to mix and match some of the parts but it's a lot easier to modify with all the pieces laid out in front of you.

PCRE faster than POSIX RE? Not always. In a recent search-engine project here at Cynergi, I had a simple loop with a few cute ereg_replace() functions that took 3min to process data. I changed that 10-line loop into a 100-line hand-written code for replacement and the loop now took 10s to process the same data! This opened my eyes to what can IN SOME CASES be very slow regular expressions.

Lately I decided to look into Perl-compatible regular expressions (PCRE). Most pages claim PCRE are faster than POSIX, but a few claim otherwise. I decided on benchmarks of my own. My first few tests confirmed PCRE to be faster, but... the results were slightly different than others were getting, so I decided to benchmark every case of RE usage I had on an 8000-line secure (and fast) Webmail project here at Cynergi to check it out.

The results? Inconclusive! Sometimes PCRE are faster (sometimes by a factor greater than 100x faster!), but some other times POSIX RE are faster (by a factor of 2x). I still have to find a rule on when one or the other is faster. It's not only about search data size, amount of data matched, or "RE compilation time" which would show when you repeated the function often: one would always be faster than the other. But I didn't find a pattern here. But truth be told, I also didn't take the time to look into the source code and analyse the problem.

I can give you some examples, though. The POSIX RE ([0-9]{4})/([0-9]{2})/([0-9]{2})[^0-9]+ ([0-9]{2}):([0-9]{2}):([0-9]{2}) is 30% faster in POSIX than when converted to PCRE (even if you use \d and \D and non-greedy matching). On the other hand, a similarly complex PCRE pattern /[0-9]{1,2}[ \t]+[a-zA-Z]{3}[ \t]+[0-9]{4}[ \t]+[0-9]{1,2}:[0-9]{1,2}(:[0-9]{1,2})?[ \t]+[+-][0-9]{4}/ is 2.5x faster in PCRE than in POSIX RE. Simple replacement patterns like ereg_replace( "[^a-zA-Z0-9-]+", "", $m ); are 2x faster in POSIX RE than PCRE.

And then we get confused again because a POSIX RE pattern like (^|\n|\r)begin-base64[ \t]+[0-7]{3,4}[ \t]+...... is 2x faster as POSIX RE, but the case-insensitive PCRE /^Received[ \t]:[ \t]by[ \t]+([^ \t]+)[ \t]/i is 30x faster than its POSIX RE version! When it comes to case sensitivity, PCRE has so far seemed to be the best option.

But I found some really strange behaviour from ereg/eregi. On a very simple POSIX RE (^|\r|\n)mime-version[ \t]: I found eregi() taking 3.60s (just a number in a test benchmark), while the corresponding PCRE took 0.16s! But if I used ereg() (case-sensitive) the POSIX RE time went down to 0.08s! So I investigated further. I tried to make the POSIX RE case-insensitive itself. I got as far as this: (^|\r|\n)[mM][iI][mM][eE]-vers[iI][oO][nN][ \t]: This version also took 0.08s. But if I try to apply the same rule to any of the 'v', 'e', 'r' or 's' letters that are not changed, the time is back to the 3.60s mark, and not gradually, but immediately so! The test data didn't have any "vers" in it, other "mime" words in it or any "ion" that might be confusing the POSIX parser, so I'm at a loss.

Bottom line: always benchmark your PCRE / POSIX RE to find the fastest! Tests were performed with PHP 5.1.2 under Windows, from the command line. Pedro Freire cynergi.com
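
For anyone who wants to follow the "always benchmark" advice, here is a minimal timing sketch. It is not the harness used in the quoted comment: the helper name time_regex(), the iteration count, and the sample subject are all assumptions made for illustration. It compares two PCRE spellings of the date pattern quoted above, since ereg() is deprecated; on an installation where ereg() is still available, one of the closures could presumably call it instead.

```php
<?php
// Hypothetical helper: run $fn $iterations times and return elapsed seconds.
function time_regex(callable $fn, int $iterations = 10000): float
{
    $start = microtime(true);
    for ($i = 0; $i < $iterations; $i++) {
        $fn();
    }
    return microtime(true) - $start;
}

// Made-up sample subject; real tests should use representative data.
$subject = "Date: 2006/03/14 at 12:34:56\n";

// The date pattern from the quoted comment, using '#' as the delimiter
// so the literal '/' characters need no escaping.
$classes   = '#([0-9]{4})/([0-9]{2})/([0-9]{2})[^0-9]+ ([0-9]{2}):([0-9]{2}):([0-9]{2})#';
// The same pattern rewritten with the \d and \D shorthands.
$shorthand = '#(\d{4})/(\d{2})/(\d{2})\D+ (\d{2}):(\d{2}):(\d{2})#';

$tClasses = time_regex(function () use ($classes, $subject) {
    preg_match($classes, $subject);
});
$tShorthand = time_regex(function () use ($shorthand, $subject) {
    preg_match($shorthand, $subject);
});

printf("[0-9] classes: %.4fs\n", $tClasses);
printf("\\d shorthand:  %.4fs\n", $tShorthand);
```

Timings from a single run like this are noisy, so repeating the measurement a few times on realistic input gives a fairer comparison.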

SamGoody