We are often told that regexps are slow and should be avoided whenever possible.
However, taking into account the overhead of doing the string manipulation oneself (leaving aside algorithmic mistakes, which are a different matter), especially in PHP or Perl (maybe Java), where is the limit? In which cases can we consider string manipulation to be the better alternative? Which regexps are particularly CPU greedy?
For instance, for the following cases, in C++, Java, PHP or Perl, what would you recommend?
The regexps would probably be faster:
- s/abc/def/g or a ... while (($i = index($x, "abc")) >= 0) ... $y .= substr() ... based solution?
- s/(\d)+/N/g or a scanning algorithm?
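To make the two comparisons above concrete, here is a minimal sketch of what I mean by each alternative. It is written in Python only so that it is short and runnable; the function names are hypothetical, and `str.find` stands in for Perl's `index()`. It shows the shape of each approach and is not a benchmark.

```python
import re


def replace_with_regex(text):
    # Regexp version, equivalent to s/abc/def/g.
    return re.sub(r"abc", "def", text)


def replace_with_index(text, old="abc", new="def"):
    # Hand-written version: find each occurrence and rebuild the string
    # piece by piece, in the spirit of the index()/substr() loop.
    if not old:
        raise ValueError("old must be a non-empty string")
    parts = []
    start = 0
    while True:
        i = text.find(old, start)
        if i < 0:
            break
        parts.append(text[start:i])  # text before the match
        parts.append(new)            # the replacement
        start = i + len(old)         # resume after the match
    parts.append(text[start:])       # remaining tail
    return "".join(parts)


def digits_with_regex(text):
    # Regexp version, equivalent to s/(\d)+/N/g (ASCII digits only).
    return re.sub(r"\d+", "N", text, flags=re.ASCII)


def digits_with_scan(text):
    # Scanning version: emit one "N" per run of ASCII digits.
    out = []
    in_run = False
    for ch in text:
        if "0" <= ch <= "9":
            if not in_run:
                out.append("N")
                in_run = True
        else:
            out.append(ch)
            in_run = False
    return "".join(out)
```

Timing each pair (for example with `timeit`) on representative input would be one way to see where the crossover lies; I make no claim here about which side wins.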
But what about
- an email validation regexp?
- s/((0|\w)+?[xy]*[^xy]){2,7}/u/g
Wouldn't a handmade, specific algorithm be faster (though longer to write)?
edit
The point of the question is to determine which kinds of regexp would be better rewritten as string manipulation specific to the given problem.
edit2
A common implementation is the Perl regexp engine. For instance, in Perl (this requires knowing how regexps are implemented), what kind of regexp should be avoided because the implementation makes the processing slow and inefficient? It may not be a complex regexp...