Hello.

This question is designed around the performance within PHP but you may broaden it to any language if you wish to.

After many years of using PHP and having to compare strings I've learned that using string comparison operators over regular expressions is beneficial when it comes to performance.

I fully understand that some operations have to be done with regular expressions because of their complexity, but my question is about operations that can be resolved with either a regex or string functions.

Take this example:

PHP

preg_match('/^[a-z]*$/','thisisallalpha');

C#

new Regex("^[a-z]*$").IsMatch("thisisallalpha");

can easily be done with

PHP

ctype_alpha('thisisallalpha');

C#

VFPToolkit.Strings.IsAlpha("thisisallalpha");

There are many other examples but you should get the point I'm trying to make.
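One caveat about the PHP pair above: the two calls are not exact equivalents, which matters when swapping one for the other. Here is a minimal sketch of where they differ. The helper names `matchesRegex` and `matchesCtype` are made up for illustration and are not part of PHP.

```php
<?php
// Hypothetical helpers wrapping the two checks from the question.
function matchesRegex(string $s): bool
{
    // Lowercase a-z only; '*' also accepts the empty string,
    // and '$' also matches just before a single trailing newline.
    return preg_match('/^[a-z]*$/', $s) === 1;
}

function matchesCtype(string $s): bool
{
    // Accepts upper and lower case letters; returns false for the empty string.
    return ctype_alpha($s);
}

$samples = ['thisisallalpha', 'MixedCase', '', "abc\n"];
foreach ($samples as $s) {
    printf(
        "%-18s regex=%-5s ctype=%s\n",
        json_encode($s),
        var_export(matchesRegex($s), true),
        var_export(matchesCtype($s), true)
    );
}
```

The two checks agree only on the first sample. A pattern such as `/^[a-zA-Z]+\z/` is closer to what `ctype_alpha()` does for plain ASCII input.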

Which approach should you lean towards, and why?

+1  A: 

If the operation runs rarely (no more than about once per second), e.g. a user creation form, then I would prefer a regex, as I find it more readable and easier to extend. You can also keep regexes in a database or in external configuration.

But if the operation runs anywhere from a few to thousands of times per second, then I would look for an alternative.

gertas
Strange, after years of trying to get used to RegExps, I still consider them write-only code.
peterchen
Yes, they are evil sometimes but it's hard to live without them.
gertas
A: 

They're both part of the language for a reason. IsAlpha is more expressive. For example, when an expression you're looking at is inherently alpha or not, and that has domain meaning, then use it.

But if it is, say, input validation that could later be changed to include underscores, dashes, etc., or if it sits alongside other logic that requires a regex, then I would use a regex. This tends to be the majority of the time for me.

Mark Thomas
Thanks for your reply. I know this, and it's mentioned in my original post. What I wanted to know is: for operations that can be handled by both methods, which one would you go for, and why?
RobertPitt
Edited to address your question better (I hope)
Mark Thomas
+4  A: 

Looks like this question arose from our small argument here, so I feel somewhat obliged to respond.

PHP developers are being actively brainwashed about "performance", which gives rise to many rumors and myths, including sheer stupid things like "double quotes are slower". Regexps being "slow" is one of these myths, unfortunately supported by the manual (see the infamous comment on the preg_match page). The truth is that in most cases you don't care. Unless your code is repeated 10,000 times, you won't even notice a difference between a string function and a regular expression. And if your code does repeat 10,000 times, you must be doing something wrong in any case, and you will gain performance by optimizing your logic, not by stripping out regular expressions.

As for readability, regexps are admittedly hard to read. However, the code that uses them is in most cases shorter, cleaner and simpler (compare your answer and mine at the above link).

Another important concern is flexibility, especially in PHP, whose string library doesn't support Unicode out of the box. In your concrete example, what happens when you decide to migrate your site to UTF-8? With ctype_alpha you're kinda out of luck; preg_match would require another pattern, but will keep working.
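To illustrate the "another pattern" point, here is one possible UTF-8-aware version. It is only a sketch: `isAlphaUtf8` is a hypothetical helper name, and it assumes PCRE was built with Unicode support and that the input is valid UTF-8.

```php
<?php
// Hypothetical helper: true if $s consists only of Unicode letters.
function isAlphaUtf8(string $s): bool
{
    // \p{L} matches any Unicode letter; the /u modifier makes PCRE treat
    // both pattern and subject as UTF-8. \z anchors at the very end.
    return preg_match('/^\p{L}+\z/u', $s) === 1;
}

$cafe = "caf\u{00E9}"; // "café", with é encoded as two UTF-8 bytes

var_dump(isAlphaUtf8('thisisallalpha')); // bool(true)
var_dump(isAlphaUtf8($cafe));            // bool(true)
var_dump(isAlphaUtf8('abc123'));         // bool(false)

// ctype_alpha() works byte by byte, so under the default "C" locale
// it rejects the multibyte letter.
var_dump(ctype_alpha($cafe));
```

Note that `preg_match()` returns `false` rather than `0` when the subject is not valid UTF-8 under `/u`, which is why the helper compares against `1` explicitly.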

So, regexes are not slower, and they are more readable and more flexible. Why on earth should we avoid them?

stereofrog
Yes, it did arise from that small "conversation", and I thought it would be best to get some other programmers' views on the matter before I jump in. Thanks for your view, +1. But as I said, I do not avoid regular expressions; I just think that to save those few picoseconds I would lean towards string functions, because as my application grows, every little helps.
RobertPitt
Erm _"if your code does repeat 10,000 times, you must be doing something wrong in any case"_ => I politely disagree that this _must_ be the case. There are certainly valid cases.
Wrikken
+1  A: 

Regular expressions actually lead to a performance gain (not that such micro-optimizations are in any way sensible) when they can replace multiple atomic string comparisons. So typically, once you reach around five strpos() checks, it becomes advisable to use a regular expression instead. More so for readability.
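A rough sketch of the kind of replacement meant here, with several `strpos()` checks on one side and a single alternation pattern on the other. Both function names are hypothetical, the code assumes a non-empty list of non-empty needles, and it makes no claim about which version is faster on a given setup.

```php
<?php
// Hypothetical helper: true if $haystack contains any of $needles,
// using one strpos() call per needle.
function containsAnyStrpos(string $haystack, array $needles): bool
{
    foreach ($needles as $needle) {
        if (strpos($haystack, $needle) !== false) {
            return true;
        }
    }
    return false;
}

// Hypothetical helper: the same check as a single alternation regex.
function containsAnyRegex(string $haystack, array $needles): bool
{
    // preg_quote() escapes regex metacharacters so needles are matched literally.
    $quoted = array_map(
        static function (string $n): string {
            return preg_quote($n, '/');
        },
        $needles
    );
    $pattern = '/' . implode('|', $quoted) . '/';
    return preg_match($pattern, $haystack) === 1;
}

$needles = ['foo', 'bar', 'baz', 'a.b', 'qux'];
var_dump(containsAnyStrpos('xx a.b yy', $needles)); // bool(true)
var_dump(containsAnyRegex('xx a.b yy', $needles));  // bool(true)
var_dump(containsAnyRegex('xx aXb yy', $needles));  // bool(false)
```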

And here's another thought to round things up: PCRE can handle conditionals faster than the Zend kernel can handle IF bytecode.

Not all regular expressions are created equal, though. If the complexity gets too high, regex recursion can kill its performance advantage. Therefore it's often worth considering a mix of regex matching and regular PHP string functions. Right tool for the job and all.

mario
A: 

PHP itself recommends using string functions over regex functions when the match is straightforward. For example, from the preg_match manual page:

Do not use preg_match() if you only want to check if one string is contained in another string. Use strpos() or strstr() instead as they will be faster.

Or from the str_replace manual page:

If you don't need fancy replacing rules (like regular expressions), you should always use this function instead of ereg_replace() or preg_replace().

However, I find that people try to use the string functions to solve problems that would be better solved by regex. For instance, when trying to create a full-word string matcher, I have encountered people trying to use strpos($string, " $word ") (note the spaces), for the sake of "performance", without stopping to think about how spaces aren't the only way to delimit a word (think about how many string function calls would be needed to fully replace preg_match('/\bword\b/', $string)).
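A small sketch of the difference described above. The helper names `hasWordStrpos` and `hasWordRegex` are invented for this example; the regex version adds `preg_quote()` so that an arbitrary `$word` is matched literally.

```php
<?php
// Hypothetical helper: the space-padded strpos() approach.
function hasWordStrpos(string $text, string $word): bool
{
    return strpos($text, " $word ") !== false;
}

// Hypothetical helper: word-boundary match with PCRE.
function hasWordRegex(string $text, string $word): bool
{
    return preg_match('/\b' . preg_quote($word, '/') . '\b/', $text) === 1;
}

$tests = [
    'a word here',        // surrounded by spaces: both approaches match
    'word at start',      // no leading space: strpos() version misses it
    'ends with word',     // no trailing space: strpos() version misses it
    'a word, then comma', // punctuation after the word: strpos() version misses it
    'swordfish',          // not a whole word: neither matches
];

foreach ($tests as $t) {
    printf(
        "%-22s strpos=%-5s regex=%s\n",
        $t,
        var_export(hasWordStrpos($t, 'word'), true),
        var_export(hasWordRegex($t, 'word'), true)
    );
}
```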

My personal stance is to use string functions for matching static strings (i.e. a match of a distinct sequence of characters where the match is always the same) and regular expressions for everything else.

Daniel Vandersluis
A: 

Agreed that PHP people tend to over-emphasise the performance of one function over another. That doesn't mean the performance differences don't exist (they definitely do), but most PHP code (and indeed most code in general) has much worse bottlenecks than the choice of regex over string comparison. To find out where your bottlenecks are, use Xdebug's profiler. Fix the issues it comes up with before worrying about fine-tuning individual lines of code.
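If you just want a rough number for one specific call on your own machine, a crude timing loop like the one below can help put the difference in perspective. It is a sketch only: `timeIt` is a hypothetical helper, it requires PHP 7.3+ for `hrtime()`, and it is no substitute for profiling the real application.

```php
<?php
// Hypothetical helper: run $fn $iterations times and return elapsed seconds.
function timeIt(callable $fn, int $iterations): float
{
    $start = hrtime(true); // monotonic clock, in nanoseconds
    for ($i = 0; $i < $iterations; $i++) {
        $fn();
    }
    return (hrtime(true) - $start) / 1e9;
}

$subject = 'thisisallalpha';
$iterations = 10000;

$regexSeconds = timeIt(function () use ($subject) {
    return preg_match('/^[a-z]*$/', $subject);
}, $iterations);

$ctypeSeconds = timeIt(function () use ($subject) {
    return ctype_alpha($subject);
}, $iterations);

// Results vary by machine, PHP version and load, so none are shown here.
printf("preg_match:  %.6f s for %d calls\n", $regexSeconds, $iterations);
printf("ctype_alpha: %.6f s for %d calls\n", $ctypeSeconds, $iterations);
```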

Spudley