I need to test whether any of the strings 'hello', 'i am', 'dumb' exist in a longer string called $ohreally. If even one of them exists, my test is over, and I know that none of the others will occur if one of them has.

Under these conditions, I am asking for your help on the most efficient way to write this search.

strpos() 3 times, like this?

if (strpos ($ohreally, 'hello')){return false;}  
   else if (strpos ($ohreally, 'i am')){return false;}  
   else if (strpos ($ohreally, 'dumb')){return false;}  
   else {return true;}

or one preg_match?

if (preg_match('hello'||'i am'||'dumb', $ohreally)) {return false}   
   else {return true};

I know the preg_match code is wrong; I would really appreciate it if someone could offer the correct version.

Thank You!


Answer

Please read what cletus said and the test middaparka did below. I also did a microtime test on various strings, long and short, with these results.

If you know the probability of each string value occurring, order the checks from most probable to least. (I did not notice an appreciable difference from reordering the regex itself, i.e. between /hello|i am|dumb/ and /i am|dumb|hello/.)

On the other hand, with sequential strpos() calls the probability makes all the difference. For example, if 'hello' occurs 90%, 'i am' 7%, and 'dumb' 3% of the time, you would want to check for 'hello' first and exit the function as soon as possible.

My microtime tests show this.

For haystacks A, B, and C, in which the needle is found on the first, second, and third strpos() execution respectively, the times are as follows:
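The ordering idea above can be sketched as a small helper. This is only an illustration: containsAny() is a hypothetical name, not something from my original code, and it assumes the caller passes the needles already sorted from most to least probable. It returns true on a hit (my function above returns the opposite), and it uses the strict !== false comparison cletus describes.

```php
<?php
// Hypothetical helper (not part of the original code): returns true as soon
// as any needle is found. $needles is assumed to be ordered from most
// probable to least probable, so the common case exits after one strpos().
function containsAny(string $haystack, array $needles): bool
{
    foreach ($needles as $needle) {
        // strpos() returns an int position or false; a match at position 0
        // is still a match, so compare strictly against false.
        if (strpos($haystack, $needle) !== false) {
            return true;
        }
    }
    return false;
}

$ohreally = 'well hello there';
var_dump(containsAny($ohreally, ['hello', 'i am', 'dumb']));         // bool(true)
var_dump(containsAny('nothing to see', ['hello', 'i am', 'dumb']));  // bool(false)
```

With the 90% / 7% / 3% split from the example, most calls would return after the first strpos().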

strpos:
A: 0.00450 seconds // 1 strpos()
B: 0.00911 seconds // 2 strpos()
C: 0.00833 seconds // 3 strpos()
C: 0.01180 seconds // 4 strpos() added one extra

and for preg_match:
A: 0.01919 seconds // 1 preg_match()
B: 0.02252 seconds // 1 preg_match()
C: 0.01060 seconds // 1 preg_match()

As the numbers show, strpos is faster up to the 4th execution, so I will be using it instead, since I have only 3 substrings to check for : )

+5  A: 

The correct syntax is:

preg_match('/hello|i am|dumb/', $ohreally);

I doubt there's much in it either way, but it wouldn't surprise me if the strpos() method is faster, depending on the number of strings you're searching for. The performance of strpos() will degrade as the number of search terms increases. The regex probably will too, but not as quickly.

Obviously regular expressions are more powerful. For example if you wanted to match the word "dumb" but not "dumber" then that's easily done with:

preg_match('/\b(hello|i am|dumb)\b/', $ohreally);

which is a lot harder to do with strpos().

Note: technically \b is a zero-width word boundary. "Zero-width" means it doesn't consume any part of the input string and word boundary means it matches the start of the string, the end of the string, a transition from word (digits, letters or underscore) characters to non-word characters or a transition from non-word to word characters. Very useful.
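As a quick illustration of the word-boundary behaviour described above, here is a minimal sketch; the sample strings are made up for the example.

```php
<?php
// Pattern with word boundaries, as in the answer above.
$bounded = '/\b(hello|i am|dumb)\b/';
// Plain alternation without boundaries, for comparison.
$plain = '/hello|i am|dumb/';

var_dump(preg_match($bounded, 'that is dumb'));  // int(1): "dumb" is a whole word
var_dump(preg_match($bounded, 'even dumber'));   // int(0): "dumb" is followed by a word character
var_dump(preg_match($plain, 'even dumber'));     // int(1): substring match is enough
```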

Edit: it's also worth noting that your usage of strpos() is incorrect (but lots of people make this same mistake). Namely:

if (strpos ($ohreally, 'hello')) {
  ...
}

will not enter the condition block if the needle is at position 0 in the string. The correct usage is:

if (strpos ($ohreally, 'hello') !== false) {
  ...
}

because of type juggling. Otherwise 0 is converted to false.
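A short sketch of the pitfall, using a made-up haystack where the needle sits at position 0:

```php
<?php
$ohreally = 'hello world';           // needle is at the very start
$pos = strpos($ohreally, 'hello');

var_dump($pos);             // int(0)
var_dump((bool) $pos);      // bool(false): what a bare if (strpos(...)) sees
var_dump($pos !== false);   // bool(true): the strict comparison gets it right
```

On PHP 8.0 and later, str_contains($ohreally, 'hello') returns a plain boolean and avoids the issue entirely.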

cletus
thank you for this, I'm going to do a microtime() test on your correct code and report back with the results : )
Mohammad
Thank you for reminding me of the strpos caveat, I had forgotten it! I'll have to look through my code and see if there are any fixes needed. The preg_match code you provided works fine! Thank you so much for all the information, it proved very useful :D I edited the question above to reflect the test I did, which supports your statement about strpos() degrading as the number of search terms goes up.
Mohammad
+2  A: 

Crazy idea, but why not test both 'n' thousand times in two separate loops, each surrounded by microtime() calls and the associated debug output?

Based on the above code (with a few corrections) for 1,000 iterations, I get something like:

strpos test: 0.003315
preg_match test: 0.014241

As such, in this instance (with the limitations outlined by others) strpos indeed seems faster, albeit by a largely meaningless amount. (The joy of pointless micro-optimisation, etc.)
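For reference, a test along these lines could look roughly like the sketch below. It is not the exact script that produced the numbers above; the haystack and the iteration count are assumptions chosen for illustration, and timings will differ from machine to machine.

```php
<?php
// Hypothetical haystack: filler text with the last needle at the very end,
// so the strpos() version has to make all three calls.
$ohreally   = str_repeat('lorem ipsum ', 50) . 'dumb';
$iterations = 1000;

// Loop 1: three sequential strpos() checks with strict comparison.
$start = microtime(true);
for ($i = 0; $i < $iterations; $i++) {
    $foundStrpos = strpos($ohreally, 'hello') !== false
        || strpos($ohreally, 'i am') !== false
        || strpos($ohreally, 'dumb') !== false;
}
$strposTime = microtime(true) - $start;

// Loop 2: a single preg_match() with alternation.
$start = microtime(true);
for ($i = 0; $i < $iterations; $i++) {
    $foundPreg = preg_match('/hello|i am|dumb/', $ohreally) === 1;
}
$pregTime = microtime(true) - $start;

printf("strpos test: %.6f\n", $strposTime);
printf("preg_match test: %.6f\n", $pregTime);
```

Running it against a representative haystack, rather than this made-up one, is the only way to get numbers that mean anything for a given case.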

Never estimate what you can measure.

middaparka
Thank you so much, you are correct. strpos() is usually the winner! Especially in my case, where I actually know the occurrence probability of the needles in the string: by placing the most probable one first I gain the advantage of reducing the number of times strpos() is actually called :] Please read the update I have posted in the question that reflects my test results. And thank you!
Mohammad
+1  A: 

It depends on the number of strings you want to look for and the length of the string you are searching.

You'd need to experiment with a representative data set to find out which is faster (repeat the operation, say, 1000 times and measure the elapsed time).

BTW - I think the regex you are looking for is '(hello|i am|dumb)'

Also, your code is more verbose than it needs to be:

return strpos($ohreally, 'hello') || strpos($ohreally, 'i am') || strpos($ohreally, 'dumb');

or

return preg_match('(hello|i am|dumb)',$ohreally);

Also, by all the usual coding standards, there should not be a space between the function name and the bracket.

C.

symcbean
You have a couple of errors, namely the OP's same problem of checking `strpos()` results (see my answer) and you're not delimiting your regex in `preg_match()`.
cletus