I'm writing PHP code to parse a string. It needs to be as fast as possible, so are regular expressions the way to go? I have a hunch that PHP string functions are more expensive, but it's just a guess. What's the truth?

Here's specifically what I need to do with the string:

Grab the first half (based on the third location of a substring "000000") and compare its hash to the next 20 bytes, throwing away anything left.

Parse the 9th byte through the next "000000" as one piece of data. Then grab the next 19 bytes after that, and split that into 8 (toss 1) and 8. Then I do some other stuff that converts those two 8 byte strings into dates.
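A minimal PHP sketch of the second step, using strpos() and substr(). It assumes "the 9th byte" is counted from 1, so the field starts at offset 8 of the string being parsed, and that the field stops just before the next "000000". The function name extractField() is hypothetical.

```php
<?php
// Hypothetical helper: returns the field that starts at the 9th byte
// (offset 8, ASSUMING 1-based counting) and runs up to, but not including,
// the next "000000". Also returns the offset just past that delimiter so
// the caller can keep parsing from there. Returns null if there is no field.
function extractField(string $data): ?array
{
    $start = 8;
    if (strlen($data) < $start) {
        return null; // too short to have a 9th byte
    }
    $end = strpos($data, '000000', $start);
    if ($end === false) {
        return null; // no delimiter after the 9th byte
    }
    return [
        'field' => substr($data, $start, $end - $start),
        'next'  => $end + 6, // skip the 6-byte delimiter
    ];
}
```

The returned `next` offset points just past the delimiter, which is where the following bytes (the two 8-byte pieces) would be read from.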

So that's the kind of thing I need to do.

A: 

Native string functions are generally much faster. The benefit of regular expressions is flexibility: you can do pretty much anything with them.

Joonas Pulakka
+2  A: 

I believe there is a threshold beyond which a single regular expression becomes faster than a chain of PHP string function calls. Anyway, it depends a lot on what you're doing; you have to find the balance.

Now that you've edited your question: I'd use string functions for what you're trying to accomplish. strpos() and substr() are what come to mind at first glance.
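A minimal sketch of the strpos()/substr() approach for the first step: locate the third "000000" and take everything before it as the first half. The helper name nthStrpos() is hypothetical, it assumes occurrences do not overlap (each search resumes after the previous match), and the int|false return type needs PHP 8.0 or later.

```php
<?php
// Returns the byte offset of the $n-th (1-based) occurrence of $needle in
// $haystack, or false if there are fewer than $n occurrences.
// Matches are counted without overlap: each search resumes after the
// previous match ends.
function nthStrpos(string $haystack, string $needle, int $n): int|false
{
    if ($n < 1 || $needle === '') {
        return false;
    }
    $offset = 0;
    $pos = false;
    for ($i = 0; $i < $n; $i++) {
        $pos = strpos($haystack, $needle, $offset);
        if ($pos === false) {
            return false;
        }
        $offset = $pos + strlen($needle);
    }
    return $pos;
}

// Hypothetical input: three delimiters followed by the hash bytes.
$data = 'aa000000bb000000cc000000<hash bytes>';
$third = nthStrpos($data, '000000', 3);
if ($third !== false) {
    // Everything before the third delimiter: "aa000000bb000000cc"
    $firstHalf = substr($data, 0, $third);
}
```

Checking the result against false before calling substr() matters, because a missing delimiter would otherwise silently produce the wrong slice.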

Ionuț G. Stan
+2  A: 

I think that if you want the highest performance, you should avoid regex. It helps minimize development effort, but it won't give the best performance, because you can almost always tailor code built from string routines to a specific problem and gain a big performance boost from it. For simple parsing routines that can't be optimized much, though, you can still use regex, as it won't make a big difference there.

EDIT: For the specific problem you posted, I'd favor string operations, but only because I wouldn't know how to do it with a regex. It seems pretty straightforward, except for the hash, so I don't think regex versus string functions will make a big difference.

schnaader
+5  A: 

It depends on your case: if you're trying to do something fairly basic (e.g. searching for a string, or replacing a substring with something else), then the regular string functions are the way to go. If you want to do something more complicated (e.g. searching for IP addresses), then the regex functions are definitely a better choice.
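To illustrate the IP-address case, here is a minimal PHP sketch: preg_match_all() finds candidate dotted quads, and filter_var() with FILTER_VALIDATE_IP and FILTER_FLAG_IPV4 discards out-of-range values such as 999.1.1.1. The function name is hypothetical, and pairing a loose regex with filter_var() is just one reasonable way to do this.

```php
<?php
// Hypothetical helper: finds IPv4 addresses in free text.
// The regex only finds dotted-quad candidates; filter_var() then rejects
// candidates whose octets are out of range (e.g. 999).
function findIpv4Addresses(string $text): array
{
    preg_match_all('/\b(?:\d{1,3}\.){3}\d{1,3}\b/', $text, $matches);
    return array_values(array_filter(
        $matches[0],
        fn(string $c): bool => filter_var($c, FILTER_VALIDATE_IP, FILTER_FLAG_IPV4) !== false
    ));
}
```

Doing the same with only strpos() and substr() would mean hand-writing the digit and dot scanning, which is the kind of effort the answer argues is not worth it.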

I haven't profiled regexes so I can't say that they'll be faster at runtime, but I can tell you that the extra time spent hacking together the equivalent using the basic functions wouldn't be worth it.


Edit with the new information in the OP:

It sounds as though you actually need to do a number of small string operations here. Since each one individually is quite basic, and I doubt you'd be able to do all those steps (or even a couple of those steps) at one time using a regex, I'd go with the basic functions:

Grab the first half (based on the third location of a substring "000000") and compare its hash to the next 20 bytes, throwing away anything left.

Use: strpos() and substr()
Or:  /^(.*?0{6}.*?0{6}.*?)0{6}/s
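A hedged sketch of the first step with preg_match(). The pattern is anchored with ^ (a leading $ asserts the end of the string, so a pattern starting with $ followed by more tokens cannot match here) and uses the s modifier so that . also matches newline bytes, which binary data may contain. Two assumptions are not from the question: the hash is raw SHA-1, since sha1($x, true) returns exactly 20 bytes, and the 20 hash bytes start immediately after the third "000000". verifyFirstHalf() is a hypothetical name.

```php
<?php
// Sketch of step 1: capture everything before the third "000000", then
// compare its hash to the 20 bytes that follow the delimiter.
// ASSUMPTIONS: raw SHA-1 hash; hash bytes start right after the delimiter.
function verifyFirstHalf(string $data): bool
{
    if (!preg_match('/^(.*?0{6}.*?0{6}.*?)0{6}/s', $data, $m)) {
        return false; // fewer than three delimiters
    }
    $firstHalf = $m[1];
    // $m[0] is the first half plus the third delimiter.
    $stored = substr($data, strlen($m[0]), 20);
    if (strlen($stored) !== 20) {
        return false; // not enough bytes left for the hash
    }
    // Anything after those 20 bytes is thrown away (ignored).
    return hash_equals($stored, sha1($firstHalf, true));
}
```

hash_equals() is used for the comparison because it is a timing-safe string comparison; a plain === would also work if timing attacks are not a concern.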

Then grab the next 19 bytes after that, and split that into 8 (toss 1) and 8.

Use: substr() (I assume you mean 17 bytes here: 8 + 1 + 8)

$part1 = substr($myStr, $currPos, 8);
$part2 = substr($myStr, $currPos + 9, 8);
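Building on that snippet, a minimal sketch that also does the date conversion the question mentions. The question does not say what format the 8-byte fields use, so this assumes YYYYMMDD; parseDatePair() is a hypothetical name.

```php
<?php
// Splits the 17 bytes at $currPos into two 8-byte fields, skipping the
// single byte between them, then parses each field as a date.
// ASSUMPTION: each field is a YYYYMMDD date (not stated in the question).
function parseDatePair(string $myStr, int $currPos): ?array
{
    $part1 = substr($myStr, $currPos, 8);
    $part2 = substr($myStr, $currPos + 9, 8); // +9 skips the tossed byte
    $dates = [];
    foreach ([$part1, $part2] as $raw) {
        // '!' resets the fields not in the format (time of day) to zero.
        $d = DateTimeImmutable::createFromFormat('!Ymd', $raw);
        // Round-trip check rejects short input and overflow like 20230231.
        if ($d === false || $d->format('Ymd') !== $raw) {
            return null;
        }
        $dates[] = $d;
    }
    return $dates;
}
```

The round-trip check is there because createFromFormat() silently rolls invalid days over into the next month instead of failing.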
nickf
Regexps are surprisingly efficient. Generally, you shouldn't be afraid of using them as your default tool.
troelskn
A: 

It depends on your needs. Most regular expression operations are faster than one would think, and in certain trivial operations they can even outperform built-in string functions. Note that I have the preg (PCRE) functions in mind, not the older POSIX ereg functions, which are quite slow.

soulmerge
+2  A: 

If what you're doing is at all reasonable to do with string functions, you should use them. For example, if you're determining whether a constant string 'abc' occurs in $value, you definitely want to check strpos($value, 'abc') !== false, not preg_match('/abc/', $value). If you find yourself doing a lot of string reshuffling and transformations to accomplish what you could have done with a single regex, though, you're almost certainly going to end up destroying both performance and maintainability.
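The two checks for a constant substring, side by side as a small sketch (the function names are hypothetical):

```php
<?php
// Plain byte search for a constant substring; no pattern involved.
function containsAbcStrpos(string $value): bool
{
    // !== false is required: strpos() returns 0 when 'abc' is at the
    // very start, and 0 is falsy.
    return strpos($value, 'abc') !== false;
}

// Same question answered through the PCRE engine.
function containsAbcRegex(string $value): bool
{
    return preg_match('/abc/', $value) === 1;
}
```

On PHP 8.0 or later, str_contains($value, 'abc') expresses the same check more directly.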

When you're concerned about speed, though, don't theorize: clock it. The 'time' command is your friend.

chaos
+1  A: 

In general, string functions are faster and regex functions are more flexible.

As with anything else, your results may vary; the only way to know for sure is to try it both ways and benchmark.
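One way to try it both ways and benchmark without leaving PHP is a small timing loop around each candidate. This sketch uses hrtime() (PHP 7.3+); the input string and iteration count are arbitrary assumptions, so substitute your real data before drawing conclusions.

```php
<?php
// Minimal micro-benchmark: runs $fn $iterations times and returns the
// elapsed wall-clock time in milliseconds, measured with hrtime().
function benchmark(callable $fn, int $iterations = 100000): float
{
    $start = hrtime(true); // monotonic clock, in nanoseconds
    for ($i = 0; $i < $iterations; $i++) {
        $fn();
    }
    return (hrtime(true) - $start) / 1e6;
}

// Hypothetical input: compare both approaches on the same string.
$haystack = str_repeat('x', 1000) . 'abc';

$tStrpos = benchmark(fn() => strpos($haystack, 'abc') !== false);
$tRegex  = benchmark(fn() => preg_match('/abc/', $haystack) === 1);

printf("strpos: %.2f ms, preg_match: %.2f ms\n", $tStrpos, $tRegex);
```

Timings like these vary with PHP version, input size, and machine load, so run each variant several times and compare the typical numbers rather than a single run.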

Hugh Bothwell