I have to check the buffer input to a PHP socket server as fast as possible. To do so, I need to know whether the input message $buffer contains any characters other than the following: a-z, A-Z, 0-9, #, -, . and $

I'm currently using the following ereg function, but wonder if there are ways to optimize the speed. Should I maybe use a different function, or a different regex?

if (ereg("[A-Za-z0-9]\.\#\-\$", $buffer) === false)
{
    echo "buffer only contains valid characters: a-z, A-Z, 0-9, #, -, ., $";
}
+1  A: 

preg_match is both faster and more powerful than ereg:

if(preg_match('/^[^a-z0-9\.#\-\$]*$/i', $sString) > 0) //check if (doesn't contain illegal characters) is true
{
  //everything's fine: $sString does NOT contain any illegal characters
}

or turn it around:

if(preg_match('/[a-z0-9\.#\-\$]/i', $sString) === 0) //check if (contains illegal character) is false
{
  //everything's fine: $sString does NOT contain any illegal characters
}
Douwe Maan
The string "0test1#test1#a1.0.000$" returns false for both your functions. Should I invert their return values?
Tom
Your regular expressions are wrong. `/^[^a-z0-9\.#\-\$]*$/` means *only other characters than `[a-z0-9\.#\-\$]`* and `/[a-z0-9\.#\-\$]/` means *at least one character of `[a-z0-9\.#\-\$]`*.
Gumbo
It wasn't completely clear to me if you wanted ONLY the specified characters, or anything BUT the specified characters.
Douwe Maan
Gumbo: If the OP wanted everything BUT those characters, both my functions work correctly as far as I can see. It just wasn't entirely clear to me what he wanted...
Douwe Maan
Tom: If you ONLY want the specified characters, you could use my first function, but remove the second ^ sign (the one after the [).
Douwe Maan
I simply want a function which returns false when the given string contains any characters other than the given characters a-z, A-Z, 0-9, #, -, . and $
Tom
OK, then I misunderstood you at first. Just remove the second ^ (after the [) in the first function, and you should be ready to go.
Douwe Maan
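For reference, the fix described in the comments above might look like the sketch below. The function name containsOnlyAllowed is hypothetical and not from the thread; the pattern is the first one from the answer with the negating ^ removed from the character class, so it should match only strings made up entirely of the allowed characters.

```php
<?php
// Hypothetical helper name; the pattern is the answer's first regex with
// the ^ inside the brackets removed, as suggested in the comments.
function containsOnlyAllowed($sString)
{
    // ^...$ anchors the whole string; * also lets an empty string pass.
    return preg_match('/^[a-z0-9\.#\-\$]*$/i', $sString) === 1;
}

var_dump(containsOnlyAllowed('0test1#test1#a1.0.000$')); // bool(true)
var_dump(containsOnlyAllowed('bad input!'));             // bool(false)
```

Note that in PCRE a plain $ also matches just before a final newline, so "abc\n" would pass this check. If that matters for a socket buffer, \z (or the D modifier) anchors at the true end of the string.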
+1  A: 

Use preg instead; it's faster, and ereg has been deprecated.

Galen
+3  A: 

The preg family of functions is quite a bit faster than ereg. To test for invalid characters, try something like:

if (preg_match('/[^a-z0-9.#$-]/i', $buffer)) print "Invalid characters found";
BipedalShark
Your regular expression is missing the literal `-` since `#-$` is describing a range of characters.
Gumbo
Aaaaaaand fixed.
BipedalShark
+2  A: 

You'll want to shift over to using preg instead of ereg. The ereg family of functions has been deprecated, and (since PHP 5.3) using them will raise a PHP deprecation warning, and they'll be removed from the language soon. Also, it's been anecdotal wisdom that the preg functions are, in general, faster than ereg.

As for speed, based on my experience and the codebases I've seen in my career, optimizing this kind of string performance would be premature at this point. Wrap the comparison in some logical function or method:

//pseudo code based on OP 
function isValidForMyNeeds($buffer)
{
    if (ereg("[A-Za-z0-9]\.\#\-\$", $buffer) === false)
    {
        echo "buffer only contains valid characters: a-z, A-Z, 0-9, #, -, ., $";
    }
}

and then when/if you determine this is a performance problem you can apply any needed optimization in one place.
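A minimal sketch of what such a wrapper could look like once the check inside is swapped for preg_match. This is an assumption about one possible shape, not the code from the question: the function returns a boolean instead of echoing, and the negated character class is the one suggested elsewhere in this thread.

```php
<?php
// Callers depend only on this function, so the check inside can be
// replaced later (different regex, different function) in one place.
function isValidForMyNeeds($buffer)
{
    // Assumption: preg_match stands in for the ereg call. It returns 1
    // if any character outside the allowed set is found, 0 if none is.
    return preg_match('/[^A-Za-z0-9.#$-]/', $buffer) === 0;
}

if (isValidForMyNeeds('0test1#test1#a1.0.000$')) {
    echo "buffer only contains valid characters: a-z, A-Z, 0-9, #, -, ., $";
}
```

If profiling later shows this check to be a bottleneck, only the body of isValidForMyNeeds needs to change.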

Alan Storm
Why wait for problems to arise when they can be addressed now without too much hassle? There aren't that many functions to choose from, I believe, so it shouldn't be too much trouble, right?
Tom
Your regular expression is wrong and `ereg` does always return an integer.
Gumbo
@gumbo: The code sample was meant to illustrate the concept of a wrapper function rather than to fix the particular regular expression. @tom: True enough, and my post is just an opinion on the subject, but this kind of optimization is often endless. For example, right now you're waiting on an answer to this question when you could be moving on and solving another problem in your app. Also, performance of string comparisons in PHP is hugely dependent on the input variables.
Alan Storm
Again, the point of the post wasn't the regular expression, which was simply copied from the OP.
Alan Storm
Never mind me, I completely overlooked the fact you already mentioned the deprecation (and removal) of the ereg-functions! Sorry about that.
Bart Kiers
+3  A: 

Try this function:

function isValid($str) {
    return !preg_match('/[^A-Za-z0-9.#\\-$]/', $str);
}

[^A-Za-z0-9.#\-$] describes any character that is invalid. If preg_match finds a match (an invalid character), it returns 1; otherwise it returns 0. Furthermore, !1 is false and !0 is true. Thus isValid returns false if an invalid character is found and true otherwise.
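A short usage sketch of the function above. The function is repeated so the snippet runs on its own, and the sample inputs are illustrative (the first is the test string quoted earlier in the thread).

```php
<?php
// Same function as in the answer, repeated so this snippet is self-contained.
function isValid($str) {
    return !preg_match('/[^A-Za-z0-9.#\\-$]/', $str);
}

var_dump(isValid('0test1#test1#a1.0.000$')); // bool(true)
var_dump(isValid('hello world'));            // bool(false): space is not allowed
var_dump(isValid("abc\n"));                  // bool(false): newline is not allowed
var_dump(isValid(''));                       // bool(true): nothing invalid to find
```

Because the pattern is unanchored and only looks for the first invalid character, an empty buffer counts as valid; add an explicit length check if empty input should be rejected.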

Gumbo