Hello all,

I have two questions, one is wordy and one programmy!

1) I know that having PHP report notices causes performance problems (it takes time to generate each notice and work out what sort of error it is), but is this still the case if error_reporting is turned off? I guess it still slows things down, just not as much as displaying the errors to output. Is this true?

2) Could somebody also help me turn this:

//Remove characters. Anything apart from a-z(upper and lower case), numbers, periods [.]
$cleanstring = ereg_replace("[^A-Za-z0-9]^[,]^[.]^[_]^[:]", "", $critvalue);

into something more efficient that uses preg_replace rather than ereg_replace? I just tried swapping the function name, but I get an "Unknown modifier '^'" error.

Also, it would be great to get some links on improving performance, and any tweak tips you guys use!

Thanks all

+1  A: 

If you really only want 0-9, and a-z (case insensitive) and the period:

// Remove all non-listed characters from the string
$cleanString = preg_replace('/[^a-z0-9.]/i', '', $unCleanString);

If you also want to include the comma, underscore and colon:

$cleanString = preg_replace('/[^a-z0-9.,_:]/i', '', $unCleanString);
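
As a quick check of the two patterns above, here is a minimal sketch. The sample string is made up for illustration and is not from the question.

```php
<?php
// Hypothetical sample input, not from the original question.
$unCleanString = 'Hello, World_1.5: ok!';

// Keep only letters, digits and periods.
$strict = preg_replace('/[^a-z0-9.]/i', '', $unCleanString);

// Additionally keep commas, underscores and colons.
$relaxed = preg_replace('/[^a-z0-9.,_:]/i', '', $unCleanString);

echo $strict, "\n";  // HelloWorld1.5ok
echo $relaxed, "\n"; // Hello,World_1.5:ok
```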

For the error reporting:
It is faster to switch error reporting off.

Another option is to try and write code that does not generate warnings and notices but instead does proper checking. So, if some notice/error does occur, log it and fix the code.
I think that in the end this is the winning strategy.
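
One way this could look in practice, as a rough sketch rather than a recommended configuration: report everything, send it to the log instead of the page, and guard lookups so the notice never fires in the first place. The helper name `valueOrDefault` and the `$params` array are hypothetical.

```php
<?php
// Report everything, but write it to the log instead of the page.
error_reporting(E_ALL);
ini_set('display_errors', '0');
ini_set('log_errors', '1');

// Hypothetical helper: read an array key without triggering an
// "undefined index" notice when the key is missing.
function valueOrDefault(array $data, $key, $default = null)
{
    return isset($data[$key]) ? $data[$key] : $default;
}

$params = array('crit' => 'abc');
echo valueOrDefault($params, 'crit', ''), "\n";       // abc
echo valueOrDefault($params, 'missing', 'n/a'), "\n"; // n/a
```

Checking with isset() up front means there is no notice to generate, log or suppress.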

Jacco
Any chance the last ereg_replace can be done with preg replace?
Abs
No need to escape the DOT inside a character class.
Bart Kiers
You are right, "It is faster to switch error reporting off". However, is it true that PHP will still be slowed down because it has to figure out those NOTICES itself?
Abs
@Bart: never knew that :) thanks.
Jacco
@Jacco: no problem. A character class can be seen as a separate little language inside regex, with its own meta-characters: '^', '-', ']' and '\'. All other characters need no escaping inside it.
Bart Kiers
A: 

The preg_ functions expect the regex to be delimited (like Perl regex-es):

// Negated class: remove every run of characters that is NOT a letter, digit or period
preg_replace('/[^a-zA-Z\d.]++/', '', $str);
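
To illustrate the delimiter point: PCRE takes the first character of the pattern as the delimiter, and when that character is an opening bracket, the matching closing bracket ends the pattern. In the ereg pattern from the question, `[^A-Za-z0-9]` is therefore read as the whole regex and the `^` that follows as a modifier, which is where the "Unknown modifier '^'" message comes from. A small sketch with a hypothetical input, using the character set the question seems to be aiming for:

```php
<?php
$critvalue = 'a-b,c.d_e:f g!'; // hypothetical input

// Any non-alphanumeric, non-backslash, non-whitespace character can
// delimit the pattern; '/' and '#' are common choices.
$withSlash = preg_replace('/[^A-Za-z0-9,._:]/', '', $critvalue);
$withHash  = preg_replace('#[^A-Za-z0-9,._:]#', '', $critvalue);

echo $withSlash, "\n";              // ab,c.d_e:fg
var_dump($withSlash === $withHash); // bool(true)
```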
Bart Kiers
Ah I was missing that fact!
Abs
Comparing your regex with soulmerge's: which is faster, or are both good and the time difference negligible? I asked soulmerge the same question.
Abs
I suspect the time difference to be negligible. If you really want to know: test it!
Bart Kiers
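
A rough way to "test it", sketched under assumptions: the input string, the iteration count and the helper name `timePattern` are all made up, and the numbers will vary by machine and PHP version. It compares a one-character-at-a-time negated class against the possessive variant.

```php
<?php
// Hypothetical test data: a long string mixing allowed and disallowed characters.
$input = str_repeat('abc-DEF 123.45 !? ', 10000);

// Time one pattern over a number of iterations (wall-clock seconds).
function timePattern($pattern, $subject, $iterations = 50)
{
    $start = microtime(true);
    for ($i = 0; $i < $iterations; $i++) {
        preg_replace($pattern, '', $subject);
    }
    return microtime(true) - $start;
}

$patterns = array(
    'single char' => '/[^a-zA-Z\d.]/',
    'possessive'  => '/[^a-zA-Z\d.]++/',
);

foreach ($patterns as $label => $pattern) {
    printf("%-12s %.4f s\n", $label, timePattern($pattern, $input));
}
```

Both patterns produce the same cleaned string, so only the timing differs.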
A: 
// Remove characters: anything apart from a-z (upper and lower case),
// numbers and periods
preg_replace('/[^A-Za-z0-9.]/', '', $string);

Wbdvlpr
Using a * will match (and replace) all empty strings in $string. Better use + (or ++ to make it possessive to increase performance).
Bart Kiers
Without the caret (^), this pattern will remove the exact opposite of what you expect. Best make an edit.
scragar
@scragar -- why don't you post your own answer then?
Wbdvlpr
+1  A: 

1.) Turning error reporting off should increase performance. The part of the error reporting process that consumes the most time is probably either outputting the error message or calling custom error handlers (this is my guess; I haven't measured it).

2.) PCRE regular expressions require you to delimit your RE; have a look at the docs. Besides, your RE looks a bit broken. I think it was meant to be something like this (replace anything that is not a letter, a number, comma, period, underscore or colon with the empty string):

preg_replace('/[^A-Za-z0-9,._:]/', '', $string);
# If you want to support letters in any language (like the German umlauts
# öäü, for example), not just a-z, you should use the Unicode properties:
# http://php.net/manual/en/regexp.reference.unicode.php
# Note the lowercase \p (uppercase \P negates the property) and the /u
# modifier, which makes PCRE treat the pattern and subject as UTF-8.
preg_replace('/[^\pL\pN,._:]/u', '', $string);
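
A sketch comparing the ASCII-only class with a Unicode-property class, assuming UTF-8 input. It uses the lowercase `\pL` / `\pN` forms (uppercase `\P` negates a property) together with the `/u` modifier. The sample string is hypothetical.

```php
<?php
// "Müller, Ätsch_1: ok!" written with explicit UTF-8 bytes so the
// example does not depend on the file's encoding (hypothetical input).
$string = "M\xC3\xBCller, \xC3\x84tsch_1: ok!";

// ASCII-only class: the umlauts are stripped.
$ascii = preg_replace('/[^A-Za-z0-9,._:]/', '', $string);

// Unicode-aware class: \pL = any letter, \pN = any number; the /u
// modifier makes PCRE treat pattern and subject as UTF-8.
$unicode = preg_replace('/[^\pL\pN,._:]/u', '', $string);

echo $ascii, "\n";   // Mller,tsch_1:ok
echo $unicode, "\n"; // Müller,Ätsch_1:ok
```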
soulmerge
Comparing your regex (the first one) with Bart's: which is faster, or are both good and the time difference negligible? I asked Bart the same question.
Abs
They are not the same, his regex does not capture the characters ',_:'
soulmerge
Hmm, not very readable, let's try again: his regex does not capture comma, underscore and colon.
soulmerge
I have used yours, and your explanation about other languages is very helpful. Thank you soulmerge. :)
Abs