ansaurus

Question

Performance: PHP Error Handling and Regex

Answer 1

+1 A:

If you really only want 0-9, and a-z (case insensitive) and the period:

//Remove all non-listed characters from the string
$cleanString = preg_replace('/[^a-z0-9.]/i', '', $unCleanString);

If you also want to include the comma, underscore and colon:

$cleanString = preg_replace('/[^a-z0-9.,_:]/i', '', $unCleanString);

For the error reporting:
It is faster to switch error reporting off.

Another option is to write code that does proper checking, so that it does not generate warnings and notices in the first place. Then, if a notice or error does occur, log it and fix the code.
I think that in the end this is the winning strategy.

Jacco 2009-09-28 10:43:08
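
A minimal sketch of the "check first, log the rest" approach suggested above. The getName() helper and the 'name' key are hypothetical, the type declarations assume PHP 7 or later, and the ini values are only one possible production setup:

```php
<?php
// Report everything, but write it to the log instead of the page.
error_reporting(E_ALL);
ini_set('display_errors', '0');
ini_set('log_errors', '1');

// Hypothetical helper: read an optional key without raising a notice or warning.
function getName(array $input): string
{
    // isset() tests the key first, so no "undefined index" diagnostic is raised.
    return isset($input['name']) ? (string) $input['name'] : '';
}

$present = getName(['name' => 'Abs']);
$missing = getName([]);
```

With this pattern, anything that still shows up in the log points at a code path that lacks a check.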

Any chance the last ereg_replace can be done with preg_replace?

Abs 2009-09-28 10:46:16

No need to escape the DOT inside a character class.

Bart Kiers 2009-09-28 10:47:57

You are right that "it is faster to switch error reporting off". However, is it true that PHP will still be slowed down because it has to work out those notices itself?

Abs 2009-09-28 10:50:15

@Bart: never knew that :) thanks.

Jacco 2009-09-28 10:50:40

@Jacco: no problem. A character class can be seen as a separate little language inside regex, with its own meta-characters: '^', '-', ']' and the backslash. All other characters need no escaping inside it.

Bart Kiers 2009-09-28 11:04:28
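
To illustrate the point about character classes, here is a small sketch; the sample file name is made up for the example:

```php
<?php
$name = 'file-name_v1.0.txt';

// Inside a character class the dot is literal, so the backslash is optional.
$unescaped = preg_replace('/[^a-z0-9.]/i', '', $name);
$escaped   = preg_replace('/[^a-z0-9\.]/i', '', $name);

// A hyphen is a meta-character inside a class; placed last it is literal.
$withHyphen = preg_replace('/[^a-z0-9.-]/i', '', $name);
```

Both of the first two calls strip the hyphen and the underscore; the third keeps the hyphen because it is now part of the allowed set.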

Answer 2

A:

The preg_ functions expect the regex to be delimited (like Perl regex-es):

preg_replace('/[a-zA-Z\d.]++/', '', $str);

Bart Kiers 2009-09-28 10:43:58
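
As a sketch of what "delimited" means in practice: the delimiter does not have to be a slash. The sample string below is hypothetical, and '#' is just one common alternative that avoids escaping '/' inside the pattern:

```php
<?php
$str = 'a/b c!';

// The same pattern written with two different delimiters.
$withSlash = preg_replace('/[^a-z\/]/i', '', $str); // '/' must be escaped here
$withHash  = preg_replace('#[^a-z/]#i', '', $str);  // no escaping needed
```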

Ah I was missing that fact!

Abs 2009-09-28 10:51:24

Comparing your regex with soulmerge's: which is faster, or are both good and the time difference negligible? I asked soulmerge the same question.

Abs 2009-09-28 11:04:51

I suspect the time difference to be negligible. If you really want to know: test it!

Bart Kiers 2009-09-28 11:33:24
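
One rough way to "test it" is a micro-benchmark along these lines. The timePattern() helper, the sample subject and the iteration count are all assumptions made for this sketch, and the two patterns use the same character set so that only the quantifier differs:

```php
<?php
// Hypothetical helper: time $iterations runs of one pattern, in seconds.
function timePattern(string $pattern, string $subject, int $iterations = 10000): float
{
    $start = microtime(true);
    for ($i = 0; $i < $iterations; $i++) {
        preg_replace($pattern, '', $subject);
    }
    return microtime(true) - $start;
}

$subject = str_repeat('Hello, World_2009: test! ', 20);

$patterns = [
    'single char' => '/[^A-Za-z0-9,._:]/',
    'possessive'  => '/[^A-Za-z0-9,._:]++/',
];

foreach ($patterns as $label => $pattern) {
    printf("%-12s %.4f s\n", $label, timePattern($pattern, $subject));
}
```

Timings vary with the PHP version, the input and the machine, so treat the numbers as a relative comparison on your own data rather than a general result.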

Answer 3

A:

//Remove characters. Anything apart from a-z (upper and lower case),
//numbers, periods
preg_replace('/[^A-Za-z0-9.]/', '', $string);

Wbdvlpr 2009-09-28 10:44:27

Using a * will match (and replace) all empty strings in $string. Better use + (or ++ to make it possessive to increase performance).

Bart Kiers 2009-09-28 10:47:02
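
A small sketch of the difference between the quantifiers mentioned above; the sample strings are made up. With an empty replacement the final string is the same, but '*' does extra work matching empty strings:

```php
<?php
$string = 'ab';

// '*' may match zero characters, so it also matches empty strings.
$starMatches = preg_match_all('/[^a-z0-9.]*/i', $string, $m1);

// '+' needs at least one character, and 'ab' contains nothing to remove.
$plusMatches = preg_match_all('/[^a-z0-9.]+/i', $string, $m2);

// All three forms produce the same cleaned string.
$dirty         = 'a b!!c';
$viaStar       = preg_replace('/[^a-z0-9.]*/i', '', $dirty);
$viaPlus       = preg_replace('/[^a-z0-9.]+/i', '', $dirty);
$viaPossessive = preg_replace('/[^a-z0-9.]++/i', '', $dirty);
```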

With the caret (^) this condition will remove the exact opposite of what you expect. Best make an edit.

scragar 2009-09-28 11:50:12

@scragar -- why don't you post your own answer then?

Wbdvlpr 2009-09-28 16:46:34

Answer 4

+1 A:

1.) Turning error reporting off should increase performance. The part of the error reporting process that consumes the most time is probably either outputting the error message or calling custom error handlers (this is my guess; I haven't measured it).

2.) PCRE regular expressions require you to delimit your RE; have a look at the docs. Besides, your RE looks a bit broken. I think it was meant to be something like this (replace anything that is not a letter, a number, comma, period, underscore or colon with the empty string):

preg_replace('/[^A-Za-z0-9,._:]/', '', $string);
# If you want to support letters in any language (like the German umlauts
# öäü, for example), not just the letters a-z, you should use the Unicode
# properties (lowercase \p matches a property; the u modifier enables UTF-8):
# http://php.net/manual/en/regexp.reference.unicode.php
preg_replace('/[^\pL\pN,._:]/u', '', $string);

soulmerge 2009-09-28 10:48:36
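
A short sketch of the difference between an ASCII-only class and Unicode properties. It assumes the script file and the input are UTF-8 encoded; the sample string is made up. Note that lowercase \pL and \pN match letters and numbers (uppercase \P negates the property), and the u modifier makes PCRE treat pattern and subject as UTF-8:

```php
<?php
// Assumes this file and the input are UTF-8 encoded.
$input = 'Müller & Söhne: 2009!';

// ASCII-only class: the bytes of ü and ö are not in a-z, so they are removed.
$asciiOnly = preg_replace('/[^A-Za-z0-9,._:]/', '', $input);

// \pL = any letter, \pN = any number; umlauts survive.
$unicode = preg_replace('/[^\pL\pN,._:]/u', '', $input);
```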

Comparing your regex (the first one) with Bart's: which is faster, or are both good and the time difference negligible? I asked Bart the same question.

Abs 2009-09-28 11:03:40

They are not the same: his regex does not capture the characters ',_:'

soulmerge 2009-09-28 11:24:26

Hmm, not very readable, let's try again: his regex does not capture the comma, underscore and colon.

soulmerge 2009-09-28 11:25:24

I have used yours, and your explanations about other languages are very helpful. Thank you soulmerge. :)

Abs 2009-09-28 12:32:15
