How can I use PHP to strip out all characters that are NOT alpha, numeric, space, or punctuation?

I've tried the following, but it strips punctuation.

preg_replace("/[^a-zA-Z0-9\s]/", "", $str);
+2  A: 

You're going to have to list the punctuation explicitly, as there is no single-letter shorthand for it (e.g. \s is the shorthand for whitespace characters).

preg_replace('/[^a-zA-Z0-9\s\-=+\|!@#$%^&*()`~\[\]{};:\'",<.>\/?_]/', '', $str);
cletus
A: 
preg_replace("/[^a-zA-Z0-9\s\p{P}]/", "", $str);

Example:

php > echo preg_replace("/[^a-zA-Z0-9\s\p{P}]/", "", "⟺f✆oo☃. ba⟗r!");
foo. bar!

\p{P} matches any Unicode punctuation character (see Unicode character properties). If you only want to allow specific punctuation, list those characters in the negated character class instead. E.g.:

preg_replace("/[^a-zA-Z0-9\s.?!]/", "", $str);
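
For UTF-8 input, a variant of the two patterns above that may behave more predictably adds the `u` modifier, so that PCRE tests `\p{P}` against whole code points instead of individual bytes. The sketch below is only an illustration: the function names `strip_non_text` and `strip_except` are made up for this example, and it assumes the script file and the input are both UTF-8.

```php
<?php
// Hypothetical helper (not from the thread): keep ASCII letters, digits,
// whitespace, and any Unicode punctuation; drop everything else.
function strip_non_text(string $str): ?string
{
    // The u modifier makes PCRE treat pattern and subject as UTF-8.
    // preg_replace() returns null on error, hence the nullable return type.
    return preg_replace('/[^a-zA-Z0-9\s\p{P}]/u', '', $str);
}

// Hypothetical variant: keep only the punctuation characters passed in $allowed.
function strip_except(string $str, string $allowed): ?string
{
    // preg_quote() escapes regex metacharacters, including the "/" delimiter,
    // so the characters are safe to embed in the character class.
    $class = preg_quote($allowed, '/');
    return preg_replace('/[^a-zA-Z0-9\s' . $class . ']/u', '', $str);
}

echo strip_non_text("⟺f✆oo☃. ba⟗r!"), "\n";        // foo. bar!
echo strip_except("foo. bar! (baz)", ".?!"), "\n";  // foo. bar! baz
```

Building the class with `preg_quote()` avoids having to hand-escape characters such as `]`, `^`, or `-` when the allowed list changes.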
Matthew Flaschen
So that would add . ? and !, correct?
Tedd
The second would. The first allows all punctuation.
Matthew Flaschen
These seem to strip ALL characters :(
Tedd
I'm using your first example and it seems to strip all characters. What am I doing wrong?
Tedd
@Tedd, not sure. I posted a tested example. The [docs](http://www.php.net/manual/en/regexp.reference.unicode.php) mention a couple of caveats: you need PHP 4.4 or later on the 4.x branch (5.1 or later on 5.x), the text has to be UTF-8, and the PCRE library has to be compiled with `--enable-unicode-properties`.
Matthew Flaschen
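
One possible explanation for the "strips all characters" symptom: `preg_replace()` returns `null` when the pattern cannot be compiled or the subject is rejected, and echoing `null` prints an empty string. A rough way to tell the cases apart is sketched below; the function name `strip_or_explain` is hypothetical, and the `u` modifier is an addition for UTF-8 input rather than part of the pattern posted above.

```php
<?php
// Hypothetical wrapper (not from the thread): report why the replacement
// failed instead of silently producing an empty result.
function strip_or_explain(string $str): string
{
    // @ suppresses the compile warning so the failure can be reported below.
    $result = @preg_replace('/[^a-zA-Z0-9\s\p{P}]/u', '', $str);

    if ($result === null) {
        // null means the pattern failed to compile (e.g. PCRE built without
        // Unicode property support) or the subject was rejected.
        $code = preg_last_error();
        if ($code === PREG_BAD_UTF8_ERROR) {
            throw new RuntimeException('Input is not valid UTF-8');
        }
        throw new RuntimeException("preg_replace failed (error code $code)");
    }

    return $result;
}

try {
    echo strip_or_explain("foo\xFF"), "\n"; // \xFF is never valid in UTF-8
} catch (RuntimeException $e) {
    echo $e->getMessage(), "\n";
}
```

If the exception reports invalid UTF-8, converting the input first (for example with `mb_convert_encoding()`) may be worth trying before changing the pattern.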