views:

46

answers:

4

How can I make the following regular expression ignore all whitespaces?

$foo = ereg_replace("[^áéíóúÁÉÍÓÚñÑa-zA-Z]", "", $_REQUEST["bar"]);

Input: Ingeniería Eléctrica'*;<42

Current Output: IngenieríaEléctrica

Desired Output: Ingeniería Eléctrica

I tried adding /s \s\s* \s+ /\s+/ \/s /t /r among others and they all failed.

Objective: A regex that will accept only strings with upper or lower case characters with or without (spanish) accents.

Thank you !

A: 

I believe this should work

$foo = ereg_replace("[^áéíóúÁÉÍÓÚñÑa-zA-Z ]", "", $_REQUEST["bar"]);
marvin
ereg_replace() is deprecated, you really should be recommending a switch to preg_replace()
Mark Baker
So... simple... ~.~ Thank you !
@mark yep, you are right, but I just wanted to correct the mistake without doing anything else
marvin
+2  A: 

I see no reason as to why adding \s to that regex would not work. \s should match all whitespace characters.

$foo = preg_replace("/[^áéíóúÁÉÍÓÚñÑa-zA-Z\s]/", "", $_REQUEST["bar"]);
Chris
bleh, been a while since i've used PHP. Thanks for the heads-up, have updated the answer :)
Chris
No problem. +1 ` `
Bart Kiers
A: 

ereg_replace uses POSIX Extended Regular Expressions and there, POSIX bracket expressions are used.

Now the important thing to know is that inside bracket expressions, \ is not a meta-character and therefore \s won't work.

But you can use the POSIX character class [:space:] inside the POSIX bracket expression to achieve the same effect:

$foo = ereg_replace("[^áéíóúÁÉÍÓÚñÑa-zA-Z[:space:]]", "", $_REQUEST["bar"]);

You see, it is different from the, I think, better known Perl syntax and as the POSIX regular expression functions are deprecated in PHP 5.3 you really should go with the Perl compatible ones.

Felix Kling
A: 

All the answers so far fail to point out that your method to match the accentuated characters is a hack and it's incomplete – for instance, no grave accents are matched.

The best way is to use the mbstring extension:

mb_regex_encoding("UTF-8"); //or whatever encoding you're using
var_dump(mb_ereg_replace("[^\\w\\s]|[0-9]", "", "Ingeniería Eléctrica'*;<42", "z"));

gives

string(22) "Ingeniería Eléctrica"
Artefacto