Hi, I'm pretty new to PHP, and I noticed there are many different ways of handling regular expressions.

This is what I'm currently using:

$join = "the original string i'm parsing through";

$replace = array(" ", ".", ",", "'", "@");
$newString = str_replace($replace, "_", $join);

I want to replace every character which isn't a-z, A-Z, or 0-9. I'm looking for the inverse of the above. A pseudocode way to write it would be

'If characters in $join are not equal to a-z,A-Z,0-9 then change characters in $join to "_"'

+3  A: 

The regular expression for anything which isn't a-z, A-Z, 0-9 is:

$newString = preg_replace('/[^a-zA-Z0-9]/', "_", $join);

This is known as a negated character class.
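
To make the negated character class concrete, here is a minimal sketch that runs it against the sample string from the question. The variable names simply follow the question; nothing else is assumed.

```php
<?php
// Sample input borrowed from the question.
$join = "the original string i'm parsing through";

// [^a-zA-Z0-9] matches any single character that is NOT an ASCII letter or digit.
$newString = preg_replace('/[^a-zA-Z0-9]/', '_', $join);

echo $newString, "\n"; // the_original_string_i_m_parsing_through
```

Each space and the apostrophe are replaced one for one, so the result has the same length as the input.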

Gavin Miller
+6  A: 
$newString = preg_replace('/[^a-z0-9]/i', '_', $join);

This should do the trick.

antennen
Hi antennen, thanks for the reply! Is this case sensitive? Will it accept capitals? Thanks, Ben.
Ben McRae
That's what the 'i' at the end is for - case insensitive.
ceejayoz
Note that with a + after the character class, this regex will replace consecutive occurrences of non-alphanumeric characters with a single _. Thus '@@@' would be replaced with '_', not '___'. Remove the + if you don't want this behavior.
Mark
Good thing you pointed that out, I normally throw away characters using the same method. The plus is just old habit. Edited since it didn't replicate OP's stated behavior.
antennen
Thanks Mark, the plus character is actually quite useful for what I am trying to achieve :)
Ben McRae
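
For anyone comparing the two variants discussed in these comments, here is a small sketch. The input string is made up for illustration.

```php
<?php
// Hypothetical input with a run of three non-alphanumeric characters.
$join = "user@@@example.com";

// Without +: every offending character becomes its own underscore.
$oneForOne = preg_replace('/[^a-z0-9]/i', '_', $join);

// With +: each run of offending characters collapses into a single underscore.
$collapsed = preg_replace('/[^a-z0-9]+/i', '_', $join);

echo $oneForOne, "\n"; // user___example_com
echo $collapsed, "\n"; // user_example_com
```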
Thanks antennen, Mark! If I wanted to allow certain characters along with a-z0-9, e.g. a backslash, how would I do this? Sorry to ask a question in the comments section.
Ben McRae
A backward slash is a bit special. IIRC it'd be '/[^a-z0-9\\\\]/i'
antennen
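
A quick sketch of why four backslashes are needed: PHP's single-quoted string turns `\\\\` into `\\`, and the regex engine then reads `\\` as one literal backslash. The Windows-style path below is only an illustrative input.

```php
<?php
// Hypothetical input; in a single-quoted string, \\ is one literal backslash,
// so this is: C:\temp\my file.txt
$join = 'C:\\temp\\my file.txt';

// The class allows letters, digits and the backslash; everything else becomes _.
$newString = preg_replace('/[^a-z0-9\\\\]/i', '_', $join);

echo $newString, "\n"; // C_\temp\my_file_txt
```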
+4  A: 

I am not giving you the answer but this tutorial is well worth its 10 minutes.

Link to Regular Expressions in PHP

CodeToGlory
Thanks CodeToGlory, I will be sure to check this out :)
Ben McRae
I was darn sure that people would downvote this without checking the tutorial or the intention behind it, which is to help the OP understand regular expressions.
CodeToGlory
+1 to offset unnecessary downvoting. Nothing wrong with this answer.
Mark
+1  A: 

The easiest way is this:

preg_replace('/\W/', '_', $join);

\W is the non-word character group. A word character is a-z, A-Z, 0-9, and _ (underscore). \W matches everything not previously mentioned.

Edit: The preg functions use Perl-compatible regular expressions, whose syntax is documented in Perl's perlre manual page.

Edit 2: This assumes the C locale or one of the English locales. Other locales may have accented letters in the word character class. The Unicode locales will only consider characters below code point 128 to be word characters.
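
Because the underscore counts as a word character, \W and the explicit negated class differ only in how they treat an existing _. The sketch below uses a made-up ASCII input and '-' as the replacement so the difference is visible.

```php
<?php
// Hypothetical ASCII-only input that already contains an underscore.
$join = "snake_case & more!";

// \W leaves the underscore alone because _ counts as a word character.
$withW = preg_replace('/\W/', '-', $join);

// The explicit class treats _ like any other non-alphanumeric character.
$withClass = preg_replace('/[^a-zA-Z0-9]/', '-', $join);

echo $withW, "\n";     // snake_case---more-
echo $withClass, "\n"; // snake-case---more-
```

When the replacement is itself an underscore, as in the question, the two patterns give the same result on ASCII input.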

R. Bemrose
Due to localization it might contain other characters than a-z though.
antennen
Actually, that's a good point. I'm not sure how PCRE or PHP particularly handles that. I'll see if I can find any docs about it.
R. Bemrose
I found this: http://www.php.net/manual/en/regexp.reference.php Scroll down to \W.
antennen