I want a regular expression for generating SEO-friendly URLs, so that things like:

My product name

becomes

My_product_name

This is a long,long,long!!sentence

becomes

This_is_a_long_long_long_sentence

Basically, all non-alphanumeric characters should be replaced with underscores.

Any ideas?

+4  A: 

preg_replace('/[^a-zA-Z0-9]+/', '_', $sentence)

Basically it looks for any sequence of non-alphanumeric characters and replaces it with a single '_'. This way, you also avoid having two consecutive _'s in your output.

If it's for URLs, you probably also want them to be lower-case only:

preg_replace('/[^a-z0-9]+/', '_', strtolower($sentence))

Wim
If this replaces all occurrences of non-alphanumeric characters in $sentence, is the + really needed?
Murali VP
If you remove the +, each occurrence of a single non-alphanumeric character will be replaced by `_`. If your original string has two non-alphanumeric characters following each other, the resulting string will contain `__`, which you probably don't want.
Wim
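To make the difference concrete, here is a small sketch comparing the two patterns on a fragment of the question's sample sentence. The variable names are only for illustration.

```php
<?php
// Fragment of the sample sentence with two adjacent punctuation characters.
$input = 'long!!sentence';

// Without +: every single non-alphanumeric character becomes its own underscore.
$withoutPlus = preg_replace('/[^a-zA-Z0-9]/', '_', $input);

// With +: a whole run of non-alphanumeric characters becomes one underscore.
$withPlus = preg_replace('/[^a-zA-Z0-9]+/', '_', $input);

echo $withoutPlus, "\n"; // long__sentence
echo $withPlus, "\n";    // long_sentence
```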
Note that with this answer you may still have a `_` at the beginning and/or end of the string, so it's best to also add `$a = trim($a, '_');` (+1 to stereofrog for that one).
Wim
There's one caveat here: this only works on ASCII text. Any word characters outside A-Z you probably still want to keep, but URL-encode them.
Epsilon Prime
Any examples of non-ASCII characters?
Click Upvote
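As one illustration, accented letters such as è, û and é are non-ASCII. A rough sketch of what the ASCII-only pattern does to them, assuming PHP 7+ (for the `\u{...}` escapes) and a UTF-8 string; the sample text is made up:

```php
<?php
// "Crème brûlée", written with escapes so the file encoding does not matter.
$input = "Cr\u{E8}me br\u{FB}l\u{E9}e";

// Without the /u modifier the pattern works on bytes. Each accented letter is
// two non-ASCII bytes in UTF-8, and the + collapses them into one underscore.
$ascii = preg_replace('/[^a-zA-Z0-9]+/', '_', $input);

echo $ascii, "\n"; // Cr_me_br_l_e
```

The accented letters are lost entirely, which is rarely what you want in a URL for non-English content.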
A: 
$str = preg_replace("`[^a-z\d]+`i", "_", $str);
Justin Johnson
+4  A: 
 $a = preg_replace("/[^A-Za-z0-9]+/", "_", $str);

Or use /\W+/ if you want to keep everything that is considered a "letter" in the current locale (note that \w also matches digits and the underscore itself).

After the replacement it may also be necessary to strip leading and trailing underscores:

 $a = trim($a, '_');
stereofrog
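Putting the pieces from the answers together (lower-casing, collapsing runs, trimming), a minimal sketch could look like this. The function name `slugify` is hypothetical, and the function only handles ASCII input as discussed in the comments above.

```php
<?php
// Hypothetical helper combining the suggestions from this thread.
function slugify(string $text): string
{
    // Lower-case first so the character class only needs a-z and 0-9.
    $slug = preg_replace('/[^a-z0-9]+/', '_', strtolower($text));

    // Drop any underscore left at the start or end of the string.
    return trim($slug, '_');
}

echo slugify('My product name'), "\n";
// my_product_name
echo slugify('  This is a long,long,long!!sentence. '), "\n";
// this_is_a_long_long_long_sentence
```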
Or use a Unicode property such as \pL or \pN (these need the /u modifier): http://www.php.net/manual/en/regexp.reference.unicode.php
Epsilon Prime
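A rough sketch of that Unicode-aware variant, combined with the earlier suggestion to URL-encode what is kept. It assumes PHP 7+ and UTF-8 input; the sample string is made up.

```php
<?php
// "Crème brûlée, 2 pcs!" as a UTF-8 string.
$input = "Cr\u{E8}me br\u{FB}l\u{E9}e, 2 pcs!";

// \pL matches any Unicode letter and \pN any Unicode number;
// the /u modifier makes PCRE treat the subject as UTF-8.
$slug = trim(preg_replace('/[^\pL\pN]+/u', '_', $input), '_');

// Percent-encode the remaining non-ASCII bytes for use in a URL path.
$encoded = rawurlencode($slug);

echo $slug, "\n";    // Crème_brûlée_2_pcs
echo $encoded, "\n"; // Cr%C3%A8me_br%C3%BBl%C3%A9e_2_pcs
```

Lower-casing is left out here on purpose, since strtolower() is not reliable for multibyte text.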