Hello guys,

I'm using these functions on my website to transform user input into an acceptable URL:

function urlize($url) {
    $search = array('/[^a-z0-9]/', '/--+/', '/^-+/', '/-+$/');
    $replace = array('-', '-', '', '');
    return preg_replace($search, $replace, utf2ascii($url));
}

function utf2ascii($string) {
    $iso88591  = "\\xE0\\xE1\\xE2\\xE3\\xE4\\xE5\\xE6\\xE7";
    $iso88591 .= "\\xE8\\xE9\\xEA\\xEB\\xEC\\xED\\xEE\\xEF";
    $iso88591 .= "\\xF0\\xF1\\xF2\\xF3\\xF4\\xF5\\xF6\\xF7";
    $iso88591 .= "\\xF8\\xF9\\xFA\\xFB\\xFC\\xFD\\xFE\\xFF";
    $ascii = "aaaaaaaceeeeiiiidnooooooouuuuyyy";
    return strtr(mb_strtolower(utf8_decode($string), 'ISO-8859-1'), $iso88591, $ascii);
}

I'm having a problem with numbers, though. For some reason, when I try:

echo urlize("test 23342");

I get "test-eiioe". Why is that and how can I fix it?

Thank you very much!

A: 

Your utf2ascii function is wrong; it's the one turning "test 23342" into "test eiioe".
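The cause appears to be how the lookup string is written: inside double quotes, `\\` is an escaped backslash, so `"\\xE0"` is the four literal characters `\`, `x`, `E`, `0` rather than the single byte 0xE0. `strtr()` pairs the characters of its second and third arguments position by position and ignores the extra characters of the longer one, so only the first 32 characters of `$iso88591` are used, and the digits 0 to 7 among them get mapped to letters. A minimal demonstration (`$from` and `$to` are illustrative names; `$to` is the question's `$ascii`):

```php
<?php
// Inside double quotes, "\\" is an escaped backslash, so "\\xE0" is the four
// characters \ x E 0, while "\xE0" is a single byte with value 0xE0.
echo strlen("\\xE0"), " ", strlen("\xE0"), "\n"; // prints "4 1"

// The first 32 characters of the question's $iso88591 string. strtr() ignores
// the extra characters of the longer argument, so these are the only ones
// paired with the 32 characters of $ascii.
$from = "\\xE0\\xE1\\xE2\\xE3\\xE4\\xE5\\xE6\\xE7";
$to   = "aaaaaaaceeeeiiiidnooooooouuuuyyy";

// The digits 0-7 sit at every fourth position of $from, so each is paired
// with a letter of $to: '2' => 'e', '3' => 'i', '4' => 'o', ...
echo strtr("test 23342", $from, $to), "\n"; // prints "test eiioe"

// With real bytes (a single backslash) the digits are left alone.
echo strtr("test 23342", "\xE0\xE1\xE2", "aaa"), "\n"; // prints "test 23342"
```

Writing the escapes with a single backslash would fix the digits, although switching to iconv, as suggested here, avoids the hand-made table entirely.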

Why don't you use iconv to do the conversion from UTF-8 to ISO-8859-1? That is, use iconv("UTF-8", "ISO-8859-1//TRANSLIT", $url);

wimvds
Actually, I don't know much about what these functions do. I found this code on Google and thought it would be OK. Your modification seems to work, though, thanks! Do you know which is better, yours or Kerry's version?
Maxime
Kerry's version simply strips accented characters (e.g. "crème" becomes "crme"), so if you want to support foreign languages with accents you'll be better off using iconv to convert them into legible characters first (though it's not 100% foolproof; check the comments on the iconv page).
wimvds
And now that I mention accented characters: since ISO-8859-1 (aka Latin-1) also contains accented characters, you'd be better off converting to ASCII, so use `iconv("UTF-8", "ASCII//TRANSLIT", $url);`. That will be better for slugs.
wimvds
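Putting wimvds' suggestions together, a sketch of `urlize()` that transliterates with `iconv("UTF-8", "ASCII//TRANSLIT", ...)` and then applies the question's regular expressions might look like this. Transliteration results depend on the iconv implementation and on the current locale (glibc, for instance, may emit `?` for accented letters under the default `C` locale), so the accented example carries no expected output. The `setlocale()` call, the `en_US.UTF-8` locale name and the fallback to an empty string on conversion failure are assumptions, not part of the thread:

```php
<?php
// Assumption: a UTF-8 locale with this name exists on the system; glibc's
// iconv consults the LC_CTYPE locale when transliterating.
setlocale(LC_CTYPE, 'en_US.UTF-8');

function urlize($url) {
    // Transliterate to plain ASCII first; the exact result for accented
    // letters depends on the iconv implementation.
    $ascii = iconv("UTF-8", "ASCII//TRANSLIT", $url);
    if ($ascii === false) {
        $ascii = ''; // assumption: treat failed conversion (e.g. invalid UTF-8) as empty
    }
    $search  = array('/[^a-z0-9]/', '/--+/', '/^-+/', '/-+$/');
    $replace = array('-', '-', '', '');
    // Lowercase before filtering, since the first pattern only keeps a-z0-9.
    return preg_replace($search, $replace, strtolower($ascii));
}

echo urlize("test 23342"), "\n";         // prints "test-23342"
echo urlize("--Hello,  World!--"), "\n"; // prints "hello-world"
echo urlize("Crème brûlée"), "\n";       // implementation-dependent
```

Some iconv implementations transliterate accents into separate marks such as a backtick or apostrophe; the first pattern then turns those marks into dashes, so it is worth testing with your own input.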
+1  A: 

Hey, it looks like you are trying to create a slug. If so, this is the function I use/suggest:

function slug( $string ) {
    return strtolower( preg_replace( array( '/[^-a-zA-Z0-9\s]/', '/[\s]/' ), array( '', '-' ), $string ) );
}
Kerry
What's the difference between mine and this?
Maxime
Yours attempts to do a UTF conversion, and I think that's where it's blowing up. Mine only turns whitespace into dashes; everything else except letters, digits and existing dashes is removed. This means any special characters are removed too: if you put "hell's angels" into yours, it would produce "hell-s-angels", while mine produces "hells-angels". The rest of yours is more comprehensive.
Kerry
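The difference Kerry describes can be checked directly. The sketch below runs Kerry's `slug()` next to the question's regular expressions, with the `utf2ascii()` step replaced by a plain `strtolower()` so that only the regex behaviour is compared; `urlize_regex_only` is a hypothetical name used just for this comparison:

```php
<?php
// Kerry's function, unchanged.
function slug( $string ) {
    return strtolower( preg_replace( array( '/[^-a-zA-Z0-9\s]/', '/[\s]/' ), array( '', '-' ), $string ) );
}

// Hypothetical helper: the question's patterns without the utf2ascii() step.
function urlize_regex_only($url) {
    $search  = array('/[^a-z0-9]/', '/--+/', '/^-+/', '/-+$/');
    $replace = array('-', '-', '', '');
    return preg_replace($search, $replace, strtolower($url));
}

echo slug("hell's angels"), "\n";              // prints "hells-angels"
echo urlize_regex_only("hell's angels"), "\n"; // prints "hell-s-angels"

// slug() turns every whitespace character into its own dash, so runs of
// spaces are not collapsed; the question's '/--+/' pattern collapses them.
echo slug("a  b"), "\n";                       // prints "a--b"
echo urlize_regex_only("a  b"), "\n";          // prints "a-b"
```

The second pair of calls shows one way in which the question's version is "more comprehensive": it also collapses repeated dashes and trims them from both ends.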
+2  A: 

The problem is in your utf2ascii. I suggest using the iconv() function instead.

iconv("UTF-8", "ISO-8859-1//IGNORE", $string);

The //IGNORE suffix on the output encoding tells iconv to skip any character it can't convert. The bad news is that characters ISO-8859-1 cannot represent are silently dropped. To have them approximated instead, you can use //TRANSLIT.

Then you can use strtolower and a regular expression to eliminate non-alphanumeric characters (or to replace them with -).

If you just want to encode arbitrary data, there is also urlencode(), but it won't give you nice-looking links.
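For comparison, this is what `urlencode()` and its RFC 3986 counterpart `rawurlencode()` do with the question's input: the result is a valid URL component, but spaces and punctuation are kept as encoded sequences instead of being turned into a readable slug.

```php
<?php
// urlencode() uses form encoding: spaces become "+".
echo urlencode("test 23342"), "\n";    // prints "test+23342"

// rawurlencode() follows RFC 3986: spaces become "%20".
echo rawurlencode("test 23342"), "\n"; // prints "test%2023342"

// Punctuation is percent-encoded rather than dropped.
echo urlencode("hell's angels"), "\n"; // prints "hell%27s+angels"
```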

Krab
How about that?
function urlize($url) {
    $search = array('/[^a-z0-9]/', '/--+/', '/^-+/', '/-+$/');
    $replace = array('-', '-', '', '');
    return preg_replace($search, $replace, strtolower(iconv("UTF-8", "ISO-8859-1//TRANSLIT", $url)));
}
PS: Thanks for the speed and quality of your replies!
Maxime
I think that will be OK; test it with your input and see for yourself.
Krab
A: 

What's wrong with urlencode()?

symcbean