Ah, slugification
function slugify($text)
{
// Swap out Non "Letters" with a -
$text = preg_replace('/[^\\pL\d]+/u', '-', $text);
// Trim out extra -'s
$text = trim($text, '-');
// Convert letters that we have left to the closest ASCII representation
$text = iconv('utf-8', 'us-ascii//TRANSLIT', $text);
// Make text lowercase
$text = strtolower($text);
// Strip out anything we haven't been able to convert
$text = preg_replace('/[^-\w]+/', '', $text);
return $text;
}
This works fairly well, as it first uses the unicode properties of each character to determine if it's a letter (or \d against a number) - then it converts those that aren't to -'s - then it transliterates to ascii, does another replacement for anything else, and then cleans up after itself. (Fabrik's test returns "arvizturo-tukorfurogep")
I also tend to add in a list of stop words - so that those are removed from the slug. "the" "of" "or" "a", etc (but don't do it on length, or you strip out stuff like "php")