I want to auto-generate a readable URL from any natural text, like this:

Latest article: About German letters - Handling äöü and ß!

would ideally be changed to this

latest-article-about-german-letters-handling-aou-and-ss.html

It should work for all Latin-based languages, and I want to avoid any escaping.

I guess this could be achieved by regular expressions, but perhaps there's already a standard function available in PHP/PEAR/PECL.

A: 

Create an array with your special characters, loop through them using str_replace, and replace each one with your desired value.
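
A minimal sketch of that idea, assuming the input and the source file are both UTF-8. The function name replace_special_chars and the character map are only illustrative, and the map is far from complete. Since str_replace accepts arrays for both the search and the replacement, an explicit loop is not strictly needed:

```php
<?php
// Hypothetical helper: maps a few special characters to ASCII stand-ins.
// The map below is an illustrative assumption, not a complete list.
function replace_special_chars($text)
{
    $map = array(
        'ä' => 'a', 'ö' => 'o', 'ü' => 'u',
        'Ä' => 'A', 'Ö' => 'O', 'Ü' => 'U',
        'ß' => 'ss',
    );
    // str_replace takes arrays, so every pair is applied in one call
    return str_replace(array_keys($map), array_values($map), $text);
}

echo replace_special_chars('Handling äöü and ß!'); // Handling aou and ss!
```

Punctuation and spaces are left alone here; they still have to be handled in a separate step.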

Ben Fransen
A: 

You would definitely need to replace the special characters first. Then you could use preg_replace and do something like

$url = preg_replace("#[^a-zA-Z0-9_-]#", "_", $string);
Chris Gutierrez
+10  A: 

What you're looking for is to slugify your text.

You can find snippets of code on the Internet such as this one that will do the trick:

/**
 * Modifies a string to remove all non-ASCII characters and spaces.
 */
static public function slugify($text)
{
    // replace non letter or digits by -
    $text = preg_replace('~[^\\pL\d]+~u', '-', $text);

    // trim
    $text = trim($text, '-');

    // transliterate
    if (function_exists('iconv'))
    {
        $text = iconv('utf-8', 'us-ascii//TRANSLIT', $text);
    }

    // lowercase
    $text = strtolower($text);

    // remove unwanted characters
    $text = preg_replace('~[^-\w]+~', '', $text);

    if (empty($text))
    {
        return 'n-a';
    }

    return $text;
}

From here.

Guillaume Flandre
@Guillaume - great find. I've inlined the code in case it ever disappears from snipplr.
Dominic Rodger
You're totally right, I didn't think about this, thanks.
Guillaume Flandre
Note that on top of that, you should check that the generated title is unique in your system, and if it is not, append a unique id of some kind. I mention this because it cost me a heavy headache not so long ago :)
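
A rough sketch of such a check, assuming the slugs already in use are available as a plain array. The name unique_slug and the -2, -3 suffix scheme are made up for illustration; a real system would more likely ask the database:

```php
<?php
// Hypothetical helper: returns $slug unchanged if it is free, otherwise
// appends -2, -3, ... until the result is not in $taken.
// $taken stands in for whatever lookup your system really uses.
function unique_slug($slug, array $taken)
{
    $candidate = $slug;
    $suffix = 2;
    while (in_array($candidate, $taken, true)) {
        $candidate = $slug . '-' . $suffix;
        $suffix++;
    }
    return $candidate;
}

echo unique_slug('my-title', array('my-title', 'my-title-2')); // my-title-3
```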
pixeline
This doesn't seem to work right: preg_replace also removes all non-ASCII characters. When I keep them, iconv just returns $text until the first non-ASCII character. Is this a PHP configuration issue?
DR
The Symfony documentation adds a recommendation about iconv: make sure to save your PHP files with the UTF-8 encoding, since it's the one used by iconv to do the transliteration. Otherwise that should work; I have already used it without any problem.
Guillaume Flandre
+3  A: 

I don't think there is a function to do this, but I recently created this:

function fix_url($word) {
    /**
     * whilst the descriptor in the url will be for SEO     
     * purposes only, we need to ensure it doesn't break
     * the URI rules http://www.faqs.org/rfcs/rfc2396.html
     */  

    // convert to lower case
    $word=strtolower($word);

    // define illegal / replacement characters
    $illegal = array("ä","ö","ü","ß");
    $replace = array("a","o","u","ss");
    $word = str_replace($illegal, $replace, $word);

    // replace & with and
    $word=str_replace("&","and",$word);

    // replace each space with -
    $word=str_replace(" ","-",$word);

    // remove every character that is not alphanumeric or a dash
    // (preg_replace, because ereg_replace was removed in PHP 7)
    $word=preg_replace("/[^A-Za-z0-9-]/", "", $word);
    return $word;
}

I have included an example of replacing an illegal character with a safe one.

I have tested this code and it returns latest-article-about-german-letters---handling-aou-and-ss, so obviously there are still some tweaks to make (see the ---), but I'm sure this will be easy to adapt.
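
One possible tweak for the --- is an extra pass over the result of fix_url() that squeezes runs of dashes. The helper name collapse_dashes is just an assumption for this sketch:

```php
<?php
// Hypothetical post-processing step for the output of fix_url()
function collapse_dashes($word)
{
    // squeeze every run of dashes into a single dash
    $word = preg_replace('/-+/', '-', $word);
    // drop dashes left over at either end
    return trim($word, '-');
}

echo collapse_dashes('latest-article-about-german-letters---handling-aou-and-ss');
// latest-article-about-german-letters-handling-aou-and-ss
```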

ILMV
A: 

I use the following to generate a filename from a string:

function format_filename($str, $separator = '-')
{
    // strip everything except ASCII letters, digits, dashes and spaces
    $str = preg_replace('/[^A-Za-z0-9 -]/', '', $str);
    // turn each run of spaces into a single separator
    $str = preg_replace('/ +/', $separator, trim($str));
    return strtolower($str);
}

It was quick and dirty and I'm sure some of the other nice people on here can improve that code snippet.

Martin Bean
+1  A: 

For some time I have successfully used utf8_to_ascii from the PHP UTF8 library. It works for any UTF-8 text (non-Latin included).

greg