Say I have this webpage:
http://ww.xyz.com/Product.aspx?CategoryId=1

If the name of CategoryId=1 is "Dogs", I would like to convert the URL into something like this:
http://ww.xyz.com/Products/Dogs

The problem arises when the category name contains foreign characters (or characters that are invalid in a URL). If the name of CategoryId=2 is "Göra äldre", what should the new URL be?

Logically it should be:
http://ww.xyz.com/Products/Göra äldre
but that will not work. The space is one problem (I can easily replace it with a dash, for example), but what about the foreign characters? In ASP.NET I could use the UrlEncode function, which would give something like this:
http://ww.xyz.com/Products/G%c3%b6ra+%c3%a4ldre
but I can't really say that is better than the original URL (http://ww.xyz.com/Product.aspx?CategoryId=2).
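The %c3%b6-style sequences are the UTF-8 bytes of each accented letter. A minimal sketch, using Uri.EscapeDataString as a stand-in for ASP.NET's UrlEncode (it escapes a space as %20 instead of + and uses uppercase hex), shows the same encoding for a path segment and that it round-trips losslessly, even though the result is not readable:

```csharp
using System;

string name = "Göra äldre";

// Uri.EscapeDataString percent-encodes the UTF-8 bytes of every character
// outside the unreserved set: 'ö' (U+00F6) is C3 B6, so it becomes "%C3%B6",
// and the space becomes "%20" rather than the "+" that UrlEncode produces.
string pathEncoded = Uri.EscapeDataString(name);
Console.WriteLine(pathEncoded);  // G%C3%B6ra%20%C3%A4ldre

// Decoding gives back the original text, so nothing is lost.
Console.WriteLine(Uri.UnescapeDataString(pathEncoded) == name);  // True
```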

Ideally I would like to generate this URL instead, but how can I do that automatically (i.e. convert foreign characters to 'safe' URL characters)?
http://ww.xyz.com/Products/Gora-aldre

A:

Transliterate non-ASCII characters to ASCII, using something like this:

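using System.Text;

var str = "éåäöíØ";
// Relies on the Cyrillic code page's best-fit mapping (é -> e, Ø -> O, ...), then decodes as ASCII.
// (.NET Core / .NET 5+: call Encoding.RegisterProvider(CodePagesEncodingProvider.Instance) first.)
var noAccents = Encoding.ASCII.GetString(Encoding.GetEncoding("Cyrillic").GetBytes(str));

=> "eaaoiO"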

(Source)

Sjoerd
What if some characters are not Cyrillic? I need a solution which will always work.
Anthony
Then you'll need to add more checks for different types of encoding. Unfortunately there's no magic wand here unless you use a library that does it all for you.
hollsk
Maybe the UnidecodeSharp library is what you are looking for: http://unidecode.codeplex.com/
Sjoerd
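A sketch of a more portable alternative, assuming .NET Core 3.0 or later (where Encoding.GetEncoding("Cyrillic") fails unless a code-pages provider is registered): decompose the text with Unicode normalization form D, drop the combining accent marks, and map the few letters that have no decomposition (such as Ø or ß) through a small lookup table. This is a different technique from the answer above, and the `extra` table is a hypothetical starting point, not a complete list; it only helps with Latin-script letters, so other scripts still need a transliteration library such as the UnidecodeSharp one mentioned in the comments.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Text;

// Hypothetical table for letters that Unicode does not decompose into base letter + mark.
var extra = new Dictionary<char, string>
{
    ['Ø'] = "O", ['ø'] = "o", ['Æ'] = "AE", ['æ'] = "ae",
    ['ß'] = "ss", ['Ł'] = "L", ['ł'] = "l",
};

string RemoveAccents(string text)
{
    var sb = new StringBuilder(text.Length);
    // FormD splits e.g. 'ö' into 'o' followed by U+0308 (combining diaeresis).
    foreach (char c in text.Normalize(NormalizationForm.FormD))
    {
        if (CharUnicodeInfo.GetUnicodeCategory(c) == UnicodeCategory.NonSpacingMark)
            continue;                      // drop the accent mark itself
        if (extra.TryGetValue(c, out var replacement))
            sb.Append(replacement);        // letter with no decomposition
        else
            sb.Append(c);                  // plain letter, digit, space, ...
    }
    // Recompose anything left over (e.g. marks on characters we kept).
    return sb.ToString().Normalize(NormalizationForm.FormC);
}

Console.WriteLine(RemoveAccents("éåäöíØ"));     // eaaoiO
Console.WriteLine(RemoveAccents("Göra äldre")); // Gora aldre
```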
A: 

I've come up with the following two extension methods (ASP.NET / C#):

    using System.Text;
    using System.Text.RegularExpressions;

    // Extension methods must live in a non-generic, non-nested static class (name is arbitrary).
    public static class StringExtensions
    {
        // Strips accents by encoding with a Cyrillic code page and decoding as ASCII.
        public static string RemoveAccent(this string txt)
        {
            byte[] bytes = Encoding.GetEncoding("Cyrillic").GetBytes(txt);
            return Encoding.ASCII.GetString(bytes);
        }

        public static string Slugify(this string phrase)
        {
            // ToLowerInvariant avoids culture surprises such as the Turkish dotless i.
            string str = phrase.RemoveAccent().ToLowerInvariant();
            str = Regex.Replace(str, @"[^a-z0-9\s-]", "");  // remove all invalid chars
            str = Regex.Replace(str, @"\s+", " ").Trim();   // collapse runs of whitespace into one space
            str = Regex.Replace(str, @"\s", "-");           // replace spaces with dashes
            return str;
        }
    }
Anthony
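On .NET Core 3.0 and later, the Cyrillic code page used by RemoveAccent above is not available unless a code-pages encoding provider is registered. A minimal sketch, assuming C# 9 top-level statements, that keeps the same three regex passes but swaps in Unicode normalization (form D, then dropping combining marks) for the accent removal, and applies it to the category name from the question. The base URL is taken from the question; the local functions are illustrative, not the extension methods above.

```csharp
using System;
using System.Globalization;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;

// Accent removal via Unicode normalization instead of a Cyrillic code page:
// decompose (FormD), drop combining marks, then recompose (FormC).
string RemoveAccent(string txt) =>
    new string(txt.Normalize(NormalizationForm.FormD)
        .Where(c => CharUnicodeInfo.GetUnicodeCategory(c) != UnicodeCategory.NonSpacingMark)
        .ToArray())
    .Normalize(NormalizationForm.FormC);

string Slugify(string phrase)
{
    string str = RemoveAccent(phrase).ToLowerInvariant();
    str = Regex.Replace(str, @"[^a-z0-9\s-]", "");  // remove all invalid chars
    str = Regex.Replace(str, @"\s+", " ").Trim();   // collapse runs of whitespace into one space
    str = Regex.Replace(str, @"\s", "-");           // replace spaces with dashes
    return str;
}

string url = "http://ww.xyz.com/Products/" + Slugify("Göra äldre");
Console.WriteLine(url);  // http://ww.xyz.com/Products/gora-aldre
```

Letters with no Unicode decomposition, such as Ø or ß, are not transliterated by this version; after lowercasing they fall outside a-z and are simply removed by the first regex, so a lookup table or a transliteration library is still needed for those.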