Transform all diacritics into their base character and then strip anything that is not a letter or a digit using Char.IsLetterOrDigit
.
Then replace all spaces by a single dash.
This is what we use in our software.
/// <summary>
/// Convert a name into a string that can be appended to a Uri.
/// </summary>
private static string EscapeName(string name)
{
if (!string.IsNullOrEmpty(name))
{
name = NormalizeString(name);
// Replaces all non-alphanumeric character by a space
StringBuilder builder = new StringBuilder();
for (int i = 0; i < name.Length; i++)
{
builder.Append(char.IsLetterOrDigit(name[i]) ? name[i] : ' ');
}
name = builder.ToString();
// Replace multiple spaces into a single dash
name = Regex.Replace(name, @"[ ]{1,}", @"-", RegexOptions.None);
}
return name;
}
/// <summary>
/// Strips the value from any non english character by replacing thoses with their english equivalent.
/// </summary>
/// <param name="value">The string to normalize.</param>
/// <returns>A string where all characters are part of the basic english ANSI encoding.</returns>
/// <seealso cref="http://stackoverflow.com/questions/249087/how-do-i-remove-diacritics-accents-from-a-string-in-net"/>
private static string NormalizeString(string value)
{
string normalizedFormD = value.Normalize(NormalizationForm.FormD);
StringBuilder builder = new StringBuilder();
for (int i = 0; i < normalizedFormD.Length; i++)
{
UnicodeCategory uc = CharUnicodeInfo.GetUnicodeCategory(normalizedFormD[i]);
if (uc != UnicodeCategory.NonSpacingMark)
{
builder.Append(normalizedFormD[i]);
}
}
return builder.ToString().Normalize(NormalizationForm.FormC);
}
Concerning using those generated name as unique Id, I would vouch against. Use the generated name as a SEO helper, but not as a key resolver. If you look at how stackoverflow references their pages:
http://stackoverflow.com/questions/249087/how-do-i-remove-diacritics-accents-from-a-string-in-net
^--ID ^--Unneeded name but helpful for bookmarks and SEO
You can find the ID there. These two URL point to the same page:
http://stackoverflow.com/questions/249087/how-do-i-remove-diacritics-accents-from-a-string-in-net
http://stackoverflow.com/questions/249087/