I'm using this method to remove accents from my strings:

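// Requires: using System.Globalization; using System.Text;
static string RemoveAccents(string input)
{
    // Decompose each character into its base letter plus combining marks.
    string normalized = input.Normalize(NormalizationForm.FormKD);
    StringBuilder builder = new StringBuilder();
    foreach (char c in normalized)
    {
        // Keep everything except the combining (non-spacing) marks.
        if (char.GetUnicodeCategory(c) != UnicodeCategory.NonSpacingMark)
        {
            builder.Append(c);
        }
    }
    return builder.ToString();
}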

But this method leaves đ as đ and doesn't change it to d, even though d is its base character. You can try it with this input string: "æøåáâăäĺćçčéęëěíîďđńňóôőöřůúűüýţ"

What's so special about the letter đ?

+1  A: 

I have to admit that I'm not sure why this works, but it sure seems to:

var str = "æøåáâăäĺćçčéęëěíîďđńňóôőöřůúűüýţ";
var noApostrophes = Encoding.ASCII.GetString(Encoding.GetEncoding("Cyrillic").GetBytes(str)); 

=> "aoaaaaalccceeeeiiddnnooooruuuuyt"

Jonas Elfström
I'd love to know why this works too!
Mladen Prajdic
+1  A: 

"D with stroke" (Wikipedia) is used in several languages and appears to be considered a distinct letter in all of them, which is why it remains unchanged.

Martin B
Also, eth in Old English mutated into "th" in English, while in Norwegian it turned into "d". Beyond a superficial similarity to capital d, it's completely different.
Frank Shearar
Yeah, but the same applies to č and ć, which are also distinct letters.
Mladen Prajdic
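
For anyone landing here later, a likely explanation for the difference between đ and č/ć: in the Unicode data, č and ć carry a decomposition (base letter plus a combining mark), while đ (U+0111) has none, so Normalize has nothing to split off. Below is a minimal sketch, not a definitive fix, that checks this and extends the question's method with a fallback table. The class name AccentStripper, the Decomposes helper, and the entries in Fallbacks are all hypothetical and only illustrative; extend the table for whatever your input contains.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Text;

static class AccentStripper
{
    // Hypothetical fallback table for letters that have no Unicode
    // decomposition. These entries are illustrative, not exhaustive.
    static readonly Dictionary<char, string> Fallbacks = new Dictionary<char, string>
    {
        { '\u0111', "d" },  { '\u0110', "D" },   // đ, Đ
        { '\u00F8', "o" },  { '\u00D8', "O" },   // ø, Ø
        { '\u0142', "l" },  { '\u0141', "L" },   // ł, Ł
        { '\u00E6', "ae" }, { '\u00C6', "AE" },  // æ, Æ
    };

    // True when normalization splits the character into more than one
    // UTF-16 code unit, i.e. it has a decomposition.
    public static bool Decomposes(char c)
    {
        return c.ToString().Normalize(NormalizationForm.FormKD).Length > 1;
    }

    // Same approach as in the question, plus a lookup for the letters
    // that normalization leaves untouched.
    public static string RemoveAccents(string input)
    {
        string normalized = input.Normalize(NormalizationForm.FormKD);
        var builder = new StringBuilder(normalized.Length);
        foreach (char c in normalized)
        {
            if (char.GetUnicodeCategory(c) == UnicodeCategory.NonSpacingMark)
            {
                continue; // drop combining marks
            }
            string replacement;
            if (Fallbacks.TryGetValue(c, out replacement))
            {
                builder.Append(replacement);
            }
            else
            {
                builder.Append(c);
            }
        }
        return builder.ToString();
    }
}
```

With this, AccentStripper.Decomposes('\u010D') (č) should be true and AccentStripper.Decomposes('\u0111') (đ) false, and AccentStripper.RemoveAccents("čćđ") should give "ccd". Whether æ should become "ae" or "a" is a judgment call; the Cyrillic trick above maps it to "a".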