ansaurus

Question

Answer 1

+3 A:

It seems you want to strip off diacritics and leave the base character. I'd recommend Ben Lings's solution here for this:

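// Requires: using System; using System.Globalization; using System.Linq; using System.Text;
string input = "ŠĐĆŽ šđčćž";
// FormD splits each accented letter into a base letter plus combining marks.
string decomposed = input.Normalize(NormalizationForm.FormD);
// Drop the combining marks and keep only the base characters.
char[] filtered = decomposed
    .Where(c => char.GetUnicodeCategory(c) != UnicodeCategory.NonSpacingMark)
    .ToArray();
string newString = new String(filtered);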

Edit: Slight problem! It doesn't work for the Đ. The result is:

SĐCZ sđccz
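
Đ and đ survive because they have no canonical decomposition, so FormD leaves them as single characters and there is no combining mark to filter out. One possible workaround, shown here only as a sketch, is to combine the normalization with a small lookup table for such letters. The class name DiacriticStripper, the method name Strip and the contents of the table are assumptions made for illustration; other letters without a decomposition would have to be added by hand.

```csharp
using System.Collections.Generic;
using System.Globalization;
using System.Text;

public static class DiacriticStripper
{
    // Hypothetical table of letters that FormD does not decompose.
    // Only Đ and đ are listed; extend it as needed.
    private static readonly Dictionary<char, char> NoDecomposition =
        new Dictionary<char, char>
        {
            { 'Đ', 'D' },
            { 'đ', 'd' },
        };

    public static string Strip(string input)
    {
        // Split accented letters into base letter + combining marks.
        string decomposed = input.Normalize(NormalizationForm.FormD);
        StringBuilder result = new StringBuilder(decomposed.Length);

        foreach (char c in decomposed)
        {
            // Skip the combining marks themselves.
            if (char.GetUnicodeCategory(c) == UnicodeCategory.NonSpacingMark)
                continue;

            // Substitute letters that normalization left untouched.
            char replacement;
            result.Append(NoDecomposition.TryGetValue(c, out replacement) ? replacement : c);
        }

        return result.ToString();
    }
}
```

With this sketch, DiacriticStripper.Strip("ŠĐĆŽ šđčćž") returns "SDCZ sdccz".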

Mark Byers 2010-04-02 13:18:48

I get the following error: 'string' does not contain a definition for 'Normalise' and no extension method 'Normalise' accepting a first argument of type 'string' could be found (are you missing a using directive or an assembly reference?)

ile 2010-04-02 13:24:11

@ile: Apparently there was an error in the solution I copied this from. I have fixed it now. Unfortunately though this method fails for Đ, so either you will have to handle that case specially, or just do it the way you originally suggested.

Mark Byers 2010-04-02 13:25:55

I see... but this is a very simple solution, so I will use it and add a special method to replace Đ and đ. Thanks!

ile 2010-04-02 13:30:08

Answer 2

+3 A:

Jon Skeet mentioned the following code on a newsgroup...

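// Requires: using System.Text;
static string RemoveAccents(string input)
{
    // FormKD decomposes accented letters into base letters plus combining marks.
    string normalized = input.Normalize(NormalizationForm.FormKD);
    // ASCII with empty replacement fallbacks silently drops every non-ASCII character.
    Encoding removal = Encoding.GetEncoding(Encoding.ASCII.CodePage,
                                            new EncoderReplacementFallback(""),
                                            new DecoderReplacementFallback(""));
    byte[] bytes = removal.GetBytes(normalized);
    return Encoding.ASCII.GetString(bytes);
}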

EDIT

Maybe I am crazy, but I just ran the following...

Dim Input As String = "ŠĐĆŽ-šđčćž"
Dim Builder As New StringBuilder()

For Each Chr As Char In Input
    Builder.Append(Chr)
Next

Console.Write(Builder.ToString())

And the output was SDCZ-sdccz

Josh Stodola 2010-04-02 13:22:08

This removes the Đ completely.

Mark Byers 2010-04-02 13:27:06

@Mark You are right, but see my edit, it is somewhat unbelievable

Josh Stodola 2010-04-02 13:47:51

@Josh hmm I tried that VB.NET code locally and I get the original string.

Ahmad Mageed 2010-04-02 14:03:57

@Ahmad I bet it is somehow related to localization settings. I must say that I was daunted when it produced the desired output.

Josh Stodola 2010-04-02 14:40:10

Answer 3

A:

A dictionary would be a logical solution to this...

// Requires: using System.Collections.Generic; using System.Text;
static readonly Dictionary<char, char> AccentEquivalents = new Dictionary<char, char>
{
    { 'Š', 'S' },
    //...add other equivalents
};

static string ReplaceAccents(string input)
{
    StringBuilder fixedString = new StringBuilder(input);
    for (int i = 0; i < fixedString.Length; i++)
    {
        // Overwrite the character in place when the table has an equivalent.
        char replacement;
        if (AccentEquivalents.TryGetValue(fixedString[i], out replacement))
            fixedString[i] = replacement;
    }
    return fixedString.ToString();
}

You should use a StringBuilder for string operations like this because strings in C# are immutable: replacing one character at a time creates a new string object for every change. A StringBuilder is mutable, so its characters can be overwritten in place without that drawback.

DonaldRay 2010-04-02 13:40:41

But character arrays are not immutable either. Create a character array and modify the values in it.
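
A minimal sketch of the character-array variant, for comparison with the StringBuilder version. The class name CharArrayReplacer, the method name Replace and the four table entries are made up for illustration, and the lookup assumes the input uses precomposed characters, since a decomposed "S" followed by a combining caron would not match the 'Š' key.

```csharp
using System.Collections.Generic;

public static class CharArrayReplacer
{
    // Hypothetical lookup table; add the remaining letters as needed.
    private static readonly Dictionary<char, char> Equivalents =
        new Dictionary<char, char>
        {
            { 'Š', 'S' }, { 'š', 's' },
            { 'Đ', 'D' }, { 'đ', 'd' },
        };

    public static string Replace(string input)
    {
        // Copy the string into a mutable array once.
        char[] chars = input.ToCharArray();

        for (int i = 0; i < chars.Length; i++)
        {
            // Overwrite in place when the table has an equivalent.
            char replacement;
            if (Equivalents.TryGetValue(chars[i], out replacement))
                chars[i] = replacement;
        }

        // Build the result string from the modified array.
        return new string(chars);
    }
}
```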

Timothy Baldridge 2010-04-02 13:45:44

String replace diacritics in C#
