I'd like to use this method to create user-friendly URLs. Because my site is in Croatian, there are characters that I don't want to strip but rather replace with others. For example, this string:
ŠĐĆŽ šđčćž
needs to be: sdccz-sdccz

So I would like to make two arrays, one containing the characters to be replaced and the other containing the replacement characters:

string[] character = { "Š", "Đ", "Č", "Ć", "Ž", "š", "đ", "č", "ć", "ž" };
string[] characterReplace = { "s", "d", "c", "c", "z", "s", "d", "c", "c", "z" };

Finally, these two arrays should be used in some method that takes a string, finds the matches and replaces them. In PHP I used the preg_replace function for this. In C# this doesn't work:

s = Regex.Replace(s, character, characterReplace);


I would appreciate any help. Thanks!
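Regex.Replace has no overload that accepts arrays of patterns and replacements, which is why the call above does not compile. A minimal sketch of the intended two-array behavior, assuming plain literal replacements (no regex) are enough; the helper name ReplaceAll is hypothetical:

```csharp
using System;

// The two parallel arrays from the question
string[] character = { "Š", "Đ", "Č", "Ć", "Ž", "š", "đ", "č", "ć", "ž" };
string[] characterReplace = { "s", "d", "c", "c", "z", "s", "d", "c", "c", "z" };

// Hypothetical helper: applies each (find, replace) pair in order,
// similar to the array form of preg_replace but with literal strings
static string ReplaceAll(string s, string[] find, string[] replace)
{
    if (find.Length != replace.Length)
        throw new ArgumentException("Arrays must have the same length");
    for (int i = 0; i < find.Length; i++)
        s = s.Replace(find[i], replace[i]); // ordinal, case-sensitive replacement
    return s;
}

// Assumed URL convention: spaces become hyphens
string slug = ReplaceAll("ŠĐČĆŽ šđčćž", character, characterReplace).Replace(' ', '-');
Console.WriteLine(slug); // sdccz-sdccz
```

Because the pairs are applied one after another, a replacement that contained a later search string would be replaced again; with single letters mapped to ASCII, as here, that cannot happen.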

+3  A: 

It seems you want to strip off diacritics and leave the base character. I'd recommend Ben Lings's solution here for this:

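using System.Globalization; // UnicodeCategory
using System.Linq;          // Where, ToArray
using System.Text;          // NormalizationForm

string input = "ŠĐĆŽ šđčćž";
// FormD splits each accented letter into base letter + combining mark
string decomposed = input.Normalize(NormalizationForm.FormD);
// Drop the combining marks and keep the base letters
char[] filtered = decomposed
    .Where(c => char.GetUnicodeCategory(c) != UnicodeCategory.NonSpacingMark)
    .ToArray();
string newString = new String(filtered);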

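Edit: Slight problem! It doesn't work for Đ, because Đ has no Unicode decomposition (the stroke is part of the letter, not a separate combining mark), so normalization leaves it unchanged. The result is: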

SĐCZ sđccz
Mark Byers
I get the following error: 'string' does not contain a definition for 'Normalise' and no extension method 'Normalise' accepting a first argument of type 'string' could be found (are you missing a using directive or an assembly reference?)
ile
@ile: Apparently there was an error in the solution I copied this from. I have fixed it now. Unfortunately, though, this method fails for Đ, so either you will have to handle that case specially, or just do it the way you originally suggested.
Mark Byers
I see... but this is a very simple solution, so I will use it and handle Đ and đ with a special method. Thanks!
ile
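Following ile's plan, a minimal sketch that maps Đ and đ by hand and then uses the FormD filter from the answer above for everything else. The helper name ToSlug and the slug rules (lowercase, spaces to hyphens) are assumptions based on the desired output in the question:

```csharp
using System;
using System.Globalization;
using System.Linq;
using System.Text;

// Hypothetical helper: strips diacritics and builds a URL-style slug
static string ToSlug(string input)
{
    // Special case: Đ and đ have no decomposition, so map them explicitly
    string s = input.Replace('Đ', 'D').Replace('đ', 'd');

    // Split accented letters into base letter + combining mark, drop the marks
    string decomposed = s.Normalize(NormalizationForm.FormD);
    string stripped = new string(decomposed
        .Where(c => char.GetUnicodeCategory(c) != UnicodeCategory.NonSpacingMark)
        .ToArray());

    // Assumed URL convention: lowercase, spaces become hyphens
    return stripped.ToLowerInvariant().Replace(' ', '-');
}

Console.WriteLine(ToSlug("ŠĐČĆŽ šđčćž")); // sdccz-sdccz
```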
+3  A: 

Jon Skeet mentioned the following code on a newsgroup...

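using System.Text;

static string RemoveAccents (string input)
{
    // FormKD decomposes accented letters into base letter + combining mark
    string normalized = input.Normalize(NormalizationForm.FormKD);
    // ASCII encoding that silently drops anything it cannot encode
    // (the combining marks, but also letters like Đ that do not decompose)
    Encoding removal = Encoding.GetEncoding(Encoding.ASCII.CodePage,
                                            new EncoderReplacementFallback(""),
                                            new DecoderReplacementFallback(""));
    byte[] bytes = removal.GetBytes(normalized);
    return Encoding.ASCII.GetString(bytes);
}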

EDIT

Maybe I am crazy, but I just ran the following...

Dim Input As String = "ŠĐĆŽ-šđčćž"
Dim Builder As New StringBuilder()

' Copies each character unchanged; no replacement is done here
For Each Chr As Char In Input
    Builder.Append(Chr)
Next

Console.Write(Builder.ToString())

And the output was SDCZ-sdccz

Josh Stodola
This removes the Đ completely.
Mark Byers
@Mark You are right, but see my edit; it is somewhat unbelievable.
Josh Stodola
@Josh hmm I tried that VB.NET code locally and I get the original string.
Ahmad Mageed
@Ahmad I bet it is somehow related to localization settings. I must say I was baffled when it produced the desired output.
Josh Stodola
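The encoding-fallback approach can still be used if Đ and đ are mapped before the ASCII fallback runs. A sketch, assuming a hand-written mapping for those two letters is acceptable; RemoveAccentsKeepD is a hypothetical name:

```csharp
using System;
using System.Text;

// Variant of RemoveAccents above: Đ/đ are mapped first, because FormKD does
// not decompose them and the ASCII fallback would otherwise drop them
static string RemoveAccentsKeepD(string input)
{
    string mapped = input.Replace('Đ', 'D').Replace('đ', 'd');
    string normalized = mapped.Normalize(NormalizationForm.FormKD);
    // ASCII encoding whose fallbacks replace unencodable characters with ""
    Encoding removal = Encoding.GetEncoding(Encoding.ASCII.CodePage,
                                            new EncoderReplacementFallback(""),
                                            new DecoderReplacementFallback(""));
    byte[] bytes = removal.GetBytes(normalized);
    return Encoding.ASCII.GetString(bytes);
}

Console.WriteLine(RemoveAccentsKeepD("ŠĐĆŽ šđčćž")); // SDCZ sdccz
```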
A: 

A dictionary would be a logical solution to this...

using System.Collections.Generic;
using System.Text;

// The method name is illustrative
static string ReplaceAccents(string inputstring)
{
    Dictionary<char, char> AccentEquivalents = new Dictionary<char, char>();
    AccentEquivalents.Add('Š', 's');
    //...add other equivalents

    // Copy into a mutable buffer and swap each character that has a mapping
    StringBuilder FixedString = new StringBuilder(inputstring);
    for (int i = 0; i < FixedString.Length; i++)
        if (AccentEquivalents.ContainsKey(FixedString[i]))
            FixedString[i] = AccentEquivalents[FixedString[i]];
    return FixedString.ToString();
}

You need to use a StringBuilder for operations like this because strings in C# are immutable: changing one character at a time on a string creates a new string object for every change, whereas a StringBuilder is mutable and modifies its buffer in place.

DonaldRay
Character arrays are mutable too. Create a character array and modify the values in it.
Timothy Baldridge
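A sketch of the char-array suggestion, assuming every replacement is exactly one character: ToCharArray returns a mutable copy that can be edited in place and turned back into a string once. The method name ReplaceWithCharArray and the mapping are illustrative:

```csharp
using System;
using System.Collections.Generic;

// Replaces mapped characters in a char[] copy, then builds one new string
static string ReplaceWithCharArray(string input, IReadOnlyDictionary<char, char> map)
{
    char[] chars = input.ToCharArray();
    for (int i = 0; i < chars.Length; i++)
    {
        if (map.TryGetValue(chars[i], out char replacement))
            chars[i] = replacement;
    }
    return new string(chars);
}

var accentEquivalents = new Dictionary<char, char>
{
    ['Š'] = 's', ['Đ'] = 'd', ['Č'] = 'c', ['Ć'] = 'c', ['Ž'] = 'z',
    ['š'] = 's', ['đ'] = 'd', ['č'] = 'c', ['ć'] = 'c', ['ž'] = 'z',
};

Console.WriteLine(ReplaceWithCharArray("ŠĐČĆŽ šđčćž", accentEquivalents)); // sdccz sdccz
```

Because the array cannot grow, this only fits one-to-one mappings; if a character ever had to become several characters, a StringBuilder filled with Append would be the better fit.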