Is there any built-in method to remove similar characters in a string?
Examples:

aaaabbbccc ->  abc
aabbccaa -> abc

Thanks

A: 

I think you want to look at regular expressions. For C# and .NET, this is a useful site:

http://www.regular-expressions.info/dotnet.html

MrEdmundo
A: 

No.

But it can easily be done in a loop. Remember that you need to build a new string; you cannot edit a character in an existing string (you can call string.Remove, but that is likely to be slow and to throw off your loop index).

Basically:

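var sb = new StringBuilder(); // strings are immutable, so collect the result here
for (int i = 0; i < text.Length; i++)
{
    // Skip a char that repeats the one just before it (adjacent duplicates only).
    if (i > 0 && text[i] == text[i - 1])
        continue;

    sb.Append(text[i]);
}
string result = sb.ToString();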
EKS
+1  A: 

Use the Regex class:

Regex.Replace( "aaabbcc", @"(\w)\1+", "$1" )

will result in

abc

For more information, look here.

EDIT:
Since you edited your question:

Regex.Replace( "acaabbccbaa", @"(\w)(?<=\1.+)", "" )

will result in

acb

This pattern uses a positive lookbehind to find every character that has already appeared earlier in the string and replaces it with "".
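To make the difference between the two patterns easier to see, here is a minimal sketch that wraps each one in a method. The class name `RegexDedupSketch` and the method names are made up for illustration; the patterns are the ones shown above. Note that `\w` only matches word characters, so spaces and punctuation are left alone by both.

```csharp
using System.Text.RegularExpressions;

public static class RegexDedupSketch
{
    // Collapses each run of the same word character into a single character.
    // Only adjacent repeats are affected, so a later, separate run survives.
    public static string CollapseRuns(string input)
    {
        return Regex.Replace(input, @"(\w)\1+", "$1");
    }

    // Removes every word character that already occurred earlier in the string.
    // (?<=\1.+) is a positive lookbehind: it succeeds when the text ending at
    // the current position contains the captured character followed by at
    // least one more character, which means an earlier copy exists.
    public static string RemoveRepeats(string input)
    {
        return Regex.Replace(input, @"(\w)(?<=\1.+)", "");
    }
}
```

With the second example from the question, `CollapseRuns("aabbccaa")` gives `abca`, while `RemoveRepeats("aabbccaa")` gives `abc`. The lookbehind has to rescan the earlier part of the string for each character, so for long inputs a set-based approach is probably a better fit.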

tanascius
+6  A: 

You could use a HashSet and build an extension method for this:

    static string RemoveDuplicateChars(this string s)
    {
        HashSet<char> set = new HashSet<char>();
        StringBuilder sb = new StringBuilder(s.Length);

        foreach (var c in s)
        {
            if (set.Add(c))
            {
                sb.Append(c);
            }
        }

        return sb.ToString();
    }

or using Enumerable.Distinct, simply:

Console.WriteLine(new string("aaabbbccaddcacc".Distinct().ToArray()));
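One detail the snippets above leave out: an extension method only compiles inside a non-generic static class. A minimal sketch of how both variants might be hosted, assuming a hypothetical class name `StringDedupExtensions` and a hypothetical second method name `DistinctChars`:

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text;

// Hypothetical host class; extension methods must live in a static class.
public static class StringDedupExtensions
{
    // Keeps the first occurrence of each char, in order of first appearance.
    public static string RemoveDuplicateChars(this string s)
    {
        var seen = new HashSet<char>();
        var sb = new StringBuilder(s.Length);

        foreach (char c in s)
        {
            // HashSet.Add returns false when the char is already in the set.
            if (seen.Add(c))
            {
                sb.Append(c);
            }
        }

        return sb.ToString();
    }

    // Same idea via LINQ.
    public static string DistinctChars(this string s)
    {
        return new string(s.Distinct().ToArray());
    }
}
```

Both can then be called as `"aaabbbccaddcacc".RemoveDuplicateChars()` or `"aaabbbccaddcacc".DistinctChars()`. The documentation for Enumerable.Distinct describes its result as unordered, although the current implementation yields elements in first-occurrence order; if the order matters to you, the HashSet version makes it explicit.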
bruno conde
+6  A: 

Does something like this solve your problem?

string distinct = new string("aaaabbbccc".Distinct().ToArray());

It's a little ugly, but you could wrap it into an extension method:

public static string UniqueChars(this string original)
{
    return new string(original.Distinct().ToArray());
}

Hope this helps.

Sean Devlin
If the source string is "aaaabbbcccaa" then this method will return "abc", not "abca". It's not clear which outcome the OP expects in that situation.
LukeH
By the way, you could shorten your code down to something like this: `string distinct = new string("aaaabbbccc".Distinct().ToArray());`
LukeH
That's true. None of the answers to this question are valid without clarification.
Sean Devlin
Good point, I'll amend it.
Sean Devlin
Luke, the OP actually wants `abc` in that case.
Joey
A: 

Since you specifically asked about removing "similar" characters, you may want to try something like this:

using System.Globalization;
using System.Text;
....
        private string RemoveDuplicates(string text)
        {
            StringBuilder result = new StringBuilder();
            string previousTextElement = string.Empty;
            TextElementEnumerator textElementEnumerator = StringInfo.GetTextElementEnumerator(text);
            textElementEnumerator.Reset();
            while (textElementEnumerator.MoveNext())
            {
                string textElement = (string)textElementEnumerator.Current;
                if (string.Compare(previousTextElement, textElement, CultureInfo.InvariantCulture,
                    CompareOptions.IgnoreCase | CompareOptions.IgnoreNonSpace |
                    CompareOptions.IgnoreWidth) != 0)
                {
                    result.Append(textElement);
                    previousTextElement = textElement;
                }
            }
            return result.ToString();
        }
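To show what "similar" means here, the comparison can be pulled out on its own. This is only a sketch: the class name `SimilaritySketch` and the method name `AreSimilar` are made up, and the exact results for accented or full-width characters depend on the globalization data your runtime uses.

```csharp
using System.Globalization;

public static class SimilaritySketch
{
    // Same comparison the method above applies to neighbouring text elements:
    // culture-invariant, ignoring case, combining marks and character width.
    public static bool AreSimilar(string a, string b)
    {
        return string.Compare(a, b, CultureInfo.InvariantCulture,
            CompareOptions.IgnoreCase | CompareOptions.IgnoreNonSpace |
            CompareOptions.IgnoreWidth) == 0;
    }
}
```

For example, `AreSimilar("a", "A")` is true and `AreSimilar("a", "b")` is false. With the usual culture data, an accented letter and its unaccented form should also count as similar because of IgnoreNonSpace. Keep in mind that the method above only compares each text element with the one immediately before it, so it collapses adjacent runs rather than removing every later repeat.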
Jeffrey L Whitledge