Is there a way to remove every special character from a string like:

"\r\n               1802 S St Nw<br>\r\n                    Washington, DC 20009"

And to just write it like:

"1802 S St Nw, Washington, DC 20009"
+1  A: 

There are two ways: you can use RegEx, or you can use String.Replace(...).

Muad'Dib
Can you write the code for the regex?
Umair Ashraf
Yes, I can. Can you? ;)
Muad'Dib
hahahahaha, I liked it
Umair Ashraf
A: 

Use the Regex.Replace() method, specifying all of the characters you want to remove as the pattern to match.
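
A minimal sketch of that idea, not a definitive implementation. The class and method names (AddressCleaner, CleanAddress) are made up for illustration, and it assumes the only markup in the input is a br tag in one of its usual spellings. It turns each br tag into ", ", collapses the remaining whitespace, and trims the ends:

```csharp
using System.Text.RegularExpressions;

public static class AddressCleaner
{
    // Hypothetical helper, not part of the original answer.
    public static string CleanAddress(string input)
    {
        // Replace each <br>, <br/>, <br clear="both"/> etc., together with the
        // whitespace around it, with ", ".
        string withCommas = Regex.Replace(
            input, @"\s*<br\b[^>]*>\s*", ", ", RegexOptions.IgnoreCase);

        // Collapse every remaining run of whitespace (including \r and \n)
        // into a single space.
        string collapsed = Regex.Replace(withCommas, @"\s+", " ");

        return collapsed.Trim();
    }
}
```

For the string in the question this should give "1802 S St Nw, Washington, DC 20009". A br tag at the very start or end of the input would leave a stray comma, so real data may need an extra step.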

Bernard
Actually there is a little more structure than "all of the characters you want to remove"
Henk Holterman
There sure is Henk.
Bernard
A: 

You can use the C# Trim() method to strip leading and trailing whitespace; look here:

http://msdn.microsoft.com/de-de/library/d4tt83f9%28VS.80%29.aspx
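
A small sketch of what Trim() does and does not cover here. The wrapper name (TrimDemo.TrimOnly) is hypothetical and only exists to make the behaviour easy to try:

```csharp
public static class TrimDemo
{
    // Hypothetical wrapper around String.Trim() for demonstration only.
    public static string TrimOnly(string input)
    {
        // With no arguments, Trim() removes leading and trailing whitespace.
        // Anything in the middle of the string is left untouched.
        return input.Trim();
    }
}
```

On the string from the question, this removes the leading "\r\n" and spaces, but the "<br>", the inner line break, and the run of spaces before "Washington" all stay, so it is only one step of the cleanup.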

elsni
@Henk: It's not the whole answer, but it's a start. And it's also a step you happened to miss.
Steven Sudit
+5  A: 

To remove special characters:

public static string ClearSpecialChars(this string input)
{
    foreach (var ch in new[] { "\r", "\n", "<br>" /* etc. */ })
    {
        input = input.Replace(ch, String.Empty);
    }
    return input;
}

To replace every double space with a single space:

public static string ClearDoubleSpaces(this string input)
{
    while (input.Contains("  ")) // double
    {
        input = input.Replace("  ", " "); // with single
    }
    return input;
}

You can also combine both methods into a single one:

public static string Clear(this string input)
{
    return input
        .ClearSpecialChars()
        .ClearDoubleSpaces()
        .Trim();
}
abatishchev
The problem is that it doesn't handle whitespace well. It can remove whitespace, but exactly one space should remain between words, and the rest should be removed.
Umair Ashraf
@Umair: See my updated post.
abatishchev
What about `"<br \t />"`, `"<br\r\n/>"`, `"<br \r\n />"`, `"<br clear=\"both\"/>"` etc.?
dtb
@dtb: Enumerate all the items to replace. Or use RegEx. But what about the advice not to parse (X)HTML with RegEx? http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454
abatishchev
@Henk: Agree. So we have a dilemma - use looping or use RegEx. I prefer the first, the reason is under the link below :)
abatishchev
@Henk: Link above, indeed
abatishchev
It's not pretty, but it'll work and it'll be maintainable. I would not worry about the performance so long as this is only used to clean up what users enter, not to scrub large amounts of existing data.
Steven Sudit
@Steven: I would consider maintainability a reason _not_ to use this. All the Replace variations are declarative.
Henk Holterman
@Henk: Indeed they are, which makes them clearer to most people than RegExp.
Steven Sudit
@Henk: Look at the first code block. It explicitly lists each forbidden string, in an array. This is at least as declarative as RegExp code that specifies these strings, and it's a one-to-one declaration with no encoding. This makes it clearer than RegExp, and that's ultimately what matters here. I know you think of RegExp as declarative in that you don't tell it how to do its job, just what to do, but that's not the only possible meaning. And it's not what matters: what matters is that RegExp just isn't clear.
Steven Sudit
@Steven: That first block has a lot of trouble expressing "any occurrence of 2 or more spaces". Or "Only space after \r\n" or ... And again: It is horribly inefficient.
Henk Holterman
@Henk: While I certainly can't claim that abatishchev's code is fully optimized, I'm not sure I follow your example. The first block doesn't even tackle the issue of double spaces; the second one does. If we wanted it to be faster, we could do a single-pass copy in a StringBuilder.
Steven Sudit
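
For reference, the single-pass StringBuilder copy mentioned in the comment above could look roughly like this. It is a sketch under assumptions, not anyone's posted code: the names (WhitespaceCollapser, CollapseWhitespace) are hypothetical, and it only deals with whitespace, so tags such as "<br>" would still have to be removed separately.

```csharp
using System.Text;

public static class WhitespaceCollapser
{
    // Hypothetical helper: collapses each run of whitespace into one space
    // and drops leading/trailing whitespace, in a single pass over the input.
    public static string CollapseWhitespace(string input)
    {
        var sb = new StringBuilder(input.Length);
        bool pendingSpace = false;

        foreach (char c in input)
        {
            if (char.IsWhiteSpace(c))
            {
                // Remember that a separator is needed, but only once some
                // text has already been written (this skips leading whitespace).
                pendingSpace = sb.Length > 0;
            }
            else
            {
                if (pendingSpace)
                {
                    sb.Append(' ');
                    pendingSpace = false;
                }
                sb.Append(c);
            }
        }

        // A pending space at the end is never written, so trailing
        // whitespace is dropped as well.
        return sb.ToString();
    }
}
```

Unlike the while loop with Replace, this visits each character once and allocates a single buffer, which is the efficiency point raised in the comments.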
A: 
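// Note: this input contains the literal characters \r\n (backslash sequences), not real line breaks.
string input = "\"\\r\\n               1802 S St Nw<br>\\r\\n                    Washington, DC 20009\"";
string cleaned = System.Text.RegularExpressions.Regex.Replace(input, @"(<br>)*?\\r\\n\s+", "");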
Ruel
A: 

Maybe something like this, using ASCII int values. It assumes that all HTML tags are closed.

using System.Collections.Generic;
using System.Text;

public static class StringExtensions
{
    public static string Clean(this string str)
    {   
        string[] split = str.Split(' ');

        List<string> strings = new List<string>();
        foreach (string splitStr in split)
        { 
            if (splitStr.Length > 0)
            {
                StringBuilder sb = new StringBuilder();
                bool tagOpened = false;

                foreach (char c in splitStr)
                {
                    int iC = (int)c;
                    if (iC > 32) // skip control characters and the space (ASCII 0-32)
                    {
                        if (iC == 60) // '<' opens a tag
                            tagOpened = true;

                        if (!tagOpened)
                            sb.Append(c);

                        if (iC == 62) // '>' closes a tag
                            tagOpened = false;
                    }
                }

                string result = sb.ToString();   

                if (result.Length > 0)
                    strings.Add(result);
            }
        }

        return string.Join(" ", strings.ToArray());
    }
}
mdm20