



For the hope-to-have-an-answer-in-30-seconds part of this question, I'm specifically looking for C#.

But in the general case, what's the best way to strip punctuation in any language?

I should add: ideally, the solutions won't require you to enumerate all the possible punctuation marks.

Related: Strip Punctuation in Python

+3  A: 

The most braindead-simple way of doing it would be to call string.Replace once for each punctuation mark.

The other way I can imagine is Regex.Replace, with a regular expression whose pattern lists all the appropriate punctuation marks.
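A minimal Java sketch of both enumerate-the-marks ideas (the class name `EnumeratedStrip` and the `MARKS` list are hypothetical, chosen only for illustration). It also shows the weakness the question worries about: any mark missing from the list, such as the Spanish `¿`, survives.

```java
class EnumeratedStrip {
    // Hypothetical hand-picked list of marks; anything not listed survives.
    static final String MARKS = "!?.,;:'\"";

    // Approach 1: one String.replace call per mark (one full pass over the text each time).
    static String viaReplace(String input) {
        String result = input;
        for (char mark : MARKS.toCharArray()) {
            result = result.replace(String.valueOf(mark), "");
        }
        return result;
    }

    // Approach 2: a single regex pass with a character class listing the same marks.
    // Inside [...] none of these characters needs escaping.
    static String viaRegex(String input) {
        return input.replaceAll("[!?.,;:'\"]", "");
    }

    public static void main(String[] args) {
        String text = "\u00BFQu\u00E9? Hello, world!";
        System.out.println(viaReplace(text));
        System.out.println(viaRegex(text));
    }
}
```

Both versions must be kept in sync by hand, which is exactly the maintenance burden a Unicode category class avoids.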

+2  A: 

Assuming "best" means "simplest" I suggest using something like this:

String stripped = input.replaceAll("\\p{Punct}+", "");

This example is for Java, but all sufficiently modern regex engines should support this (or something similar).

Edit: the Unicode-aware version would be this:

String stripped = input.replaceAll("\\p{P}+", "");

The first version only looks at punctuation characters contained in ASCII.
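To see the ASCII-versus-Unicode difference concretely, here is a small Java sketch (`PunctuationDemo` and its method names are illustrative). The two classes disagree in both directions: `\p{Punct}` misses non-ASCII marks such as `¡`, while `\p{P}` deliberately excludes ASCII symbols such as `+`, `=` and `$`, which Unicode files under the symbol categories.

```java
class PunctuationDemo {
    // POSIX class: ASCII punctuation only, i.e. !"#$%&'()*+,-./:;<=>?@[\]^_`{|}~
    static String stripAsciiPunct(String input) {
        return input.replaceAll("\\p{Punct}+", "");
    }

    // Unicode general category P (Pc, Pd, Ps, Pe, Pi, Pf, Po); excludes symbols such as + = $
    static String stripUnicodePunct(String input) {
        return input.replaceAll("\\p{P}+", "");
    }

    public static void main(String[] args) {
        String s = "\u00A1Hola! a+b=c";             // starts with an inverted exclamation mark
        System.out.println(stripAsciiPunct(s));   // keeps the inverted mark, drops ! + =
        System.out.println(stripUnicodePunct(s)); // drops both marks, keeps + and =
    }
}
```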

Joachim Sauer
+8  A: 

// needs: using System.Linq;
new string(myCharCollection.Where(c => !char.IsPunctuation(c)).ToArray());

Yup. It's powering the string operation I posted below.
Tom Ritter
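Java's `Character` class has no `isPunctuation` method, so a rough Java analogue of this LINQ filter has to test the Unicode general category itself. This sketch assumes the seven punctuation categories are the set you want (they are the ones .NET's `char.IsPunctuation` checks); `CategoryFilter` is a hypothetical name.

```java
class CategoryFilter {
    // The seven Unicode punctuation categories: Pc, Pd, Ps, Pe, Pi, Pf, Po.
    static boolean isPunctuation(int codePoint) {
        switch (Character.getType(codePoint)) {
            case Character.CONNECTOR_PUNCTUATION:
            case Character.DASH_PUNCTUATION:
            case Character.START_PUNCTUATION:
            case Character.END_PUNCTUATION:
            case Character.INITIAL_QUOTE_PUNCTUATION:
            case Character.FINAL_QUOTE_PUNCTUATION:
            case Character.OTHER_PUNCTUATION:
                return true;
            default:
                return false;
        }
    }

    // Filter code points (not chars) so surrogate pairs are never split.
    static String stripPunctuation(String s) {
        return s.codePoints()
                .filter(cp -> !isPunctuation(cp))
                .collect(StringBuilder::new, StringBuilder::appendCodePoint, StringBuilder::append)
                .toString();
    }

    public static void main(String[] args) {
        System.out.println(stripPunctuation("Hello, world! (ok)"));
    }
}
```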
+2  A: 

You can use the Regex.Replace method:

 Regex.Replace(yourString, patternMatchingPunctuation, string.Empty)

Since this returns a string, your method will look something like this:

 string s = Regex.Replace("Hello!?!?!?!", "[?!]", "");

You can replace "[?!]" with something more sophisticated if you want:


This should find any punctuation.
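For comparison, Java's counterpart to calling `Regex.Replace` with a reusable pattern is a precompiled `java.util.regex.Pattern`. This sketch uses `\p{P}+` as the "any punctuation" pattern; that choice is an assumption, and `CompiledStrip` is a hypothetical name.

```java
import java.util.regex.Pattern;

class CompiledStrip {
    // Compiled once and reused; Pattern instances are immutable and thread-safe
    // (Matcher instances are not, so each call creates its own).
    private static final Pattern PUNCTUATION = Pattern.compile("\\p{P}+");

    static String strip(String input) {
        return PUNCTUATION.matcher(input).replaceAll("");
    }

    public static void main(String[] args) {
        System.out.println(strip("Hello!?!?!?!"));
    }
}
```

Compiling once matters mostly in hot loops: `String.replaceAll` compiles its pattern again on every call.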

+1  A: 

Based on GWLlosa's idea, I was able to come up with this supremely ugly, but working, version:

string s = "cat!";
// The delegate parameter is named acc because reusing the name s would clash with the outer local.
s = s.ToCharArray().ToList<char>()
      .Where<char>(x => !char.IsPunctuation(x))
      .Aggregate<char, string>(string.Empty, new Func<string, char, string>(
             delegate(string acc, char c) { return acc + c; }));
Tom Ritter
I know, right? A hobby of mine is committing sins against code in LINQ. But please, by all means, make it better.
Tom Ritter
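The Aggregate-style fold is slow for a reason that carries over to other languages: each `acc + c` step copies the whole accumulated string, so building an n-character result costs O(n^2) character copies. A Java sketch contrasting such a fold with a collector that builds the result once (the class name and the keep-letters-digits-and-whitespace rule are assumptions made to keep the example short; that rule also drops symbols like `$` and `+`):

```java
import java.util.stream.Collectors;

class FoldVsJoin {
    // Simplifying assumption: keep letters, digits and whitespace, drop everything else.
    static boolean keep(int cp) {
        return Character.isLetterOrDigit(cp) || Character.isWhitespace(cp);
    }

    // Like Aggregate(string.Empty, (acc, c) => acc + c): each step copies acc, O(n^2) overall.
    static String viaReduce(String s) {
        return s.codePoints()
                .filter(FoldVsJoin::keep)
                .mapToObj(cp -> new String(Character.toChars(cp)))
                .reduce("", (acc, piece) -> acc + piece);
    }

    // Collectors.joining() builds the result without re-copying the prefix, so it is linear.
    static String viaJoining(String s) {
        return s.codePoints()
                .filter(FoldVsJoin::keep)
                .mapToObj(cp -> new String(Character.toChars(cp)))
                .collect(Collectors.joining());
    }

    public static void main(String[] args) {
        System.out.println(viaReduce("Hi, $5 + tax!"));
        System.out.println(viaJoining("Hi, $5 + tax!"));
    }
}
```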
+1  A: 

Here's a slightly different approach using LINQ. I like AviewAnew's, but this avoids the Aggregate.

        string myStr = "Hello there..';,]';';., Get rid of Punction";

        var s = from ch in myStr
                where !Char.IsPunctuation(ch)
                select ch;

        // Build the string straight from the filtered chars; round-tripping through
        // ASCII would turn any non-ASCII letter into '?'.
        var stringResult = new string(s.ToArray());
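Round-tripping the filtered characters through an ASCII encoding is lossy for non-ASCII text: .NET's `Encoding.ASCII` replaces every character it cannot encode with `?`, which is itself punctuation. Java's `US_ASCII` charset behaves the same way with `String.getBytes`, as this small sketch shows (`AsciiRoundTrip` is a hypothetical name).

```java
import java.nio.charset.StandardCharsets;

class AsciiRoundTrip {
    // getBytes(Charset) replaces unmappable chars with the charset's default
    // replacement byte, which for US-ASCII is '?'.
    static String roundTrip(String s) {
        byte[] bytes = s.getBytes(StandardCharsets.US_ASCII);
        return new String(bytes, StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("caf\u00E9 na\u00EFve")); // prints "caf? na?ve"
    }
}
```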
+4  A: 

Why not simply:

string s = "sxrdct?fvzguh,bij.";
var sb = new StringBuilder();

foreach (char c in s)
   if (!char.IsPunctuation(c))
       sb.Append(c);

s = sb.ToString();

Using a regex is normally slower than simple char operations, and those LINQ operations look like overkill to me. Besides, you can't use LINQ code in .NET 2.0...
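The same loop in Java, as a sketch: a `StringBuilder` presized to the input length so it never reallocates, plus a bitmask over `Character.getType` values standing in for .NET's `char.IsPunctuation` (Java has no built-in equivalent; `LoopStrip` and `PUNCT_MASK` are hypothetical names). Like the C# version, it works on UTF-16 `char`s rather than code points.

```java
class LoopStrip {
    // One bit per Character.getType() value for the seven Unicode punctuation categories.
    private static final int PUNCT_MASK =
            (1 << Character.CONNECTOR_PUNCTUATION) | (1 << Character.DASH_PUNCTUATION)
            | (1 << Character.START_PUNCTUATION) | (1 << Character.END_PUNCTUATION)
            | (1 << Character.INITIAL_QUOTE_PUNCTUATION) | (1 << Character.FINAL_QUOTE_PUNCTUATION)
            | (1 << Character.OTHER_PUNCTUATION);

    static boolean isPunctuation(char c) {
        return ((PUNCT_MASK >>> Character.getType(c)) & 1) != 0;
    }

    static String strip(String s) {
        StringBuilder sb = new StringBuilder(s.length()); // output is never longer than input
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (!isPunctuation(c)) {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(strip("sxrdct?fvzguh,bij."));
    }
}
```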

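    #include <cctype>
    #include <iostream>
    #include <string>
    using namespace std;

    int main() {
        string strOne = "H,e.l/l!o W#o@r^l&d!!!";
        int punct_count = 0;
        cout << "before : " << strOne << endl;
        for (string::size_type ix = 0; ix < strOne.size();) {
            if (ispunct(static_cast<unsigned char>(strOne[ix]))) {
                strOne.erase(ix, 1);  // remove the mark; ix now indexes the next char
                ++punct_count;
            } else {
                ++ix;
            }
        }
        cout << "after : " << strOne << " (" << punct_count << " removed)" << endl;
        return 0;
    }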
+1  A: 

Fastest and easiest to read (IMHO):

s = s.StripPunctuation();

To implement it:

using System.Text;
public static class StringExtension
{
    public static string StripPunctuation(this string s)
    {
        var sb = new StringBuilder();
        foreach (char c in s)
            if (!char.IsPunctuation(c))
                sb.Append(c);
        return sb.ToString();
    }
}

I tested several of the ideas posted here. Hades32's solution (the StringBuilder with a foreach loop) was the fastest.

stringbuilder with foreach ( 1059 ms )
stringbuilder with foreach wrapped in extension ( 1056 ms )
stringbuilder with for loop ( 1061 ms )
string concat with foreach ( 2254 ms )
where with new string ( 1333 ms )
where with aggregate ( 2884 ms )
compiled regex ( 2481 ms )

This isn't a very realistic benchmark. Here is the code if you'd like to improve it:

    public void MeasureStripPunctionationTest()
    {
        Measure("stringbuilder with foreach", s =>
        {
            var sb = new StringBuilder();
            foreach (char c in s)
                if (!char.IsPunctuation(c))
                    sb.Append(c);
            return sb.ToString();
        });

        Measure("stringbuilder with foreach wrapped in extension", s =>
        {
            var sb = new StringBuilder();
            foreach (char c in s)
                if (!char.IsPunctuation(c))
                    sb.Append(c);
            return sb.ToString();
        });

        Measure("stringbuilder with for", s =>
        {
            var sb = new StringBuilder();
            for (int i = 0; i < s.Length; i++)
                if (!char.IsPunctuation(s[i]))
                    sb.Append(s[i]);
            return sb.ToString();
        });

        Measure("string concat with foreach", s =>
        {
            var result = "";
            foreach (char c in s)
                if (!char.IsPunctuation(c))
                    result += c;
            return result;
        });

        Measure("where with new string", s => new string(s.Where(item => !char.IsPunctuation(item)).ToArray()));

        Measure("where with aggregate", s => s.Where(item => !char.IsPunctuation(item))
                                             .Aggregate(string.Empty, (result, c) => result + c));

        var stripRegex = new Regex(@"\p{P}+", RegexOptions.Compiled);
        Measure("compiled regex", s => stripRegex.Replace(s, ""));
    }

    // PerformanceTimer is the poster's own helper class (not shown); it presumably reports the elapsed time on Dispose.
    private void Measure(string name, Func<string, string> stripPunctation)
    {
        using (new PerformanceTimer(name))
        {
            var s = "a !@#$ short >{}*' string";
            for (int i = 0; i < 1000000; i++)
            {
                var withoutPunctuation = stripPunctation(s);
            }
        }
    }
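The poster's `PerformanceTimer` helper isn't shown. As one way to make such a measurement a little more trustworthy, here is a rough Java analogue that adds an untimed warm-up pass and prints the last result so the JIT is less likely to discard the work; `StripBenchmark` and `measure` are hypothetical names, and a dedicated harness such as JMH remains the better tool for real numbers.

```java
import java.util.function.UnaryOperator;

class StripBenchmark {
    // Runs `iterations` untimed warm-up calls, then times the same number of calls.
    // Returns the elapsed time of the timed phase in milliseconds.
    static long measure(String name, UnaryOperator<String> strip, int iterations) {
        String input = "a !@#$ short >{}*' string";
        String sink = "";
        for (int i = 0; i < iterations; i++) {
            sink = strip.apply(input); // warm-up so the JIT can compile the hot path
        }
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            sink = strip.apply(input);
        }
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        // Printing the last result keeps it observable, discouraging dead-code elimination.
        System.out.println(name + " (" + elapsedMs + " ms), last result: " + sink);
        return elapsedMs;
    }

    public static void main(String[] args) {
        measure("regex", s -> s.replaceAll("\\p{P}+", ""), 1_000_000);
    }
}
```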
Interesting tidbit: the following characters are not punctuation (Unicode classifies them as symbols): $^+|<>=
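Those characters fall under Unicode's symbol categories (`$` is Sc, `^` is Sk, and `+|<>=` are Sm), so neither `char.IsPunctuation` nor `\p{P}` touches them. If they should go too, one option is to add the symbol category `\p{S}` to the character class (.NET also offers `char.IsSymbol`). A Java sketch, with `StripSymbolsToo` as a hypothetical name:

```java
class StripSymbolsToo {
    // Unicode punctuation only: $ ^ + | < > = survive.
    static String punctuationOnly(String s) {
        return s.replaceAll("\\p{P}+", "");
    }

    // Adds the Unicode symbol categories (Sm, Sc, Sk, So) to the character class.
    static String punctuationAndSymbols(String s) {
        return s.replaceAll("[\\p{P}\\p{S}]+", "");
    }

    public static void main(String[] args) {
        String s = "$^+|<>= and !?";
        System.out.println(punctuationOnly(s));
        System.out.println(punctuationAndSymbols(s));
    }
}
```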