I have a list of 400 strings that all end in "_GONOGO" or "_ALLOC". When the application starts up, I need to strip off the "_GONOGO" or "_ALLOC" from every one of these strings.

I tried this: 'string blah = Regex.Replace(input, "(_GONOGO|_ALLOC)", "");'

but it is MUCH slower than a simple conditional statement like this:

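if (input.Contains("_GONOGO"))
    blah = input.Substring(0, input.IndexOf("_GONOGO"));  // use Substring
else if (input.Contains("_ALLOC"))
    blah = input.Substring(0, input.IndexOf("_ALLOC"));   // use Substring w/different index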

I'm new to regular expressions, so I'm hoping that someone has a better solution or can tell me whether I am doing something horribly wrong. It's not a big deal, but it would be nice to turn this four-line conditional into one simple regex line.

+7  A: 

While it isn't RegEx, you could do

string blah = input.Replace("_GONOGO", "").Replace("_ALLOC", "");

RegEx is great for complex expressions, but its overhead can be excessive for very simple operations like this.
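
As a rough sketch of how the chained Replace might be applied to the whole list at startup (the class and method names here are made up for illustration, not from the question): note that String.Replace removes the marker wherever it occurs, not only at the end, which is fine as long as the strings really do only contain it as a suffix.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper; the names are illustrative only.
public static class SuffixStripper
{
    // Removes every occurrence of either marker, wherever it appears in the string.
    public static string Strip(string input)
    {
        return input.Replace("_GONOGO", "").Replace("_ALLOC", "");
    }

    // Applies Strip to each name, e.g. once at application startup.
    public static List<string> StripAll(IEnumerable<string> names)
    {
        return names.Select(Strip).ToList();
    }
}
```

Calling SuffixStripper.StripAll on the 400 strings would return a new list and leave the original untouched.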

Adam Robinson
Thank you, this is just fine. Regex wasn't a requirement; I just wanted it down to one line.
alexD
+3  A: 

Regex replacements may run faster if you compile the regex first, as in:

Regex exp = new Regex(
    @"(_GONOGO|_ALLOC)",
    RegexOptions.Compiled);

string blah = exp.Replace(input, String.Empty);
David Andres
Note also (from MSDN) "The Regex class is immutable (read-only) and is inherently thread safe." You can create it once and assign it to a static readonly field. See http://www.acorns.com.au/blog/?p=136
TrueWill
And from the Atwood Archives: http://www.codinghorror.com/blog/archives/000228.html
TrueWill
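
A minimal sketch of the "create it once" idea from the comment above, assuming a hypothetical NameCleaner class. The `$` anchor is an addition that is not in the original answer; it restricts the match to the end of the string, which fits the stated fact that every string ends in one of the two suffixes.

```csharp
using System.Text.RegularExpressions;

// Hypothetical holder class; the name is illustrative only.
public static class NameCleaner
{
    // Built once and reused. Regex instances are immutable and thread safe,
    // so sharing a single static instance is fine.
    private static readonly Regex SuffixPattern =
        new Regex(@"(_GONOGO|_ALLOC)$", RegexOptions.Compiled);

    // Removes a trailing _GONOGO or _ALLOC; other strings come back unchanged.
    public static string Strip(string input)
    {
        return SuffixPattern.Replace(input, string.Empty);
    }
}
```

RegexOptions.Compiled trades a higher one-time construction cost for faster matching, so it tends to pay off only when the same instance is reused many times.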
+3  A: 

This is expected; in general, manipulating a string by hand will be faster than using a regular expression. Using a regex involves compiling an expression down to a regex tree, and that takes time.

If you're using this regex in multiple places, you can use the RegexOptions.Compiled flag to reduce the per-match overhead, as David describes in his answer. Other regex experts might have tips for improving the expression. You might consider sticking with the String.Replace, though; it's fast and readable.
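
If you want to check the difference on your own machine rather than take it on faith, a rough timing harness along these lines might help. This is only a sketch: the StripTiming class, the sample value, and the iteration count are all assumptions, and the numbers will vary with runtime and hardware.

```csharp
using System;
using System.Diagnostics;
using System.Text.RegularExpressions;

// Hypothetical timing harness; names and values are illustrative only.
public static class StripTiming
{
    // Runs the action the given number of times and returns elapsed milliseconds.
    public static long TimeMs(Action action, int iterations)
    {
        action(); // warm-up call so JIT and first-use costs are not measured
        Stopwatch sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
            action();
        sw.Stop();
        return sw.ElapsedMilliseconds;
    }

    public static void Run()
    {
        const string sample = "PUMP_GONOGO"; // made-up test value
        const int iterations = 100000;       // arbitrary
        Regex compiled = new Regex("(_GONOGO|_ALLOC)", RegexOptions.Compiled);

        long staticMs = TimeMs(() => Regex.Replace(sample, "(_GONOGO|_ALLOC)", ""), iterations);
        long compiledMs = TimeMs(() => compiled.Replace(sample, ""), iterations);
        long replaceMs = TimeMs(() => sample.Replace("_GONOGO", "").Replace("_ALLOC", ""), iterations);

        Console.WriteLine("static Regex.Replace: " + staticMs + " ms");
        Console.WriteLine("compiled Regex:       " + compiledMs + " ms");
        Console.WriteLine("String.Replace:       " + replaceMs + " ms");
    }
}
```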

Michael Petrotta
+1  A: 

If they all end in one of those patterns, it would likely be faster to drop replace altogether and use:

string result = source.Substring(0, source.LastIndexOf('_'));
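
One caveat worth guarding against: LastIndexOf returns -1 when there is no underscore, and Substring(0, -1) throws. A defensive variant might look like the following sketch (the class and method names are made up for illustration):

```csharp
// Hypothetical helper; the names are illustrative only.
public static class UnderscoreTrimmer
{
    // Drops everything from the last underscore onward.
    // Strings without an underscore are returned unchanged.
    public static string TrimAfterLastUnderscore(string source)
    {
        int index = source.LastIndexOf('_');
        return index < 0 ? source : source.Substring(0, index);
    }
}
```

Note that this removes whatever follows the last underscore, so it relies on every string actually ending in one of the two known suffixes.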
Abraham Pinzur
+1  A: 

When you have that much information about your problem domain, you can make things pretty simple:

const int AllocLength = 6;
const int GonogoLength = 7;
string s = ...;
if (s[s.Length - 1] == 'C')
    s = s.Substring(0, s.Length - AllocLength);
else
    s = s.Substring(0, s.Length - GonogoLength);

This is theoretically faster than Abraham's solution, but not as flexible. If the strings have any chance of changing then this one would suffer from maintainability problems that his does not.
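
If maintainability is the bigger worry, a middle ground is to keep the suffixes in one place and check them with EndsWith. This is a different technique from the last-character test above, shown only as a sketch; the KnownSuffixes class name is an assumption.

```csharp
using System;

// Hypothetical helper; the names are illustrative only.
public static class KnownSuffixes
{
    // Add or change suffixes here without touching the stripping logic.
    private static readonly string[] Suffixes = { "_GONOGO", "_ALLOC" };

    // Removes the first matching suffix; strings with no known suffix are unchanged.
    public static string Strip(string s)
    {
        foreach (string suffix in Suffixes)
        {
            // Ordinal comparison avoids culture-sensitive matching.
            if (s.EndsWith(suffix, StringComparison.Ordinal))
                return s.Substring(0, s.Length - suffix.Length);
        }
        return s;
    }
}
```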

280Z28