tags:

views:

314

answers:

3

Hi,

This should be a simple one but I'm looking for the best answer with at least some regard for performance as well as elegance.

I have a list of strings and some of the values are of the form q1, q2, q3 etc.

I'd like to select these.

What would be the best method of doing this?

+1  A: 

The best answer is to use either Regex or Jay Bazuzi's int.TryParse suggestion.

As a quick example try this in LINQPad:

void Main()
{
    int n = 100;

    string[] a = {"q2", "q3", "b"};
    a = a.Concat(Enumerable.Repeat(0,n).Select(i => "qasd")).ToArray(); /* Random data */

    /* Regex Method */
    System.Text.RegularExpressions.Regex r = new System.Text.RegularExpressions.Regex("^q[0-9]+$");
    List<string> regMethod = a.Where(c => r.IsMatch(c)).ToList().Dump("Result");

    /* IsInteger Method */
    List<string> intMethod = a.Where(c => c.StartsWith("q") && IsInteger(c.Substring(1))).ToList().Dump("Result");

    /* int.TryParse Method suggest by Jay Bazuzi */
    int e = 0;
    List<string> parseMethod = a.Where(c => c.StartsWith("q") && int.TryParse(c.Substring(1), out e)).ToList().Dump("Result");
}

public static bool IsInteger(string theValue)
{
   try
   {
       Convert.ToInt32(theValue);
       return true;
   } 
   catch 
   {
       return false;
   }
}

Try commenting one of the two methods out at a time and try out the performance for different values of n.

My experience (on my Core 2 Duo Laptop) seems to suggest:

n = 100. Regex takes about 0.003 seconds, IsInteger takes about 0.01 seconds
n = 1,000. Regex takes about 0.004 seconds, IsInteger takes about 0.07 seconds
n = 10,000. Regex takes about 0.007 seconds, IsInteger takes about 0.67 seconds
n = 100,000. Regex takes about 0.046 seconds, IsInteger takes about 6.9 seconds
n = 1,000,000. Regex takes about 0.42 seconds, IsInteger takes about 1 minute and 6 seconds

ParseMethod has the same performance as Regex (slightly faster if done inline as in the code sample, about the same if done in a separate method, i.e. replacing the IsInteger method body).

NB: The cost of creating the strings is not accounted for (insert a date diff if you like), but the cost is the same for both methods

These numbers are much closer if the majority of the keys do not being with 'q' (IsInteger is never called), but the Regex is as good or better even when this is the case

I.e. (for filler string of "asdasd" rather than "qasd"):

n = 100. Regex takes about 0.003 seconds, IsInteger takes about 0.003 seconds
n = 1,000. Regex takes about 0.004 seconds, IsInteger takes about 0.004 seconds
n = 10,000. Regex takes about 0.005 seconds, IsInteger takes about 0.005 seconds
n = 100,000. Regex takes about 0.023 seconds, IsInteger takes about 0.025 seconds
n = 1,000,000. Regex takes about 0.21 seconds, IsInteger takes about 0.22 seconds

Again ParseMethod has the same performance as Regex.

Conclusion: Use either Regex or the TryParse, it will be much faster in the worst case and as fast otherwise

However, are there better/quicker ways of selecting the int values out of a collection of strings? Perhaps a generic filter that is somehow compiled faster?

Graphain
Why are you asking a question, answering it, and then asking a question in your answer?
womp
I'm asking a question because I'm not sure on the best way, offering what I have as my best solution to help others and asking if others still have a better solution.
Graphain
And mostly it's so I can come back here next time I run into this.
Graphain
Your performance measurements include the call to .ToList(), which may be significant. (Or it may not be. Guessing at perf always gets me in trouble.)
Jay Bazuzi
Also, compiled regex is probably faster for lots of repeated matching.
Jay Bazuzi
Yeah compiled Regex may perform ever better. The difference between Regex and TryParse isn't an order of magnitude different so it's best to use whichever seems clearer to you out of these two I think.
Graphain
+1  A: 

Seems like you're trying to microoptimize, which means you'll spend a lot of effort to make your program run the same speed. Focus on clarity of code first, and then optimize what is actually slow.

Assuming you have profiled and found this to be your application's bottleneck in real-world scenarios:

It is inelegant (and often a performance drag) to use exceptions in non-exceptional cases. See http://www.developerfusion.com/code/4650/validating-an-integer/, for example.

Depending on the constraints of your situation, you're probably better off changing your IsInteger() to do one of:

bool int.TryParse(string s, out int result);

(See http://msdn.microsoft.com/en-us/library/system.int32.tryparse.aspx)

or:

Microsoft.VisualBasic.Information.IsNumeric(object expression)

(See http://www.hanselman.com/blog/ExploringIsNumericForC.aspx)

or

x >= '0' && x < '9'

And then:

.Where(c => c[0] == 'q' && IsInteger(c[1]))
Jay Bazuzi
Thanks - I was wondering about this stuff. I didn't really want to use a Regex because of clarity issues and maybe one of these methods is better!
Graphain
Great, thanks I kind of didn't think the TryParse would work in this sitatuon. It turns out to perform as good or better than TryParse. My answer has been updated to reflect this.
Graphain
er "as good or better than Regex**".
Graphain
+1  A: 

The bottleneck you have with IsInteger is probably because of the try-catch.

I tried to replace IsInteger with TryParse, and I get the following results (with n=1,000,000):

Regex method: 540 ms

TryParse method: 537 ms

I used the following code for the second method:

Func<string, bool> lambda = (string c) => { Int32 temp; 
                                    return c.StartsWith("q") 
                                    && int.TryParse(c.Substring(1),out temp); };
List<string> intMethod = a.Where(lambda).ToList();

Moral of the story is...

Although I usually prefer to use Regex, in this simple case where the string manipulations are simple the TryParse solution is perfectly acceptable. And performance-wise, it doesn't really matter which method you use, but don't use exception handling to check if some string value is an int!

Meta-Knight
Yep you're right, see my updated answer/comments on Jay Bazuzi's post. Nice work with the lambda example, I didn't realise you could include multiple lines.
Graphain