views:

332

answers:

6

What is the most efficient way to parse an integer out of a string that contains letters and spaces?

Example: I am passed the following string: "RC 272". I want to retrieve 272 from the string.

I am using C# and .NET 2.0 framework.

+7  A: 

Since the format of the string will not change KISS:

string input = "RC 272";
int result = int.Parse(input.Substring(input.IndexOf(" ")));
Gavin Miller
+4  A: 

A simple regex can extract the number, and then you can parse it:

int.Parse(Regex.Match(yourString, @"\d+").Value, NumberFormatInfo.InvariantInfo);

If the string may contain multiple numbers, you could just loop over the matches found using the same Regex:

for (Match match = Regex.Match(yourString, @"\d+"); match.Success; match = match.NextMatch()) {
    x = int.Parse(match.Value, NumberFormatInfo.InvariantInfo); // do something with it
}
Lucero
I am getting the following error when I try to build this solution:"The best overload method match for int.Parse(string, System.Globalization.NumberStyles) has some invalid arguments"
Michael Kniskern
Oops, sorry. Fixed.
Lucero
I didn't know that "simple" and "regex" could be used in the same sentence without a grammar error.
Steven Sudit
Regular expressions don't need to be messy, in the .NET engine you can use named groups, lookahead and lookbehind assertions etc. as well as options auch as ignoring pattern whitespace (allowing to structure the regex contents, putting it on multiple indented lines) as well as specify explicit capture for groups to keep the performance high even if using a lot of normal groups. All this helps making the regex pretty readable in fact. To some extent, you can compare it to the C language, which can be obfuscated to a complete mess but which can also used reasonably.
Lucero
Also, the regex does capture the intent better than the `indexof` or `split` solutions, and it allows handling problems (invalid input data) better by not matching the pattern (via `match.Success` property), instead of leading to invalid indexes or parsing errors somewhere later in the process. (Sorry for the 2nd comment, I was running out of characters on the first one ;) )
Lucero
What I was alluding to is that regexp is perhaps overkill for this task, and will likely have relatively poor performance, even with precompilation.
Steven Sudit
I see. However, I think that the performance of the regex should not be as bad as one might expect. If performance is really crucial, I'd also not use the `Split()` function, my guess is that the "other" solution I suggested may perform well in this regard.
Lucero
I guess my concern is that both of your solutions are complex, differing in either being under- or over-optimized. I'd be pretty happy with LFSR's answer, which performs well and is highly maintainable.
Steven Sudit
I guess its also a matter of taste, and I frankly don't see why they should be more "complex". In any way, I believe that adding input validation, which I see as a must if it is being used in a production environment, would be much easier and less messy in both my solutions (checking match.IsSuccess in the Regex one and checking for 0 or using a flag in the second - no unwanted exceptions thrown like IndexOutOfRange).
Lucero
At risk of beating a dead horse, I'll explain. The complexity of the foreach solution is that it's reinventing the Int32.Parse wheel but doing it differently. For example, it treats "R2D2" as 22, where Parse will throw. If you want to avoid throwing, you can use TryParse, which even has an overload that takes a NumberStyles enum to fine-tune its behavior. In contrast, the foreach solution (as written) cannot distinguish between "abc" and "0". It's optimized, sure, but weird enough to cause trouble.
Steven Sudit
Your other answer brings out the elephant gun to shoot a mouse. RegExp expressions are incredibly powerful, but they're just overkill. Besides performance issues, they're complex enough to easily get wrong. And, frankly, I think NumberStyles covers all of the relevant options. Please take this as constructive criticism: if I thought you were an idiot, I wouldn't waste any time explaining myself.
Steven Sudit
@Steven, thanks for explaining. You are right with your observations about the foreach solution, and I added it after a pretty funny yet very inefficient LINQ approach was posted (was removed unfortunately) to show that LINQ is not 42. But for the regex solution, if I was to parse this I would enhance the regex to match the pattern as specified, eliminating the risk of trying to match anything but the pattern it was designed for. As for performance, after a warm-up the regex is not so bad, especially since the input validation is covered in the same step. But maybe it's just a matter of taste.
Lucero
@Lucero: If I were to go down the RegExp road, I'd strongly consider using http://www.ultrapico.com/Expresso.htm to thoroughly test it.
Steven Sudit
A: 

EDIT:

If it will always be in that format wouldn't something like the following work where value = "RC 272"?

int someValue = Convert.ToInt32(value.Substring(value.IndexOf(' ') + 1));
Wil P
+2  A: 

If it will always be in the format "ABC 123":

string s = "RC 272";
int val = int.Parse(s.Split(' ')[1]); // val is 272
John Rasch
Works, but creates two temp strings, where LFSR's solution creates just one.
Steven Sudit
A: 

Just for the fun of it, another possibility:

int value = 0;
foreach (char c in yourString) {
  if ((c >= '0') && (c <= '9')) {
    value = value*10+(c-'0');
  }
}
Lucero
A: 

Guys, since it will always be in the format "ABC 123", why not skip the IndexOf step?

string input = "RC 272";
int result = int.Parse(input.Substring(3));
GP