What's the quickest way to extract a 5-digit number from a string in C#?

I've got

string.Join(null, System.Text.RegularExpressions.Regex.Split(expression, "[^\\d]"));

Any others?

+2  A: 

Do you mean convert a string to a number? Or find the first 5 digit string and then make it a number? Either way, you'll probably be using decimal.Parse or int.Parse.

I'm of the opinion that regular expressions are the wrong approach. A more efficient approach would simply be to walk through the string looking for a digit, then check whether the next 4 characters are also digits. If they are, you've got your substring. It's not as robust, no, but it doesn't have the overhead either.
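
A minimal sketch of that walk, for illustration only. The class and method names (FiveDigitScan, FindFiveDigits) are hypothetical, and the sketch assumes ASCII digits and that the first five consecutive digits are wanted even when the run is longer:

```csharp
using System;

static class FiveDigitScan
{
    // Hypothetical helper: returns the first five consecutive ASCII digits, or null if there are none.
    public static string FindFiveDigits(string text)
    {
        for (int i = 0; i + 5 <= text.Length; i++)
        {
            if (!IsAsciiDigit(text[i])) continue;

            // Found a digit: check whether the next four characters are digits too.
            int j = i + 1;
            while (j < i + 5 && IsAsciiDigit(text[j])) j++;
            if (j == i + 5) return text.Substring(i, 5);

            // text[j] is not a digit, so no window covering it can match; resume after it.
            i = j;
        }
        return null;
    }

    static bool IsAsciiDigit(char c)
    {
        return c >= '0' && c <= '9';
    }
}
```

Calling FiveDigitScan.FindFiveDigits on "asdfkki12345afdkjsdl" yields "12345"; a string with no run of five digits yields null.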

Tom Ritter
+5  A: 

Use a regular expression (\d{5}) to find the occurrence(s) of the 5-digit number in the string, and use int.Parse or decimal.Parse on the match(es).

In the case where there is only one number in the text:

int? value = null;
string pat = @"\d{5}";
Regex r = new Regex(pat);
Match m = r.Match(text);
if (m.Success)
{
   value = int.Parse(m.Value);
}
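
For the case of several numbers in the text, a sketch along the same lines might use Regex.Matches. The names FiveDigitMatches and ParseAll are hypothetical. The pattern here is [0-9]{5} rather than \d{5}, an assumption made because \d in .NET also matches non-ASCII decimal digits, which int.Parse may not accept:

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class FiveDigitMatches
{
    // Hypothetical helper: parses every non-overlapping group of five ASCII digits in text.
    public static List<int> ParseAll(string text)
    {
        var values = new List<int>();
        foreach (Match m in Regex.Matches(text, "[0-9]{5}"))
        {
            values.Add(int.Parse(m.Value));
        }
        return values;
    }
}
```

Note that a longer run of digits is split into consecutive groups of five by this pattern; whether that is desirable depends on the input.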
tvanfosson
As @Jon Skeet notes, if you know the character of the input ahead of time there could be much faster ways to do this. This is a reasonably fast, robust way to do it in the face of unknown input formats.
tvanfosson
+2  A: 

Don't use a regular expression at all. It's way more powerful than you need - and that power is likely to hit performance.

If you can give more details of what you need it to do, we can write the appropriate code... (Test cases would be ideal.)

Jon Skeet
Yes. If you know the character of the input, there are much better ways. If, for instance, you know there is a single number in the input and no others, then you can iterate through the characters until you find a digit and then start "accumulating" the value over the next 4 digits.
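
A rough sketch of that accumulation, assuming the input holds a single run of ASCII digits as described above. The names FiveDigitAccumulator and ReadFirstNumber are hypothetical:

```csharp
using System;

static class FiveDigitAccumulator
{
    // Hypothetical helper: skips to the first digit, then builds the value from five digits.
    // Returns null if fewer than five consecutive digits are found at that position.
    public static int? ReadFirstNumber(string text)
    {
        int i = 0;
        while (i < text.Length && (text[i] < '0' || text[i] > '9')) i++;
        if (i + 5 > text.Length) return null;

        int value = 0;
        for (int end = i + 5; i < end; i++)
        {
            char c = text[i];
            if (c < '0' || c > '9') return null; // the run is shorter than five digits
            value = value * 10 + (c - '0');      // accumulate without allocating a substring
        }
        return value;
    }
}
```

This avoids both the regex engine and the intermediate string that int.Parse would need, at the cost of giving up on the first run of digits if it is too short.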
tvanfosson
+1  A: 

If the numbers are mixed in with other characters, regular expressions are a good solution.

E.g.: ([0-9]{5})

will match the 12345 in asdfkki12345afdkjsdl, 12345adfaksk, or akdkfa12345

Gavin Miller
A: 

If you have a simple test case like "12345" or even "12345abcd", don't use regex at all. Regular expressions are not known for their speed.

Petar Repac
A: 

For most strings a brute force method is going to be quicker than a RegEx.

A fairly noddy example would be:

string strIWantNumFrom = "qweqwe23qeeq3eqqew9qwer0q";

int num = int.Parse(
    string.Join( null, (
        from c in strIWantNumFrom.ToCharArray()
        where c == '1' || c == '2' || c == '3' || c == '4' || c == '5' ||
            c == '6' || c == '7' || c == '8' || c == '9' || c == '0'
        select c.ToString()
    ).ToArray() ) );

No doubt there are much quicker ways, and lots of optimisations that depend on the exact format of your string.

Keith
+6  A: 

The regex approach is probably the quickest to implement but not the quickest to run. I compared a simple regex solution to the following manual search code and found that the manual search code is ~2x-2.5x faster for large input strings and up to 4x faster for small strings:

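static string Search(string expression)
{
  int run = 0; // length of the current run of consecutive digits
  for (int i = 0; i < expression.Length; i++)
  {
    char c = expression[i];
    if (Char.IsDigit(c))
      run++;
    else if (run == 5)
      return expression.Substring(i - run, run);
    else
      run = 0;
  }
  // A run of exactly five digits may also sit at the very end of the string.
  if (run == 5)
    return expression.Substring(expression.Length - run);
  return null;
}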
const string pattern = @"\d{5}";
static string NotCached(string expression)
{
  return Regex.Match(expression, pattern, RegexOptions.Compiled).Value;
}

static Regex regex = new Regex(pattern, RegexOptions.Compiled);
static string Cached(string expression)
{
  return regex.Match(expression).Value;
}

Results for a ~50-char string with a 5-digit string in the middle, over 10^6 iterations, latency per call in microseconds (smaller number is faster):

Simple search: 0.648396us

Cached Regex: 2.1414645us

Non-cached Regex: 3.070116us

Results for a ~40K string with a 5-digit string in the middle over 10^4 iterations, latency per call in microseconds (smaller number is faster):

Simple search: 423.801us

Cached Regex: 1155.3948us

Non-cached Regex: 1220.625us

A little surprising: I would have expected Regex -- which is compiled to IL -- to be comparable to the manual search, at least for very large strings.

alexdej
A: 

This might be faster...

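public static string DigitsOnly(string inVal)
{
    char[] newPhon = new char[inVal.Length];
    int i = 0;
    foreach (char c in inVal)
        if (c >= '0' && c <= '9') // inclusive bounds, so '0' and '9' are kept
            newPhon[i++] = c;
    // char[].ToString() returns the type name, so build the string from the filled part
    return new string(newPhon, 0, i);
}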

If you want to limit it to at most five digits, then:

public static string DigitsOnly(string inVal)
{
    char[] newPhon = new char[inVal.Length];
    int i = 0;
    foreach (char c in inVal)
        if (c >= '0' && c <= '9' && i < 5) // keep only the first five digits
            newPhon[i++] = c;
    // char[].ToString() returns the type name, so build the string from the filled part
    return new string(newPhon, 0, i);
}
Charles Bretana