I'm pretty new to using regexes and I can't figure out how I would go about extracting a specific number from a string.

Suppose the string contains any amount of whitespace or random text, and somewhere within it is this: "Value: $1000.00"

In order to retrieve that value I am currently using this:

string value = Convert.ToString(Regex.Match(BodyContent, @"Value:[ \t]*\$?\d*(\.[0-9]{2})?", RegexOptions.Singleline));

So the variable 'value' now has "Value: $1000.00" stored in it.

My question is, using Regex is there a way to use 'Value:' to find the number value but only store the actual number value (i.e. 1000.00) in the 'value' variable?

+3  A: 

Generally speaking, to accomplish something like this, you have at least 3 options:

  • Use lookarounds, i.e. lookahead (?=...) and lookbehind (?<=...), so you can match precisely what you want to capture
    • Some languages have limited support for lookbehinds
  • Use a capturing group (...) to capture specific strings
    • Near universally supported in all flavors
  • You can also just take a substring of the match
    • Works well if the length of the prefix/suffix to chop is a known constant

Examples

Given this test string:

i have 35 dogs, 16 cats and 10 elephants

These are the matches of some regex patterns:

  • \d+ cats -> 16 cats
  • \d+(?= cats) -> 16
  • (\d+) cats -> 16 cats
    • Group 1 captures 16

You can also do multiple captures, for example:

  • (\d+) (cats|dogs) yields 2 match results (see on rubular.com)
    • Result 1: 35 dogs
      • Group 1 captures 35
      • Group 2 captures dogs
    • Result 2: 16 cats
      • Group 1 captures 16
      • Group 2 captures cats
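
The three options listed earlier, plus the multiple-capture pattern, can be tried against the same test string in C#. This is only a minimal sketch: the variable names are made up for illustration, and it assumes a compiler that accepts top-level statements (C# 9 or later).

```csharp
using System;
using System.Text.RegularExpressions;

var input = "i have 35 dogs, 16 cats and 10 elephants";

// Option 1: lookahead, so the match itself is only the number.
string viaLookahead = Regex.Match(input, @"\d+(?= cats)").Value;

// Option 2: capturing group; the whole match is "16 cats", group 1 is "16".
string viaGroup = Regex.Match(input, @"(\d+) cats").Groups[1].Value;

// Option 3: match "16 cats", then chop the constant-length suffix " cats".
string whole = Regex.Match(input, @"\d+ cats").Value;
string viaSubstring = whole.Substring(0, whole.Length - " cats".Length);

Console.WriteLine($"{viaLookahead} {viaGroup} {viaSubstring}");

// Multiple captures: every match result exposes both groups.
MatchCollection results = Regex.Matches(input, @"(\d+) (cats|dogs)");
foreach (Match result in results)
{
    Console.WriteLine(
        $"{result.Value}: group 1 = {result.Groups[1].Value}, group 2 = {result.Groups[2].Value}");
}
```

Regex.Matches returns every match rather than only the first one, which is why both the dogs and the cats results show up.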

Solution for this specific problem

It's much simpler to use a capturing group in this case (see on ideone.com):

var text = "Blah blah Value: $1000.00 and more stuff";
string value = Convert.ToString(
   Regex.Match(
     text,
     @"Value:[ \t]*\$?(\d*(\.[0-9]{2})?)",
     RegexOptions.Singleline
   ).Groups[1]
);

The only thing that was added was:

  • A pair of matching parentheses in the pattern to capture the numeric portion
  • Accessing .Groups[1] of the Match object
polygenelubricants
\d+(?= cats) -> 16 That is the one I would like to use, but how do I change my regex to work the way that one does? I have tried this, @"(?=Value:[ \t]*\$?)\d*(\.[0-9]{2})?"
Immanu'el Smith
@Axilus: since `Value: ` is a prefix in this case, you use lookbehind `(?<=...)` instead of lookahead. It is conventional to just use a capturing group in these cases, though. Lookarounds are kind of overkill here.
polygenelubricants
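
For reference, the lookbehind variant mentioned in the comment above might look like the sketch below. It is an illustration rather than the recommended approach. It relies on .NET accepting variable-length lookbehind, which many other regex flavors reject, and it changes \d* to \d+ because with \d* the pattern can match an empty string immediately after "Value:". It also assumes a compiler that accepts top-level statements.

```csharp
using System;
using System.Text.RegularExpressions;

var text = "Blah blah Value: $1000.00 and more stuff";

// (?<=...) asserts that the prefix precedes the match without consuming it,
// so the match itself is only the number.
// \d+ instead of \d*: with \d* the first successful match would be the
// empty string right after "Value:".
string amount = Regex.Match(
    text,
    @"(?<=Value:[ \t]*\$?)\d+(\.[0-9]{2})?"
).Value;

Console.WriteLine(amount);
```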
+1  A: 

In .NET, you'll want to get the Match object and then access its Groups property:

Match m = Regex.Match(BodyContent, @"Value:[ \t]*\$?(?<amount>\d*(\.[0-9]{2})?)", RegexOptions.Singleline);
string value = null;

if (m.Success)
{
    value = m.Groups["amount"].Value;
}

The syntax (?<amount> ... ) creates a named capture group that is stored by name in the m.Groups collection.

Jim Mischel