I have a regular expression to match 2 different number formats: \=(?<Value>[0-9]+)\?|\+(?<Value>[0-9]+)\?

For both of the following strings, it should return 9876543 as its Value:

;1234567890123456?+1234567890123456789012345123=9876543?
;1234567890123456?+9876543?

What I would like is to be able to return another value along with the matched 'Value'.

So, for example, if the first string was matched, I'd like it to return:

Value: 9876543 Format:  LongFormat


And if matched in the second string:

Value: 9876543 Format:  ShortFormat


Is this possible?

+1  A: 

No, you can't match text that isn't there. The match can only return a substring of the target.

You essentially want to match against two patterns and take different actions in each case. See if you can separate them in your code:

if match(\=(?<Value>[0-9]+)\?) then
    return 'Value: ' + match + ' Format: LongFormat'
else if match(\+(?<Value>[0-9]+)\?) then
    return 'Value: ' + match + ' Format: ShortFormat'

(Excuse the dodgy pseudocode, but you get the idea.)
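
For what it's worth, here is a minimal runnable sketch of the same idea in C#. It assumes .NET, which the question doesn't actually state, and the class and method names (SwipeFormats, Describe) are made up for illustration. The two patterns are the two halves of the regex from the question.

```csharp
using System;
using System.Text.RegularExpressions;

static class SwipeFormats
{
    // The two alternatives from the question, split into separate patterns.
    static readonly Regex LongPattern = new Regex(@"=(?<Value>[0-9]+)\?");
    static readonly Regex ShortPattern = new Regex(@"\+(?<Value>[0-9]+)\?");

    // Hypothetical helper: tries the long format first, then the short one.
    public static string Describe(string input)
    {
        Match m = LongPattern.Match(input);
        if (m.Success)
            return "Value: " + m.Groups["Value"].Value + " Format: LongFormat";

        m = ShortPattern.Match(input);
        if (m.Success)
            return "Value: " + m.Groups["Value"].Value + " Format: ShortFormat";

        return null; // neither format matched
    }

    static void Main()
    {
        Console.WriteLine(Describe(";1234567890123456?+1234567890123456789012345123=9876543?"));
        Console.WriteLine(Describe(";1234567890123456?+9876543?"));
    }
}
```

The label is chosen by the code around the match, not by the regex itself, which is the point of the answer above.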

Bennett McElwee
Thanks for the answer, I was pretty sure that I was out of luck on that one.
Tass
+1  A: 

You can't match text that isn't there, but depending on what language you're using, you can process what you match and conditionally add text based on what is there.

With some implementations of regex, you can specify a "callback function" which allows you to run logic against each result.

Here's a pseudo-code example:

Input.replaceAll( /[+=][0-9]+(?=\?)/ , formatValue );

function formatValue(match,groups)
{
 switch( left(match,1) )
 {
  case '+' : Format = 'Short';   break;
  case '=' : Format = 'Long';    break;
  default  : Format = 'Unknown'; break;
 }

 Value = match.replace( /[+=]/ , '' );

 return 'Value: '+Value+' Format: ' + Format;
}

What that will do, in a language that supports regex callbacks, is execute the formatValue function every time it finds a match, and use the result of the function as the replacement text.

You haven't specified which implementation you're using, so this may or may not be possible for you, but it is definitely worth checking out.
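
As one concrete case, and assuming .NET (the question doesn't say), the callback can be passed to Regex.Replace as a MatchEvaluator. The sketch below is only an illustration; the names CallbackSketch, FormatValue and Rewrite are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

static class CallbackSketch
{
    // Called once per match; its return value becomes the replacement text.
    static string FormatValue(Match match)
    {
        string format;
        switch (match.Value[0])
        {
            case '+': format = "Short"; break;
            case '=': format = "Long"; break;
            default: format = "Unknown"; break;
        }

        // Drop the leading '+' or '=' to leave only the digits.
        string value = match.Value.Substring(1);
        return "Value: " + value + " Format: " + format;
    }

    // Hypothetical wrapper around Regex.Replace with a MatchEvaluator callback.
    public static string Rewrite(string input)
    {
        return Regex.Replace(input, @"[+=][0-9]+(?=\?)", FormatValue);
    }

    static void Main()
    {
        // Prints: ;1234567890123456?Value: 9876543 Format: Short?
        Console.WriteLine(Rewrite(";1234567890123456?+9876543?"));
    }
}
```

Note that the replacement is spliced into the original string, so the text around the match is kept. If only the formatted result is wanted, the same switch could be applied to the result of Regex.Match instead.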

Peter Boughton
+2  A: 

Another option, which is not quite the solution you wanted but saves you from using two separate regexes, is to use named groups, if your implementation supports them.

Here is some C#:

var regex = new Regex(@"\=(?<Long>[0-9]+)\?|\+(?<Short>[0-9]+)\?");
string test1 = ";1234567890123456?+1234567890123456789012345123=9876543?";
string test2 = ";1234567890123456?+9876543?";

var match = regex.Match(test1);
Console.WriteLine("Long: {0}", match.Groups["Long"]);     // 9876543
Console.WriteLine("Short: {0}", match.Groups["Short"]);   // blank
match = regex.Match(test2);
Console.WriteLine("Long: {0}", match.Groups["Long"]);     // blank
Console.WriteLine("Short: {0}", match.Groups["Short"]);   // 9876543

Basically, just modify your regex to include the names, and then match.Groups[GroupName] will either have a value or it won't. You could even just use the Success property of the group to know which one matched (match.Groups["Long"].Success).
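
A rough sketch of that Success check, producing the output asked for in the question. The helper name Describe and the class name are hypothetical, and the "LongFormat" / "ShortFormat" labels are taken from the question rather than from the regex.

```csharp
using System;
using System.Text.RegularExpressions;

static class NamedGroupSketch
{
    static readonly Regex Pattern =
        new Regex(@"=(?<Long>[0-9]+)\?|\+(?<Short>[0-9]+)\?");

    // Hypothetical helper: one regex, then check which named group took part.
    public static string Describe(string input)
    {
        Match match = Pattern.Match(input);
        if (!match.Success)
            return null; // neither alternative matched

        Group longGroup = match.Groups["Long"];
        if (longGroup.Success)
            return "Value: " + longGroup.Value + " Format: LongFormat";

        // Only the other alternative can have matched at this point.
        return "Value: " + match.Groups["Short"].Value + " Format: ShortFormat";
    }

    static void Main()
    {
        Console.WriteLine(Describe(";1234567890123456?+1234567890123456789012345123=9876543?"));
        Console.WriteLine(Describe(";1234567890123456?+9876543?"));
    }
}
```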

UPDATE: You can get the group name out of the match, with the following code:

static void Main(string[] args)
{
 var regex = new Regex(@"\=(?<Long>[0-9]+)\?|\+(?<Short>[0-9]+)\?");
 string test1 = ";1234567890123456?+1234567890123456789012345123=9876543?";
 string test2 = ";1234567890123456?+9876543?";

 ShowGroupMatches(regex, test1);
 ShowGroupMatches(regex, test2);
 Console.ReadLine();
}

private static void ShowGroupMatches(Regex regex, string testCase)
{
 int i = 0;
 foreach (Group grp in regex.Match(testCase).Groups)
 {
  if (grp.Success && i != 0)
  {
   Console.WriteLine(regex.GroupNameFromNumber(i) + " : " + grp.Value);
  }
  i++;
 }
}

I'm ignoring the 0th group because, in .NET, that is always the entire match.

Ch00k
These are different strings from a swipe card reader but the differently formatted swipes could match a value to a different field in the database. So, the group name should be a field name (i.e., StudentID or CustomField) to know which field to match on. Looks like I need separate settings.
Tass