I want to use a regex to find an unknown number of arguments in a string. It is easier to show than to explain, so here is an example:

The regex: @ISNULL\('(.*?)','(.*?)','(.*?)'\)
The String: @ISNULL('1','2','3')
The result:

Group[0] "@ISNULL('1','2','3')" at 0 - 20 
Group[1] "1" at 9 - 10 
Group[2] "2" at 13 - 14  
Group[3] "3" at 17 - 18  

That works great. The problem begins when I need to match an unknown number of arguments (two or more).

What changes do I need to make to the regex so that it finds all the arguments that occur in the string?

So, if I parse the string "@ISNULL('1','2','3','4','5','6')", I should get all six arguments.

A: 

This answer is somewhat speculative, as I have no clue which regex engine you are using. If the parameters are always numbers and always enclosed in single quotes, why not use the digit class, like this:

'(\d+?)'

This is just the \d class with the extraneous @ISNULL stuff removed, as I assume you are only interested in the parameters themselves. You may not need the +, and of course I don't know whether the engine you are using supports the lazy ? modifier; just give it a go.
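
A minimal Java sketch of this idea, assuming every argument really is a run of digits in single quotes. The class and method names are made up for illustration; the pattern is applied repeatedly with find() instead of trying to match the whole call at once:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DigitArguments {
    // Collects every run of digits that is wrapped in single quotes.
    static List<String> digits(String input) {
        List<String> found = new ArrayList<>();
        Matcher m = Pattern.compile("'(\\d+)'").matcher(input);
        while (m.find()) {
            found.add(m.group(1)); // group 1 is the digits without the quotes
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(digits("@ISNULL('1','2','3')")); // prints [1, 2, 3]
    }
}
```

Note that this picks up any quoted number in the input, not only those inside an @ISNULL(...) call.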

slugster
I'm using the Java regex engine. I can't use \d because the numbers are only there to make the example easier to follow.
Rotem
+2  A: 

If you don't know the number of potential matches in a repeated construct, you need a regex engine that supports captures in addition to capturing groups, i.e. one that keeps every substring a repeated group matched rather than only the last one. Only .NET and Perl 6 offer this currently.
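
To see why plain capturing groups are not enough, here is a small Java sketch (class and method names are hypothetical). It applies the repeated-group pattern and reads the inner group afterwards; in Java, a group inside a repetition holds only the text from its last iteration:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RepeatedGroupDemo {
    // Returns what group 2 holds after the repeated group has matched every argument.
    static String lastArgument(String input) {
        Pattern p = Pattern.compile("@ISNULL\\(('([^']*)',?)+\\)");
        Matcher m = p.matcher(input);
        // matches() requires the whole input to be one @ISNULL(...) call
        return m.matches() ? m.group(2) : null;
    }

    public static void main(String[] args) {
        // The earlier iterations ('1' and '2') are overwritten by the last one.
        System.out.println(lastArgument("@ISNULL('1','2','3')")); // prints 3
    }
}
```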

In C#:

  string pattern = @"@ISNULL\(('([^']*)',?)+\)";
  string input = @"@ISNULL('1','2','3','4','5','6')";
  Match match = Regex.Match(input, pattern);
  if (match.Success) {
     Console.WriteLine("Matched text: {0}", match.Value);
     for (int ctr = 1; ctr < match.Groups.Count; ctr++) {
        Console.WriteLine("   Group {0}:  {1}", ctr, match.Groups[ctr].Value);
        int captureCtr = 0;
        foreach (Capture capture in match.Groups[ctr].Captures) {
           Console.WriteLine("      Capture {0}: {1}", 
                             captureCtr, capture.Value);
           captureCtr++; 
        }
     }
  }   

In other regex flavors, you have to do it in two steps. E.g., in Java (code snippets courtesy of RegexBuddy):

First, find the part of the string you need:

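// Needs java.util.regex.Pattern and java.util.regex.Matcher;
// subjectString is the text to search.
Pattern regex = Pattern.compile("@ISNULL\\(('([^']*)',?)+\\)");
// or, using non-capturing groups:
// Pattern regex = Pattern.compile("@ISNULL\\((?:'(?:[^']*)',?)+\\)");
String resultString = null;
Matcher regexMatcher = regex.matcher(subjectString);
if (regexMatcher.find()) {
    // group() with no argument returns the entire matched call
    resultString = regexMatcher.group();
}

Then use another regex to find and iterate over your matches:

// Needs java.util.List and java.util.ArrayList.
List<String> matchList = new ArrayList<String>();
if (resultString != null) {
    Pattern argRegex = Pattern.compile("'([^']*)'");
    Matcher argMatcher = argRegex.matcher(resultString);
    while (argMatcher.find()) {
        // group(1) is the text between the quotes
        matchList.add(argMatcher.group(1));
    }
}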
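
Put together, the two steps might look like the following self-contained sketch. The class name IsNullArguments and the method extract are invented for this example, and the first pattern uses the non-capturing variant since its groups are never read:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class IsNullArguments {
    // Step 1: locate the whole @ISNULL(...) call.
    private static final Pattern CALL =
            Pattern.compile("@ISNULL\\((?:'[^']*',?)+\\)");
    // Step 2: pick out each quoted argument inside that call.
    private static final Pattern ARGUMENT = Pattern.compile("'([^']*)'");

    static List<String> extract(String subject) {
        List<String> arguments = new ArrayList<>();
        Matcher call = CALL.matcher(subject);
        if (!call.find()) {
            return arguments; // no @ISNULL call in the input
        }
        // Run the second regex only over the text of the matched call.
        Matcher argument = ARGUMENT.matcher(call.group());
        while (argument.find()) {
            arguments.add(argument.group(1));
        }
        return arguments;
    }

    public static void main(String[] args) {
        System.out.println(extract("x = @ISNULL('1','2','3','4','5','6');"));
        // prints [1, 2, 3, 4, 5, 6]
    }
}
```

Because the arguments are matched with [^']*, this sketch assumes no argument contains a single quote.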
Tim Pietzcker
Thank you, but I'm using the standard Java regex engine.
Rotem
Thanks for the info - I have updated my answer.
Tim Pietzcker
I tried it, but not all groups were found:
Group[0] "@ISNULL('1','2','3')" at 0 - 20
Group[1] "'3'" at 16 - 19
Group[2] "3" at 17 - 18
Rotem
You can see that the numbers 1 and 2 weren't found.
Rotem
You didn't do what I wrote. Please read my answer carefully. You first need to find the entire string. Then you need to extract that string and apply the second regex to it iteratively. You only did the first match where group 0 is the entire string and group 1 is the last repetition of the capturing group. I could also have used non-capturing parentheses to make it clearer, but those are harder to read.
Tim Pietzcker
+1 for taking the time to help this guy....
slugster