I am a complete newbie when it comes to regex, and I would like help writing an expression that matches the following:

 {ValidFunctionName}({parameter}:"{value}")

 {ValidFunctionName}({parameter}:"{value}",
                     {parameter}:"{value}")

 {ValidFunctionName}()

Where each {x} is what I want to capture. {parameter} can be anything ($%"$, for example) and {value} must be enclosed in quotation marks.

ThisIsValid_01(a:"40")

would be "ThisIsValid_01", "a", "40"

ThisIsValid_01(a:"40", b:"ZOO")

would be "ThisIsValid_01", "a", "40", "b", "ZOO"

01_ThisIsntValid(a:"40")

wouldn't return anything

ThisIsntValid_02(a:40)

wouldn't return anything, as 40 is not enclosed in quotation marks.

ThisIsValid_02()

would return "ThisIsValid_02"

For a valid function name I came across: "[A-Za-z_][A-Za-z_0-9]*" But I can't for the life of me figure out how to match the rest. I've been playing around on http://regexpal.com/ to try to get valid matches to all conditions, but to no avail :(

It would be nice if you kindly explained the regex too, so I can learn :)

+2  A: 

EDIT: This works and uses two regexes. The first gets the function name and everything inside its brackets; the second extracts each parameter/value pair from that inner text. In most regex engines a repeated group only keeps its last capture, so a single regex cannot hand back an unknown number of pairs. Add some [ \t\n\r]* wherever whitespace should be allowed.

Regex r = new Regex(@"(?<function>\w[\w\d]*?)\((?<inner>.*?)\)");
Regex inner = new Regex(@",?(?<param>.+?):""(?<value>[^""]*?)""");
string input = "_test0(a:\"lolololol\",b:\"2\") _test1(ghgasghe:\"asjkdgh\")";

List<List<string>> matches = new List<List<string>>();

MatchCollection mc = r.Matches(input);
foreach (Match match in mc)
{
    var l = new List<string>();
    l.Add(match.Groups["function"].Value);
    foreach (Match m in inner.Matches(match.Groups["inner"].Value))
    {
         l.Add(m.Groups["param"].Value);
         l.Add(m.Groups["value"].Value);
    }
    matches.Add(l);
}

(Old) Solution

(?<function>\w[\w\d]*?)\((?<param>.+?):"(?<value>[^"]*?)"\)

(Old) Explanation

Let's remove the group captures so it is easier to understand: \w[\w\d]*?\(.+?:"[^"]*?"\)

\w is the word class; it matches letters, digits and the underscore (roughly [a-zA-Z0-9_])
\d is the digit class, it is short for [0-9]

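  1. \w[\w\d]*? Matches one word character for the start of the function name, then zero or more further word or digit characters. Because \w also matches digits, this accepts names that start with a digit; use [A-Za-z_] as the first character to rule those out.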

  2. \(.+? Matches a left bracket then one or more of any characters (for the parameter)

  3. :"[^"]*?"\) Matches a colon, then the opening quote, then zero or more of any character except quotes (for the value) then the close quote and right bracket.

Brackets (or parens, as some people call them) are escaped with backslashes because otherwise they would be treated as capturing groups.

The (?<name> ) syntax is a named capturing group: it captures the text matched inside it under the given name.

The ? after each of the * and + operators makes them non-greedy, meaning that they match as little text as possible rather than as much as possible.
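To make the greedy versus non-greedy difference concrete, here is a minimal sketch. The class and method names (GreedyVsLazy, FirstParam) and the sample input are made up for illustration; only the two quantifiers differ between the patterns.

```csharp
using System;
using System.Text.RegularExpressions;

public static class GreedyVsLazy
{
    // Returns the text captured by the "param" group (empty when nothing matched).
    public static string FirstParam(string pattern, string input)
    {
        return Regex.Match(input, pattern).Groups["param"].Value;
    }

    public static void Main()
    {
        string input = "f(a:\"1\",b:\"2\")";

        // Greedy: .+ grabs as much as it can, then backs off to the LAST colon.
        Console.WriteLine(FirstParam(@"\((?<param>.+):", input));   // a:"1",b

        // Non-greedy: .+? grabs as little as it can, so it stops at the FIRST colon.
        Console.WriteLine(FirstParam(@"\((?<param>.+?):", input));  // a
    }
}
```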

(Old) Use

Regex r = new Regex(@"(?<function>\w[\w\d]*?)\((?<param>.+?):""(?<value>[^""]*?)""\)");
string input = "_test0(aa%£$!:\"lolololol\") _test1(ghgasghe:\"asjkdgh\")";

List<string[]> matches = new List<string[]>();

// Matches() returns an empty collection when nothing matches, so no IsMatch check is needed.
MatchCollection mc = r.Matches(input);
foreach (Match match in mc)
{
    matches.Add(new[] { match.Groups["function"].Value, match.Groups["param"].Value, match.Groups["value"].Value });
}

EDIT: Now that you've added an arbitrary number of parameters, I would recommend writing your own parser rather than using regexes. The example above only works with one parameter and strictly no whitespace. The following matches multiple parameters, again with no whitespace allowed, but does not return the parameters and values:

\w[\w\d]*?\(.+?:"[^"]*?"(,.+?:"[^"]*?")*\)

Just for fun, like above but with whitespace allowed:

\w[\w\d]*?[ \t\r\n]*\([ \t\r\n]*.+?[ \t\r\n]*:[ \t\r\n]*"[^"]*?"([ \t\r\n]*,[ \t\r\n]*.+?[ \t\r\n]*:[ \t\r\n]*"[^"]*?")*[ \t\r\n]*\)

Capturing the text you want is hard because you don't know in advance how many captures there will be, which makes a single regex a poor fit.

Callum Rogers
Problem: `+` and `*` are greedy by default. `(?<param>.+):` will swallow everything up to the last colon, so it won't parse multiple parameters. Same problem with `"(?<value>.*)"` Perhaps change `.+` and `.*` to `.+?` and `.*?`.
Greg
@Greg, thanks, I'll add that in.
Callum Rogers
Seems much simpler to understand than most of the others, which also have the greed issues from + and *. I'll try Greg's solution to this.
Blam
+1  A: 

Try this:

^\s*(?<FunctionName>[A-Za-z][A-Za-z_0-9]*)\(((?<parameter>[^:]*):"(?<value>[^"]+)",?\s*)*\)
  • ^\s*(?<FunctionName>[A-Za-z][A-Za-z_0-9]*) matches the function name. ^ anchors the match at the start of the string (or of a line, with the multiline option), so the first non-whitespace character must be a letter. You can remove the \s* if you don't need it; I just added it to make the match a little more flexible.
  • The next part, \(((?<parameter>[^:]*):"(?<value>[^"]+)",?\s*)*\), captures each parameter-value pair inside the parentheses. You have to escape the function's parentheses because parentheses are grouping symbols in regex syntax.

The ?<> constructs inside the parentheses are named capture groups, which, when the library supports them (as .NET does), make grabbing the groups from a match a little easier.
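One thing to watch with named groups inside a repeated group: Group.Value only gives the last repetition, while .NET keeps every repetition in Group.Captures. The sketch below reads them back in order. The pattern is my own variation written for illustration (not the exact regex above), the names RepeatedGroups and Parse are hypothetical, and it assumes each input string holds exactly one call. It also tolerates a trailing comma before the closing bracket.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class RepeatedGroups
{
    // Illustrative pattern: name, '(', zero or more param:"value" pairs, ')'.
    static readonly Regex Call = new Regex(
        @"^(?<function>[A-Za-z_][A-Za-z_0-9]*)\((?:\s*(?<param>[^:]+?):""(?<value>[^""]*)""\s*(?:,|(?=\))))*\)$");

    // Returns function name, then param, value, param, value... (empty list if invalid).
    public static List<string> Parse(string input)
    {
        var result = new List<string>();
        Match m = Call.Match(input);
        if (!m.Success) return result;

        result.Add(m.Groups["function"].Value);
        CaptureCollection names = m.Groups["param"].Captures;   // one entry per repetition
        CaptureCollection values = m.Groups["value"].Captures;
        for (int i = 0; i < names.Count; i++)
        {
            result.Add(names[i].Value);
            result.Add(values[i].Value);
        }
        return result;
    }

    public static void Main()
    {
        // Prints: ThisIsValid_01, a, 40, b, ZOO
        Console.WriteLine(string.Join(", ", Parse("ThisIsValid_01(a:\"40\", b:\"ZOO\")")));
        // Prints: 0 (the value is not quoted, so nothing is returned)
        Console.WriteLine(Parse("ThisIsntValid_02(a:40)").Count);
    }
}
```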

Benjamin Anderson
That's not going to work. First, it restricts the "parameter" to valid function names when the question says that "parameter" can include special characters. Secondly, it doesn't require that the "value" be enclosed in quotation marks.
Jim Mischel
It seems to not match when I have more than one parameter and value group, I will *attempt* to fix this myself armed with your knowledge. Thanks for the tip on ?<>, I will put it to good use :D
Blam
Yeah, I forgot the comma and whitespace after the parameter pair. I've corrected it.
Benjamin Anderson
+1  A: 

Here:

\w[\w\d]*\s*\(\s*(?:(\w[\w\d]*):("[^"]*"|\d+))*\s*\)

Visualization here (sorry for the url-shortener, but markdown didn't accept it)

Eric
nice link! 12345
iterationx
Woah, cool site!
jjnguy
A: 

For problems like this I always suggest not trying to "find" a single regex, but writing several regexes that share the work.

But here is my quick shot:

(?<funcName>[A-Za-z_][A-Za-z_0-9]*)
\(
    (?<ParamGroup>
        (?<paramName>[^(]+?)
        :
        "(?<paramValue>[^"]*)"
        ((,\s*)|(?=\)))
    )*
\)

The whitespace is there for readability. Remove it, or set the option to ignore pattern whitespace (RegexOptions.IgnorePatternWhitespace in .NET).
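As a rough sketch of the second option, the laid-out pattern can be passed to .NET as-is with RegexOptions.IgnorePatternWhitespace. The ^ and $ anchors and the # comments are my additions (they assume the whole input is a single call), and the names VerbosePattern and IsCall are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class VerbosePattern
{
    // In a verbatim string a double quote is written as "".
    // With IgnorePatternWhitespace, text after # is a regex comment.
    const string Pattern = @"
        ^(?<funcName>[A-Za-z_][A-Za-z_0-9]*)   # function name
        \(
            (?<ParamGroup>
                (?<paramName>[^(]+?)           # parameter name
                :
                ""(?<paramValue>[^""]*)""        # quoted value
                ((,\s*)|(?=\)))                # comma, or closing bracket next
            )*
        \)$";

    static readonly Regex Call =
        new Regex(Pattern, RegexOptions.IgnorePatternWhitespace);

    public static bool IsCall(string input)
    {
        return Call.IsMatch(input);
    }

    public static void Main()
    {
        Console.WriteLine(IsCall("ThisIsValid_01(a:\"40\", b:\"ZOO\")")); // True
        Console.WriteLine(IsCall("ThisIsValid_02()"));                    // True
        Console.WriteLine(IsCall("ThisIsntValid_02(a:40)"));              // False
    }
}
```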

Scordo
A: 

This regex passes all your test cases:

^(?<function>[A-Za-z][\w]*?)\(((?<param>[^:]*?):"(?<value>[^"]*?)",{0,1}\s*)*\)$

This works on multiple parameters and no parameters. It also handles special characters in the param name and whitespace after the comma. There may need to be some adjustments as your test cases do not cover everything you indicate in your text.

Please note that \w usually includes digits and is not appropriate as the leading character of the function name. Reference: http://www.regular-expressions.info/charclass.html#shorthand
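A quick way to see the difference, as a small sketch (LeadingCharacter, LooseName and StrictName are hypothetical names, and the patterns only check the function name, not the brackets):

```csharp
using System;
using System.Text.RegularExpressions;

public static class LeadingCharacter
{
    // \w as the first character: also accepts a digit.
    public static bool LooseName(string s)
    {
        return Regex.IsMatch(s, @"^\w\w*$");
    }

    // Explicit class as the first character: ASCII letters and underscore only.
    public static bool StrictName(string s)
    {
        return Regex.IsMatch(s, @"^[A-Za-z_]\w*$");
    }

    public static void Main()
    {
        Console.WriteLine(LooseName("01_ThisIsntValid"));   // True
        Console.WriteLine(StrictName("01_ThisIsntValid"));  // False
        Console.WriteLine(StrictName("ThisIsValid_01"));    // True
    }
}
```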

Aaron D
+1  A: 

Someone else has already given an answer that gives you a flat list of strings, but in the interest of strong typing and proper class structure, I’m going to provide a solution that encapsulates the data properly.

First, declare two classes:

public class ParamValue         // For a parameter and its value
{
    public string Parameter;
    public string Value;
}
public class FunctionInfo       // For a whole function with all its parameters
{
    public string FunctionName;
    public List<ParamValue> Values;
}

Then do the matching and populate a list of FunctionInfos:

(By the way, I’ve made some slight fixes to the regexes... it will now match identifiers correctly, and it will not include the double-quotes as part of the “value” of each parameter.)

Regex r = new Regex(@"(?<function>[\p{L}_]\w*?)\((?<inner>.*?)\)");
Regex inner = new Regex(@",?(?<param>.+?):""(?<value>[^""]*?)""");
string input = "_test0(a:\"lolololol\",b:\"2\") _test1(ghgasghe:\"asjkdgh\")";

var matches = new List<FunctionInfo>();

if (r.IsMatch(input))
{
    MatchCollection mc = r.Matches(input);
    foreach (Match match in mc)
    {
        var l = new List<ParamValue>();

        foreach (Match m in inner.Matches(match.Groups["inner"].Value))
            l.Add(new ParamValue
            {
                Parameter = m.Groups["param"].Value,
                Value = m.Groups["value"].Value
            });

        matches.Add(new FunctionInfo
        {
            FunctionName = match.Groups["function"].Value,
            Values = l
        });
    }
}

Then you can access the collection nicely with identifiers like FunctionName:

// Requires "using System.Linq;" for Select.
foreach (var match in matches)
{
    Console.WriteLine("{0}({1})", match.FunctionName,
        string.Join(", ", match.Values.Select(val =>
            string.Format("{0}: \"{1}\"", val.Parameter, val.Value))));
}
Timwi