I'm parsing a simple language (Excel formulas) to find the functions it contains. A function name must start with a letter, followed by any number of letters or digits, and end with an open paren (no spaces in between), for example MyFunc(. The function can contain any arguments, including other functions, and must end with a close paren ). Of course, math within parens is allowed, as in =MyFunc((1+1)), and (1+1) shouldn't be detected as a function because it fails the function rule I've just described. My goal is to recognize the highest-level function calls in a formula, identify each function name, and extract its arguments. With the arguments in hand, I can recursively look for other function calls.
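To pin down the intended behavior independently of any regex, here is a minimal sketch of the rule above as a hand-written scanner in Python. It is not the .NET regex this question asks for, only a reference for what should be matched. The names NAME_RE and top_level_calls are hypothetical, and the sketch assumes that parentheses inside string literals do not need special handling.

```python
import re

# Hypothetical helper for illustration only (not the .NET regex in question).
# Naming rule: a letter, then any letters/digits, then "(" with no spaces.
NAME_RE = re.compile(r"[A-Za-z][A-Za-z0-9]*\(")

def top_level_calls(formula):
    """Return (name, arguments) pairs for the outermost function calls."""
    calls = []
    pos = 0
    while True:
        m = NAME_RE.search(formula, pos)
        if m is None:
            break
        # Walk forward from just after the opening paren, tracking depth.
        depth = 1
        i = m.end()
        while i < len(formula) and depth > 0:
            if formula[i] == "(":
                depth += 1
            elif formula[i] == ")":
                depth -= 1
            i += 1
        if depth != 0:
            break  # unbalanced parens: stop rather than guess
        name = m.group()[:-1]              # drop the trailing "("
        arguments = formula[m.end():i - 1]  # text between the outer parens
        calls.append((name, arguments))
        pos = i  # resume after the closing paren, skipping nested calls
    return calls
```

Calling top_level_calls again on each returned arguments string gives the recursive step described above. A bare group such as (1+1) is never reported, because no name precedes its open paren.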
Using this tutorial I hacked up the following regexes. Neither does the trick: both fail on the test case pasted below.
This should work but completely fails:
(?<name>[a-z][a-z0-9]*\()(?<body>(?>[a-z][a-z0-9]*\((?<DEPTH>)|\)(?<-DEPTH>)|.?)*(?(DEPTH)(?!)))\)
This works for many test cases, but fails for the test case below. I don't think it handles nested functions correctly: it just looks for open paren/close paren pairs in the nesting, without checking for a function name:
(?<name>[a-z][a-z0-9]*\()(?<body>(?>\((?<DEPTH>)|\)(?<-DEPTH>)|.?)*(?(DEPTH)(?!)))\)
Here's the test that breaks them all:
=Date(Year(A$5),Month(A$5),1)-(Weekday(Date(Year(A$5),Month(A$5),1))-1)+{0;1;2;3;4;5}*7+{1,2,3,4,5,6,7}-1
This should be matched as:
Date(ARGUMENTS1)
Weekday(ARGUMENTS2)
Where ARGUMENTS2 = Date(Year(A$5),Month(A$5),1)
Instead it matches:
ARGUMENTS2 = Date(Year(A$5),Month(A$5),1)-1)
I am using the .NET Regex class, which provides external memory (balancing groups) for tracking nesting depth.