tags:

views:

58

answers:

1

I'm converting some excel formulas to another system and need to do some fairly nifty search and replace magic. I presume Regex are the tools for the job in this case, but if anyone has any other ideas I'd like to hear them.

I'm working on getting these formulas into something resembling SQL syntax. I also have to deal with algebraic symbols so I might have the following:

9^2 will need to become POWER(9,2). (A + 3)^3 will need to become POWER((A+3),3).

Whats the best approach for this?

I'm using C# 3.5 if that makes a difference.

edit: An example of something I have to parse (the power symbol is near the end):

"((({VAL(9286)} / 1000) * {VAL(4648)}) + (({VAL(9609)} / 1000) + ({VAL(6480)} / 1000)) * {VAL(8574)}) / ({VAL(9286)} / 1000 + {VAL(9609)} / 1000 + {VAL(6480)} / 1000) * (({VAL(9286)} / 22.4)*34.38 + {VAL(9609)} + {VAL(6480)}) * ((1.075068 + 0.001*11.17019 * ((({VAL(9286)} / 1000) * {VAL(4648)}) + (({VAL(9609)} / 1000) + ({VAL(6480)} / 1000)) * {VAL(8574)}) / ({VAL(9286)} / 1000 + {VAL(9609)} / 1000 + {VAL(6480)} / 1000)+273.15)) + (100000*0.90755 / ((({VAL(9286)} / 1000) * {VAL(4648)}) + (({VAL(9609)} / 1000) + ({VAL(6480)} / 1000)) * {VAL(8574)}) / ({VAL(9286)} / 1000 + {VAL(9609)} / 1000 + {VAL(6480)} / 1000) + 273.15)^2))*4.1868/32)"
+2  A: 

Because of the possibility of nested parenthesis, regex is not a suitable tool for this task. It's better to use a mathematical expression parser.

But it's not impossible. For example, by repeatedly replacing the pattern

((?=[\w.(])[\w.]*\s*(?:\((?>[^()]+|\((?<O1>)|\)(?<-O1>))*(?(O1)(?!))\))?)\s*\^\s*((?=[\w.(])[\w.]*\s*(?:\((?>[^()]+|\((?<O2>)|\)(?<-O2>))*(?(O2)(?!))\))?)

with

"POWER($1,$2)"

until the string does not change, it should be able to turn all a^b into POWER(a,b). Example:

   (a+3)^(b+5^(c+3)) + 9 ^ 2 + (A + 3)^3 + (5^7)^(6^(8^9-1)-3)
-> POWER((a+3),(b+5^(c+3))) + POWER(9 ,2 )+ POWER((A + 3),3 )+ POWER((5^7),(6^(8^9-1)-3))
-> POWER((a+3),(b+POWER(5,(c+3)))) + POWER(9 ,2 )+ POWER((A + 3),3 )+ POWER((POWER(5,7)),(POWER(6,(8^9-1))-3))
-> POWER((a+3),(b+POWER(5,(c+3)))) + POWER(9 ,2 )+ POWER((A + 3),3 )+ POWER((POWER(5,7)),(POWER(6,(POWER(8,9)-1))-3))
-> done

Note that this regex will assume ^ is left-associative instead of right-associative.

   1^2^3
-> POWER(1,2)^3
-> POWER(POWER(1,2),3)
-> done

although 1^2^3 itself isn't well-formed.

KennyTM
Thanks for the response. I am supposed to be dealing with arbitrarily long equations, however in reality I think it will be fairly easy for me to check if any special cases get through the cracks. I'll give this a shot.
Alex
I tried this out, but there seems to be a mistake in the regex. It doesnt replace pwoer symbols.
Alex
Does it make any difference that I'm using .NET Regex.Replace method?
Alex
@Alex: It is using `Regex.Replace`. Make sure there's no newline in the pattern.
KennyTM
Yes, I have it working now. It works for simple expressions but not some of the more complicated statements. I'll put an example of one string this regex failed to parse in my original post.
Alex
@Alex: Try update. BTW your example's paranthesis isn't balanced.
KennyTM
Would you be able to expand on how the regex is structured? I have to do the same thing for converting Excel IFs (IF(A,B,C)) into SQL (CASE WHEN A THEN B ELSE C) statements and am having difficulty in parsing the meaning of the regex.
Alex
@Alex: Oh. You really should use an Excel formula parser. The regex won't scale.
KennyTM