The following comment on my regex solution incited me to perform an analysis.
Isn't this a situation where regex is a little overly ugly? A straight code
solution would be simpler and more understandable, I think. – C. Ross
My response was the following.
Maybe. I chose regex because it covers all cases (provided the regex is
correct). If you do it with String.Split() and friends, you get a simpler
solution as long as the input is valid, but catching all invalid inputs
using string methods can really become a horror. When I start parsing a number to
decimal, I already know that it is a valid number and the parse will not fail. Or
think of inputs like '1-2-3;,-1:.:' - you will split them, but then you will
quite surely crash later or return a meaningless result. In David's code, accessing part[2] will be
out of bounds for the input '1:2', parsing may fail, and there are quite
surely a few more uncaught errors. Catching them all will probably make
the code less readable than the regex code. – danbruc
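To illustrate the fragility described in this exchange, here is a minimal sketch of a split-based parser. It is not David's actual code, and the input grammar of the original question is not repeated here; the sketch simply assumes, purely for illustration, groups separated by ',' or ';' that each contain colon-separated decimal values.

    using System.Collections.Generic;
    using System.Globalization;

    static class NaiveParser
    {
        // Illustrative only - NOT David's actual code, just the fragile
        // pattern described above, assuming a hypothetical input format of
        // ','/';'-separated groups of ':'-separated decimal values.
        public static List<List<decimal>> Parse(string input)
        {
            var result = new List<List<decimal>>();
            foreach (var group in input.Split(',', ';'))
            {
                var part = group.Split(':');
                // Blindly assumes three components per group: for the input
                // "1:2" the access part[2] throws an IndexOutOfRangeException,
                // and any non-numeric token makes decimal.Parse() throw a
                // FormatException.
                result.Add(new List<decimal>
                {
                    decimal.Parse(part[0], CultureInfo.InvariantCulture),
                    decimal.Parse(part[1], CultureInfo.InvariantCulture),
                    decimal.Parse(part[2], CultureInfo.InvariantCulture)
                });
            }
            return result;
        }
    }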
So I decided to use Microsoft's awesome tool PEX and analyse both my regex approach and David's string operation approach. I left David's code unmodified and replaced the console output in my solution with statements that build the result as a List<List<Decimal>>, just like David does.
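For reference, here is a rough sketch of the shape of such a regex approach, again using the same hypothetical grammar as in the sketch above rather than my actual regex: the complete input is validated by a single pattern first, so Decimal.Parse() only ever sees substrings that already matched a number.

    using System.Collections.Generic;
    using System.Globalization;
    using System.Text.RegularExpressions;

    static class RegexParser
    {
        // Illustrative pattern for the hypothetical grammar above, not the
        // regex from my original answer.
        private static readonly Regex Pattern = new Regex(
            @"^(-?\d+(\.\d+)?)(:-?\d+(\.\d+)?)*([,;](-?\d+(\.\d+)?)(:-?\d+(\.\d+)?)*)*$");

        public static List<List<decimal>> Parse(string input)
        {
            // Like the code analysed below, this sketch performs no null
            // check, and very long digit sequences can still overflow
            // decimal.Parse().
            var result = new List<List<decimal>>();
            if (!Pattern.IsMatch(input))
            {
                return result; // malformed input yields an empty result
            }
            foreach (var group in input.Split(',', ';'))
            {
                var numbers = new List<decimal>();
                foreach (var token in group.Split(':'))
                {
                    numbers.Add(decimal.Parse(token, CultureInfo.InvariantCulture));
                }
                result.Add(numbers);
            }
            return result;
        }
    }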
To make a reasonably complete analysis feasible, I constrained PEX to generate only inputs shorter than 45 characters, using only the following nine characters.
019.;,-:!
There is no need to use all ten digits because they (should) all behave the same. I included 9 to make it easy to discover the overflow, but 0 and 1 should also be sufficient - PEX would probably find 1000 instead of 999. I included 0 and 1 to discover an error with very tiny numbers like 0.000[...]001, but nothing appeared. I assume very small numbers are silently rounded to zero, but I did not investigate this further. Or maybe 44 characters (44 because of the precision of decimal of 28 to 29 digits plus some room for other characters) were just too short to generate a small enough number. The other characters are included because they are the remaining valid characters in the input. Finally, I included the exclamation mark as a surrogate for invalid characters.
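Expressed as a parameterized test, the constraints look roughly like the sketch below. The attributes and PexAssume calls are the standard Microsoft.Pex.Framework surface, but the class and method names are placeholders and this is not my actual test code.

    using Microsoft.Pex.Framework;

    [PexClass]
    public partial class ParserTest
    {
        // PEX explores this parameterized test and only keeps inputs that
        // satisfy the assumptions. Null is deliberately not excluded so
        // that a missing null check in the code under test can be found.
        [PexMethod]
        public void ParseDoesNotCrash(string input)
        {
            PexAssume.IsTrue(input == null || input.Length < 45);
            if (input != null)
            {
                foreach (var c in input)
                {
                    PexAssume.IsTrue("019.;,-:!".IndexOf(c) >= 0);
                }
            }
            RegexParser.Parse(input); // or the string operation variant
        }
    }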
The result of the analysis proved me right. PEX found two bugs in my code. I do not check for null input (I skipped that intentionally to concentrate on the important part), which causes the well-known NullReferenceException, and PEX discovered that the input "999999999999999999999999999999" causes Decimal.Parse() to fail with an OverflowException.
PEX also reports some false positives. For example, "!;9,;.0;990:!!:,900:09" was reported as an input causing a FormatException, but rerunning the generated test yields no exception. It turns out that ".0" caused the test to fail during exploration. Looking at other failed tests reveals that Decimal.Parse() fails for (all) inputs starting with a decimal point during exploration. But these are valid numbers and do not fail during normal execution. I am unable to explain these false positives.
And here is the result of one run of PEX against the string operation solution. Both implementations share the missing null check and the overflow exception, but the simple string operation solution is unable to handle many malformed inputs. Almost all of them result in a FormatException, but PEX also discovered the IndexOutOfRangeException I predicted.
FormatException: "!,"
FormatException: ","
FormatException: "1,"
FormatException: "!"
FormatException: ";9"
FormatException: "::"
FormatException: "!.999009"
FormatException: "!.0!99!9"
FormatException: "0,9.90:!!,,,!,,,,,,!,,,0!!!9,!"
FormatException: ""
FormatException: "-99,9"
FormatException: "1,9,,,!,,,,,,9,,,9,1,!9,,,,!,!"
FormatException: "!:,"
FormatException: "!9!:.!!,!!!."
FormatException: "!:"
IndexOutOfRangeException: "1:9"
FormatException: "09..::!"
FormatException: "9,0..:!.!,,,!,,,,,,!,,,!!-,!,!"
OverflowException: "99999999999999999999999999999999999999999999"
FormatException: "!."
FormatException: "999909!!"
FormatException: "-"
FormatException: "9,9:9:999,,,9,,,,,,!,,,!9!!!,!"
FormatException: "!9,"
FormatException: "!.09!!0!"
FormatException: "9-;"
FormatException: ":"
FormatException: "!.!9!9!!"
NullReferenceException: null
FormatException: ":,"
FormatException: "!!"
FormatException: "9;"
The question now is how hard it would be to handle all these cases. The simple solution would be to guard the parsing instructions with try/catch clauses. I am not sure whether this is sufficient to guarantee correct operation on the well-formed part of the input. But maybe this is not required and a malformed input should simply produce an empty result, which would make it easy to fix the solution.
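As a minimal sketch of that idea, assuming an empty result really is acceptable for malformed input, the hypothetical naive parser from above could be guarded with decimal.TryParse() instead of try/catch - again illustrative, not a fix of David's actual code.

    using System.Collections.Generic;
    using System.Globalization;

    static class GuardedParser
    {
        public static List<List<decimal>> Parse(string input)
        {
            var result = new List<List<decimal>>();
            if (input == null)
            {
                return result;
            }
            foreach (var group in input.Split(',', ';'))
            {
                var numbers = new List<decimal>();
                foreach (var token in group.Split(':'))
                {
                    decimal value;
                    // TryParse covers the FormatException and
                    // OverflowException cases; any bad token discards the
                    // whole input and yields an empty result.
                    if (!decimal.TryParse(token, NumberStyles.Float,
                                          CultureInfo.InvariantCulture,
                                          out value))
                    {
                        return new List<List<decimal>>();
                    }
                    numbers.Add(value);
                }
                result.Add(numbers);
            }
            return result;
        }
    }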
Finally, here are the code coverage results achieved. Note that I analysed the regex solution using both decimal and single because PEX was unable to instrument one method used inside Decimal.Parse().
ParseExpression(string) 100,00% 10/10 blocks
ParseSubExpression(string) 96,15% 25/26 blocks
ParseExpressionRegex(string) 95,06% 77/81 blocks
ParseExpressionRegexSingle(string) 94,87% 74/78 blocks
My conclusion - a regex solution should really be preferred. It is somewhat harder to design and understand, but it handles malformed inputs much more robustly than a simple string-operation-based implementation. And not to forget - I did not check at all whether the results returned are correct. That is a different matter.