ansaurus

Question

codingBat separateThousands using regex (and unit testing how-to)

Answer 1

+1 A:

When you state the requirements are you intending for them to be enforced by your method?

The number may contain an optional minus sign, and an optional decimal part. There will not be any superfluous leading zeroes.

If your intent is to have the method detect when those constraints are violated, you will need to write additional unit tests to ensure that contract is being enforced.

What about testing for 1234.5678.91011?

Do you expect your method to return 1,234.5678.91011 or just ignore the whole thing? Best to write a test to verify your expectations.
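One way to make that decision explicit is to validate the input before formatting it. The following is only a sketch under stated assumptions: the class name `CheckedSeparator`, the `VALID` pattern, and the choice to throw `IllegalArgumentException` are hypothetical and not part of the original problem, and the formatting regex is borrowed from Answer 2.

```java
import java.util.regex.Pattern;

class CheckedSeparator {
    // Assumed contract: optional minus sign, no superfluous leading zeroes,
    // and an optional decimal part with at least one digit.
    private static final Pattern VALID =
        Pattern.compile("-?(?:0|[1-9]\\d*)(?:\\.\\d+)?");

    static String separateThousands(String s) {
        // Reject anything outside the assumed contract instead of guessing.
        if (!VALID.matcher(s).matches()) {
            throw new IllegalArgumentException("not a valid number: " + s);
        }
        // Formatting regex taken from Answer 2.
        return s.replaceAll("(\\G-?\\d{1,3})(?=(?:\\d{3})++(?!\\d))", "$1,");
    }

    public static void main(String[] args) {
        System.out.println(separateThousands("1234.5678"));
        try {
            separateThousands("1234.5678.91011");
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Whether to reject, ignore, or pass such input through is a design choice; the point is that a test can then pin down whichever behavior you pick.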

nvuono 2010-04-24 08:28:56

These are good questions. I'm not really sure what I'm supposed to do with unit tests. Right now, if the input is "invalid" (by whose definition, I'm not sure), I just scream "undefined behavior!". I'm not sure if that's the proper way to unit test. Maybe I should just read a book.

polygenelubricants 2010-04-24 08:47:28

We're really just relying on the underlying RegEx engine to find the matching text and apply the transformation where appropriate. I guess the most important testing would ensure you don't malform text by adding commas where they don't belong.
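A few checks along those lines might look like the sketch below. It uses plain Java rather than any particular test framework, the class and helper names (`NoStrayCommas`, `check`) are hypothetical, and the regex is the one from Answer 2. The last case only records what that regex happens to do with malformed input; it is not a statement of what the method should do.

```java
class NoStrayCommas {
    // Regex taken from Answer 2.
    static String separateThousands(String s) {
        return s.replaceAll("(\\G-?\\d{1,3})(?=(?:\\d{3})++(?!\\d))", "$1,");
    }

    // Minimal hand-rolled assertion helper.
    static void check(String input, String expected) {
        String actual = separateThousands(input);
        if (!actual.equals(expected)) {
            throw new AssertionError(input + " -> " + actual + ", expected " + expected);
        }
    }

    public static void main(String[] args) {
        check("999", "999");               // too short to need a comma
        check("-999", "-999");             // the minus sign is not a digit
        check("0.1234", "0.1234");         // decimal digits are never grouped
        check("1234.5678", "1,234.5678");  // only the integer part changes
        check("1234.5678.91011", "1,234.5678.91011"); // malformed: current behavior
        System.out.println("all checks passed");
    }
}
```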

nvuono 2010-04-24 09:19:22

Answer 2

+1 A:

This works for me:

return s.replaceAll("(\\G-?\\d{1,3})(?=(?:\\d{3})++(?!\\d))", "$1,");

The first time through, \G acts the same as ^, and the lookahead forces \d{1,3} to consume only as many characters as necessary to leave the match position at a three-digit boundary. After that, \d{1,3} consumes the maximum three digits every time, with \G to keep it anchored to the end of the previous match.
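To see that behavior, one can list what group 1 captures on each iteration. This is just an illustrative sketch; the class name `MatchTrace` and the helper `matches` are made up for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MatchTrace {
    private static final Pattern P =
        Pattern.compile("(\\G-?\\d{1,3})(?=(?:\\d{3})++(?!\\d))");

    // Collects the text captured by group 1 for each successive match.
    static List<String> matches(String s) {
        List<String> out = new ArrayList<>();
        Matcher m = P.matcher(s);
        while (m.find()) {
            out.add(m.group(1));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(matches("1234567"));    // [1, 234]
        System.out.println(matches("-12345678"));  // [-12, 345]
        System.out.println(matches("1234.5678"));  // [1]
    }
}
```

The first match is as short as it needs to be ("1" or "-12"), each later match is exactly three digits, and the final group ("567", "678") is never matched because no further three-digit group follows it, so no trailing comma is added.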

As for your unit tests, I would just make it clear in the problem description that the input will always be a valid number, with at most one decimal point.

Alan Moore 2010-04-24 13:16:25

@Alan: any significance to the possessive quantifier?

polygenelubricants 2010-04-24 13:23:34

@poly: Not really. It makes the lookahead infinitesimally more efficient, but the regex won't fail without it. I just like to use them whenever I know indeterminacy is not in my interest.

Alan Moore 2010-04-24 14:08:32

@Alan: I haven't done any testing, but my gut tells me that my regex is faster since it doesn't use any capturing group, `\0` is empty, and the variable-length lookahead _first_ is needed less often (the more frequent _rest_ is simpler in comparison). Will do some testing and report tomorrow (need sleep).

polygenelubricants 2010-04-24 15:24:04

@Alan: By the way, if it's not obvious from the recent developments, I no longer see regex as this magical flute to make beautiful (sometimes orgasmic!) music with. I now see it as just a tool, which is why I picked a more "real" example (separating thousands), and why I'm more concerned with performance etc rather than conciseness, elegance, and all that first love blindness stuff =)

polygenelubricants 2010-04-24 15:29:15

@poly: If performance were a real concern, not using captures *could* make a significant difference--good point there. And that would apply even if I had left out the parens and used `$0` instead of `$1`, which I could have done. Most of the cost would come from parsing the group references and building the replacement string; I would expect that to dwarf any difference in *regex* performance.
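For reference, the group-free variant mentioned here would presumably look like the sketch below. The class and method names are hypothetical, and nothing in it measures performance; it only shows that the two forms produce the same output, since the lookahead is zero-width and so `$0` covers the same text as `$1`.

```java
class WholeMatchVariant {
    // Original form from the answer: capturing group referenced as $1.
    static String withGroup(String s) {
        return s.replaceAll("(\\G-?\\d{1,3})(?=(?:\\d{3})++(?!\\d))", "$1,");
    }

    // Same regex without the capturing parentheses, using $0 (the whole match).
    static String withWholeMatch(String s) {
        return s.replaceAll("\\G-?\\d{1,3}(?=(?:\\d{3})++(?!\\d))", "$0,");
    }

    public static void main(String[] args) {
        String[] inputs = { "1234567", "-1234567.89", "999", "0.1234" };
        for (String s : inputs) {
            System.out.println(withGroup(s) + " " + withWholeMatch(s));
        }
    }
}
```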

Alan Moore 2010-04-24 18:54:20

@poly: As for the attitude progression: good on you! But I hope you don't lose your enthusiasm completely; your solution to the `repeatEnd` problem (http://stackoverflow.com/questions/2606214/codingbat-repeatend-using-regex) was bloody brilliant.

Alan Moore 2010-04-24 19:56:47
