tags:

views:

141

answers:

3

In my answer to this question, I mentioned that we used UpperCamelCase parsing to get a description of an enum constant not decorated with a Description attribute, but it was naive, and it didn't work in all cases. I revisited it, and this is what I came up with:

var result = Regex.Replace(camelCasedString, 
                            @"(?<a>(?<!^)[A-Z][a-z])", @" ${a}");
result = Regex.Replace(result,
                            @"(?<a>[a-z])(?<b>[A-Z0-9])", @"${a} ${b}");

The first Replace looks for an uppercase letter, followed by a lowercase letter, EXCEPT where the uppercase letter is the start of the string (to avoid having to go back and trim), and adds a preceding space. It handles your basic UpperCamelCase identifiers, and leading all-upper acronyms like FDICInsured.

The second Replace looks for a lowercase letter followed by an uppercase letter or a number, and inserts a space between the two. This is to handle special but common cases of middle or trailing acronyms, or numbers in an identifier (except leading numbers, which are usually prohibited in C-style languages anyway).

Running some basic unit tests, the combination of these two correctly separated all of the following identifiers: NoDescription, HasLotsOfWords, AAANoDescription, ThisHasTheAcronymABCInTheMiddle, MyTrailingAcronymID, TheNumber3, IDo3Things, IAmAValueWithSingleLetterWords, and Basic (which didn't have any spaces added).

So, I'm posting this first to share it with others who may find it useful, and second to ask two questions:

  1. Anyone see a case that would follow common CamelCase-ish conventions, that WOULDN'T be correctly separated into a friendly string this way? I know it won't separate adjacent acronyms (FDICFCUAInsured), recapitalize "properly" camelCased acronyms like FdicInsured, or capitalize the first letter of a lowerCamelCased identifier (but that one's easy to add - result = Regex.Replace(result, "^[a-z]", m=>m.ToString().ToUpper());). Anything else?

  2. Can anyone see a way to make this one statement, or more elegant? I was looking to combine the Replace calls, but as they do two different things to their matches it can't be done with these two strings. They could be combined into a method chain with a RegexReplace extension method on String, but can anyone think of better?

A: 

not that this directly answers the question, but why not test by taking the standard C# API and converting each class into a friendly name? It'd take some manual verification, but it'd give you a good list of standard names to test.

atk
Because I like the TDD approach. AFAIK this will properly parse just about any identifier you're likely to see in code, having verified the above specific cases, which are each representative of a subset. If someone thinks of a case that will break it, It's near-trivial to write a test exercising that case.
KeithS
I guess I'm a little confused. How is creating a large test set orthogonal to test driven development?
atk
A: 

Let's say every case you come across works with this (you're asking us for examples that won't and then giving us some, so you don't even have a question left).

This still binds UI to programmatic identifiers in a way that will make both programming and UI changes brittle.

It still assumes your program will only be used in one language. Either your potential market it so small that just indexing an array of names would be scalable enough (e.g. a one-client bespoke or in-house project), or you are assuming you will never be successful enough to need to be available to other languages or other dialects of your first-chosen language.

Does "well, it'll work as long as we're a failure" sound like a passing grade in balancing designs?

Either code it to use resources, or else code it to pass the enum name blindly or use an array of names, as that at least will be modifiable afterwards.

Jon Hanna
+1  A: 

So while I agree with Hans Passant here, I have to say that I had to try my hand at making it one regex as an armchair regex user.

(?<a>(?<!^)((?:[A-Z][a-z])|(?:(?<!^[A-Z]+)[A-Z0-9]+(?:(?=[A-Z][a-z])|$))|(?:[0-9]+)))

Is what I came up with. It seems to pass all the tests you put forward in the question.

So

var result = Regex.Replace(camelCasedString, @"(?<a>(?<!^)((?:[A-Z][a-z])|(?:(?<!^[A-Z]+)[A-Z0-9]+(?:(?=[A-Z][a-z])|$))|(?:[0-9]+)))", @" ${a}");

Does it in one pass.

Philip Rieck