I'd like to write a method that converts CamelCase into a human-readable name.

Here's the test case:

public void testSplitCamelCase() {
    assertEquals("lowercase", splitCamelCase("lowercase"));
    assertEquals("Class", splitCamelCase("Class"));
    assertEquals("My Class", splitCamelCase("MyClass"));
    assertEquals("HTML", splitCamelCase("HTML"));
    assertEquals("PDF Loader", splitCamelCase("PDFLoader"));
    assertEquals("A String", splitCamelCase("AString"));
    assertEquals("Simple XML Parser", splitCamelCase("SimpleXMLParser"));
    assertEquals("GL 11 Version", splitCamelCase("GL11Version"));
}
+2  A: 

The following Regex can be used to identify the capitals inside words:

"((?<=[a-z0-9])[A-Z]|(?<=[a-zA-Z])[0-9]|(?<=[A-Z])[A-Z](?=[a-z]))"

It matches every capital letter that either follows a lowercase letter or digit, or follows another capital and is itself followed by a lowercase letter. It also matches every digit that follows a letter.

How to insert a space before them is beyond my Java skills =)

Edited to include the digit case and the PDF Loader case.
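
Inserting the space is probably just a replaceAll with a group reference. A minimal sketch, assuming the digit alternative is written as (?<=[a-zA-Z])[0-9]; the class name RegexSpaceInserter and the constant BOUNDARY are made up for illustration:

```java
final class RegexSpaceInserter {
    // Matches a capital after a lowercase letter or digit, a digit after a letter,
    // or a capital that follows a capital and precedes a lowercase letter.
    private static final String BOUNDARY =
        "((?<=[a-z0-9])[A-Z]|(?<=[a-zA-Z])[0-9]|(?<=[A-Z])[A-Z](?=[a-z]))";

    static String splitCamelCase(String s) {
        // "$1" refers to the first capture group in a Java replacement string.
        return s.replaceAll(BOUNDARY, " $1");
    }

    public static void main(String[] args) {
        System.out.println(splitCamelCase("SimpleXMLParser")); // Simple XML Parser
        System.out.println(splitCamelCase("GL11Version"));     // GL 11 Version
    }
}
```

Because the lookbehinds need a preceding character, nothing matches at index 0, so no leading space is produced.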

Jens
what about digits?
Yaneeve
@Yaneeve: I just saw the digits... this might make things more complicated. Probably another Regex to catch those would be the easy way.
Jens
@Jens: Will it match the `L` in `PDFLoader`?
Jørn Schou-Rode
how about (?<=[a-z0-9])[A-Z0-9] ?
Yaneeve
@Jørn: Good point! Need to think about that. =) .... ok, edited something in to catch those.
Jens
@Yaneeve: That will unfortunately match the second 1 in 11.
Jens
Now, I vastly admire your Regex skill, but I'd hate to have to maintain that.
Chris Knight
@Chris: Yep, that's true. Regex is more of a write-only language. =) Although this particular expression is not very hard to read, if you read `|` as "or". Well... maybe it is... I've seen worse =/
Jens
A: 

A regex should work, something like ([A-Z]{1}). This captures every capital letter; after that you could replace each one with a space followed by \1, or however you refer to regex groups in Java.
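
For reference, Java replacement strings refer to groups as $1 rather than \1. A rough sketch of this idea follows; the class and method names are invented here, and the trim() call is an addition to drop the space that lands before a leading capital. It handles MyClass but breaks runs of capitals apart:

```java
final class NaiveCapitalSplit {
    static String spaceBeforeCapitals(String s) {
        // Put a space before every capital, then drop the leading one.
        return s.replaceAll("([A-Z])", " $1").trim();
    }

    public static void main(String[] args) {
        System.out.println(spaceBeforeCapitals("MyClass")); // My Class
        System.out.println(spaceBeforeCapitals("HTML"));    // H T M L
    }
}
```

So on its own this does not satisfy the HTML or PDFLoader cases from the question.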

Bobby
`{1}` is redundant.
Jonathan Feinberg
+1  A: 

I think you will have to iterate over the string and detect changes from lowercase to uppercase, uppercase to lowercase, alphabetic to numeric, and numeric to alphabetic. On every change you detect, insert a space, with one exception: on a change from uppercase to lowercase, insert the space one character earlier.
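
One possible reading of this as code, offered as a sketch rather than a definitive implementation. The names TransitionSplitter and isBoundary are hypothetical. It also assumes that the "one character earlier" rule only applies when the character before that is itself uppercase, since otherwise MyClass would get a leading or doubled space:

```java
final class TransitionSplitter {
    static String splitCamelCase(String s) {
        StringBuilder out = new StringBuilder(s.length() + 8);
        for (int i = 0; i < s.length(); i++) {
            if (i > 0 && isBoundary(s, i)) {
                out.append(' ');
            }
            out.append(s.charAt(i));
        }
        return out.toString();
    }

    // True when a space belongs between s[i-1] and s[i].
    private static boolean isBoundary(String s, int i) {
        char prev = s.charAt(i - 1);
        char cur = s.charAt(i);
        boolean lowerToUpper = Character.isLowerCase(prev) && Character.isUpperCase(cur);
        boolean letterToDigit = Character.isLetter(prev) && Character.isDigit(cur);
        boolean digitToLetter = Character.isDigit(prev) && Character.isLetter(cur);
        // Upper-to-lower change at i+1: the space goes one character earlier,
        // i.e. before s[i], but only if s[i-1] is uppercase too (assumption).
        boolean endOfAcronym = Character.isUpperCase(prev) && Character.isUpperCase(cur)
            && i + 1 < s.length() && Character.isLowerCase(s.charAt(i + 1));
        return lowerToUpper || letterToDigit || digitToLetter || endOfAcronym;
    }

    public static void main(String[] args) {
        System.out.println(splitCamelCase("PDFLoader"));   // PDF Loader
        System.out.println(splitCamelCase("GL11Version")); // GL 11 Version
    }
}
```

Looking one character ahead avoids having to go back and insert into text that was already written.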

Felix
+18  A: 

This works with your test cases:

static String splitCamelCase(String s) {
   return s.replaceAll(
      String.format("%s|%s|%s",
         "(?<=[A-Z])(?=[A-Z][a-z])",
         "(?<=[^A-Z])(?=[A-Z])",
         "(?<=[A-Za-z])(?=[^A-Za-z])"
      ),
      " "
   );
}

Here's a test harness:

    String[] tests = {
        "lowercase",        // [lowercase]
        "Class",            // [Class]
        "MyClass",          // [My Class]
        "HTML",             // [HTML]
        "PDFLoader",        // [PDF Loader]
        "AString",          // [A String]
        "SimpleXMLParser",  // [Simple XML Parser]
        "GL11Version",      // [GL 11 Version]
        "99Bottles",        // [99 Bottles]
        "May5",             // [May 5]
        "BFG9000",          // [BFG 9000]
    };
    for (String test : tests) {
        System.out.println("[" + splitCamelCase(test) + "]");
    }

It uses a zero-length matching regex with lookbehind and lookahead to find where to insert spaces. Basically there are 3 patterns, and I use String.format to put them together to make it more readable.

The three patterns are:

UC behind me, UC followed by LC in front of me

  XMLParser   AString    PDFLoader
    /\        /\           /\

non-UC behind me, UC in front of me

 MyClass   99Bottles
  /\        /\

Letter behind me, non-letter in front of me

 GL11    May5    BFG9000
  /\       /\      /\
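
Since these patterns match positions rather than characters, the same lookarounds should also work for splitting into words. A small sketch under that assumption; the class name CamelCaseWords and the method words are made up for illustration:

```java
import java.util.Arrays;
import java.util.regex.Pattern;

final class CamelCaseWords {
    // The same three zero-length patterns, joined with "|".
    private static final Pattern BOUNDARY = Pattern.compile(
        "(?<=[A-Z])(?=[A-Z][a-z])"          // UC behind, UC followed by LC ahead
        + "|(?<=[^A-Z])(?=[A-Z])"           // non-UC behind, UC ahead
        + "|(?<=[A-Za-z])(?=[^A-Za-z])");   // letter behind, non-letter ahead

    static String[] words(String s) {
        // Splitting on zero-length matches consumes no characters.
        return BOUNDARY.split(s);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(words("SimpleXMLParser"))); // [Simple, XML, Parser]
        System.out.println(Arrays.toString(words("GL11Version")));     // [GL, 11, Version]
    }
}
```

Joining the result with String.join(" ", ...) gives the same text as the replaceAll version.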

polygenelubricants
I like your concern for readability
Yaneeve
C'est chic. Oh la la.
Jonathan Feinberg
Awesome. The trick of using look-behind regexes makes this a very elegant solution. Thank you!
Frederik
A: 

http://code.google.com/p/inflection-js/

You could chain the String.underscore().humanize() methods to take a CamelCase string and convert it into a human readable string.

atomicguava
inflection-js is in Javascript. I'm looking for a Java solution.
Frederik
Sorry about that.
atomicguava
A: 

I'm not a regex ninja, so I'd iterate over the string, keeping the indexes of the current position being checked & the previous position. If the current position is a capital letter, I'd insert a space after the previous position and increment each index.

Joel
A: 

Sorry, my solution is not correct. Deleted.

Peter Mucsi