I have some complex regular expressions that I need to comment for readability and maintenance. The Java spec is rather terse, and I struggled for a long time to get this working. I finally caught my bug and will post it as an answer, but I'd be grateful for any other advice on maintaining regexes.

As an example, I want to comment the subcomponents (of patternS) in a simple name parser:

    String testTarget = "Waldorf T. Flywheel";
    String patternS = "([A-Za-z]+)\\s+([A-Z]\\.)?\\s+([A-Za-z]+)";
    Pattern pattern = Pattern.compile(patternS, Pattern.COMMENTS);
    Assert.assertTrue(pattern.matcher(testTarget).matches());

EDIT: I would be grateful for examples of the (?x) format as well.

EDIT: @geowa4 has a good suggestion which avoids embedded comments. Since Java and others provide for embedded comments, in what cases are they useful? (I think I have a case, but I'd be interested to see others.)

EDIT: As @mikej notes below, the regex does not handle the optional initial well and would be better as:

        String patternS = "([A-Za-z]+)\\s+([A-Z]\\.\\s+)?([A-Za-z]+)";

but that would end up capturing the trailing whitespace as part of the initial.
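
One possible way around that, offered as a sketch rather than a tested fix for real-world names: wrap the initial and its trailing whitespace in a non-capturing group `(?:...)`, so that only the initial itself lands in group 2. The class and method names below are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, for illustration only.
class InitialCaptureSketch {
    // (?: ... )? makes "initial plus following whitespace" optional as a unit,
    // while the inner ( ... ) captures only the initial.
    static final Pattern NAME = Pattern.compile(
        "([A-Za-z]+)\\s+(?:([A-Z]\\.)\\s+)?([A-Za-z]+)");

    // Returns the captured initial, or null if there is none or the name does not match.
    static String initialOf(String name) {
        Matcher m = NAME.matcher(name);
        return m.matches() ? m.group(2) : null;
    }

    public static void main(String[] args) {
        System.out.println(initialOf("Waldorf T. Flywheel")); // prints T.
        System.out.println(initialOf("Fred Bloggs"));         // prints null
    }
}
```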

+4  A: 

I found the following worked:

        String pattern2S =
            "([A-Za-z]+)      # mandatory firstName\n" +
            "\\s+             # mandatory whitespace\n" +
            "([A-Z]\\.)?      # optional initial\n" +
            "\\s+             # whitespace\n" +
            "([A-Za-z]+)      # mandatory lastName\n";

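The key thing was to include the newline character \n explicitly in the string: in comments mode a # comment runs to the end of the line, so without a real newline the comment swallows the rest of the pattern.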
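
For the (?x) form asked about in the question, here is a minimal sketch. It assumes the same name pattern; the class and method names are made up for illustration. The embedded flag switches on comments mode from inside the pattern, so no flag needs to be passed to `Pattern.compile`.

```java
import java.util.regex.Pattern;

// Hypothetical class name, for illustration only.
class InlineCommentsSketch {
    // Every comment still has to end with a real newline character.
    static final String PATTERN_S =
        "(?x)             # embedded flag, same effect as Pattern.COMMENTS\n" +
        "([A-Za-z]+)      # mandatory firstName\n" +
        "\\s+             # mandatory whitespace\n" +
        "([A-Z]\\.)?      # optional initial\n" +
        "\\s+             # whitespace\n" +
        "([A-Za-z]+)      # mandatory lastName\n";

    static boolean matchesName(String candidate) {
        // No Pattern.COMMENTS argument here: the (?x) prefix does the work.
        return Pattern.compile(PATTERN_S).matcher(candidate).matches();
    }

    public static void main(String[] args) {
        System.out.println(matchesName("Waldorf T. Flywheel")); // prints true
    }
}
```

Note that in this mode unescaped whitespace in the pattern is ignored, so literal whitespace has to be written as `\\s` or similar, and a literal `#` has to be escaped.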

peter.murray.rust
+7  A: 

Why don't you just do this:

String pattern2S = 
    "([A-Za-z]+)" + //    mandatory firstName
    "\\s+" +        //    mandatory whitespace
    ...;

CONTINUATION:

If you want to keep the comments with the pattern and you need to read it in from a properties file, use this:

pattern=\
#comment1\n\
([A-Za-z])\
#comment2\n\
([0-9])
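
A rough sketch of loading such a value and compiling it. It assumes the file uses a single-backslash `\n` escape, which `Properties.load` converts to a real newline character (comments mode ends a `#` comment only at a real line terminator). The class name, the key `pattern`, and the inline stand-in for the file contents are all hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;
import java.util.regex.Pattern;

// Hypothetical class and key names, for illustration only.
class PropertiesRegexSketch {
    // Stand-in for an external .properties file. In the file itself, each
    // comment line ends with the escape \n and then a line-continuation backslash.
    static final String FILE_CONTENTS = String.join("\n",
        "pattern=\\",
        "# one letter\\n\\",
        "([A-Za-z])\\",
        "# one digit\\n\\",
        "([0-9])");

    static Pattern loadPattern() {
        Properties props = new Properties();
        try {
            props.load(new StringReader(FILE_CONTENTS));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // The loaded value contains real newlines, so each # comment ends where intended.
        return Pattern.compile(props.getProperty("pattern"), Pattern.COMMENTS);
    }

    public static void main(String[] args) {
        System.out.println(loadPattern().matcher("a1").matches()); // prints true
    }
}
```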
geowa4
Good suggestion. This would work in many simple cases but I want the regular expressions to be independent of the code in which they are used (e.g. in external data files). The inline comments will still be visible.
peter.murray.rust
A: 

For clean and readable code you should put the comment above and not in the code:

// (mandatory firstname) (optional initial) (mandatory lastname)
String patternS = "([A-Za-z]+)\\s+([A-Z]\\.)?\\s+([A-Za-z]+)";
crunchdog
Both of the other answers are perfectly clean and readable. In fact, for complicated and lengthy regexes, your solution is very poor.
oxbow_lakes
I disagree: comments in the actual regex are neither clean nor readable. I like mikej's answer, where he divides the regex into sub-regexes.
crunchdog
+8  A: 

See the post by Martin Fowler on ComposedRegex for some more ideas on improving regexp readability. In summary, he advocates breaking down a complex regexp into smaller parts which can be given meaningful variable names. e.g.

String mandatoryName = "([A-Za-z]+)";
String mandatoryWhiteSpace = "\\s+";
String optionalInitial = "([A-Z]\\.)?";
String pattern = mandatoryName + mandatoryWhiteSpace + optionalInitial +
    mandatoryWhiteSpace + mandatoryName;
mikej
Thanks - this is a useful approach. It also pointed to another idea of using Domain Specific Languages to generate regexes (http://flimflan.com/blog/ReadableRegularExpressions.aspx). (This is actually what I do in my application, which has complicated combinations of composed regexes for scientific data, but that is outside the scope of this question.)
peter.murray.rust
This is a very clean and neat solution. Although optionalWhiteSpace should probably be mandatoryWhiteSpace? :)
crunchdog
Thanks crunchdog. I think what caught me out is that there is actually a limitation in the pattern in the OP: for a name without the middle initial, such as "Fred Bloggs", we need 2 spaces between the first name and surname in order to match the two \\s+ in the pattern. I was trying to address this, but for now I have edited the answer to make the pattern equivalent to the one in the OP.
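
The limitation is easy to check. A minimal sketch, reusing the variable names from the answer as constants; the class name and the `matches` helper are made up for illustration:

```java
import java.util.regex.Pattern;

// Hypothetical class name, for illustration only.
class ComposedRegexSketch {
    static final String MANDATORY_NAME = "([A-Za-z]+)";
    static final String MANDATORY_WHITE_SPACE = "\\s+";
    static final String OPTIONAL_INITIAL = "([A-Z]\\.)?";

    // Same composition as in the answer, equivalent to the pattern in the OP.
    static final Pattern NAME = Pattern.compile(
        MANDATORY_NAME + MANDATORY_WHITE_SPACE + OPTIONAL_INITIAL +
        MANDATORY_WHITE_SPACE + MANDATORY_NAME);

    static boolean matches(String candidate) {
        return NAME.matcher(candidate).matches();
    }

    public static void main(String[] args) {
        System.out.println(matches("Waldorf T. Flywheel")); // prints true
        System.out.println(matches("Fred Bloggs"));         // prints false: only one space
        System.out.println(matches("Fred  Bloggs"));        // prints true: two spaces
    }
}
```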
mikej
@mikej Thanks - I have added a request to edit the original for anyone who can make it prettier
peter.murray.rust