views:

146

answers:

2

Hello,

Background:

I'm using Checkstyle 4.4.2 with a RegExp checker module to detect when the file name in out java source headers do not match the file name of the class or interface in which they reside. This can happen when a developer copies a header from one class to another and does not modify the "File:" tag.

The regular expression use in the RexExp checker has been through many incarnations and (though it is possibly overkill at this point) looks like this:

File: (\w+)\.java\n(?:.*\n)*?(?:[\w|\s]*?(?: class | interface )\1)

The basic form of files I am checking (though greatly simplified) looks like this

/*
 *
 *  Copyright 2009
 *  ...
 *  File: Bar.java
 *  ... 
 */
package foo
... 
import ..
...
/**
 * ...
 */
public class Bar
{...}

The Problem:

When no match is found, (i.e. when a header containing "File: Bar.java" is copied into file Bat.java ) I receive a StackOverflowError on very long files (my test case is @1300 lines).

I have experimented with several visual regular expression testers and can see that in the non-matching case when the regex engine passes the line containing the class or interface name it starts searching again on the next line and does some backtracking which probably causes the StackOverflowError

The Question:

How to prevent the StackOverflowError by modifying the regular expression

Is there some way to modify my regular expression such that in the non-matching case (i.e. when a header containing "File: Bar.java" is copied into file Bat.java ) that the matching would stop once it examines the line containing the interface or class name and sees that "\1" does not match the first group.

Alternatively if that can be done, Is is possible minimize the searching and matching that takes place after it examines the line containing the interface or class thus minimizing processing and (hopefully) the StackOverflow error?

A: 

Try

File: (\w+)\.java\n.*^[\w \t]+(?:class|interface) \1

in dot-matches-all mode. Rationale:

[\w\s] (the | doesn't belong there) matches anything, including line breaks. This results in a lot of backtracking back up into the lines that the previous part of the regex had matched.

If you let the greedy dot gobble up everything up to the end of the file (quick) and then backtrack until you find a line that starts with words or spaces/tabs (but no newlines) and then class or interface and \1, then that doesn't require as much stack space.

A different, and probably even better solution would be to split the problem into parts.

First match the File: (\w+)\.java part. Then do a second search with ^[\w \t]+(?:class|interface) plus the \1 match from the first search on the same file.

Tim Pietzcker
Tim, Sorry for the late reply (holiday and all). I will get to try this out later this week and get back to you.
jtsampson
Tim, Although your answer was not the solution for me, it was the best response WRT the regex. Not knowing quite yet how to use StackOverflow yet, I posted my own answer as my response (which I should have written here). - That being said I accept you answer as the most helpful.
jtsampson
A: 

Follow up:

I plugged in Tim Pietzcher's suggestion above and his greedy solution did indeed fail faster and without a StackOverflowError when no match was found. However, in the positive case, the StackOverflowError still occurred.

I took a look at the source code RegexpCheck.java. The classes pattern is constructed in multiline mode such that the expressions ^ and $ match just after or just before, respectively, a line terminator or the end of the input sequence. Then it reads the entire class file into a string and does a recursive search for the pattern(see findMatch()). That is undoubtedly the source of the StackOverflowException.

In the end I didn't get it to work (and gave up) Since Maven 2 released the maven-checkstyle-plugin-2.4/Checkstyle 5.0 about 6 weeks ago we've decided to upgrade our tools. This may not solve the StackOverflowError problem, but it will give me something else to work on until someone decides that we need to pursue this again.

jtsampson