I would like to check whether lines of Java source like the following contain code:

VALID LINES:
/**/ some code
*/ some code   /** dsfsdkf sd**/

NOT VALID LINES:
/**/ //some code
*/ /***/ //somecode

So basically, a line is valid if it contains code outside a comment; otherwise it is not.

What would be the best way to tackle this kind of validation?

Note:
For a line starting with */, I assume that the /* was opened on an earlier line.

+3  A: 

You could just use a Java parser to parse the file properly.

dty
A: 

You could build a custom parser with something like JavaCC and then use it to parse the file.

npinti
+3  A: 

This should be quite fast, I believe.

import java.io.*;

class Test {

    public static void main(String[] args) throws IOException {
        StringBuilder buf = new StringBuilder();

        // try-with-resources closes the reader even if reading fails
        try (BufferedReader r = new BufferedReader(new FileReader("src/Test.java"))) {
            String line;
            while (null != (line = r.readLine()))
                buf.append(line).append('\n');
        }

        // Split on block comments ((?s) lets '.' span lines) and on line comments,
        // then print the code fragments left between them
        for (String code : buf.toString().split("(?s)/\\*.*?\\*/|//[^\\n]*"))
            System.out.println(code);
    }
}

If you read up a bit on the internals of regular expressions, you'll find that they are quite efficient once the pattern has been compiled (at least for simple expressions like the one above). No matter how you implement your algorithm, it would still need to do roughly the same work that the regex engine does in this scenario anyway.

(If you look at the String.split method, you'll note that the regular expression is compiled into a Pattern once per call, not once per match.)
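To make the "compile once" point concrete, here is a minimal sketch that keeps the Pattern in a static field and uses it to count lines that still hold code after the comments are removed. The class and method names (CommentStripper, strip, countCodeLines) are made up for illustration, and it assumes the whole file is passed in as one string with \n line endings. It shares the regex's blind spots: comment markers inside string literals will be misread.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of any library.
class CommentStripper {

    // Compiled once and reused for every call.
    private static final Pattern COMMENT =
            Pattern.compile("(?s)/\\*.*?\\*/|//[^\\n]*");

    // Removes block and line comments, keeping the newlines that were
    // inside block comments so line positions stay aligned.
    static String strip(String source) {
        Matcher m = COMMENT.matcher(source);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String newlines = m.group().replaceAll("[^\\n]", "");
            m.appendReplacement(out, newlines);
        }
        m.appendTail(out);
        return out.toString();
    }

    // Counts the lines that contain non-whitespace text once comments are gone.
    static int countCodeLines(String source) {
        int count = 0;
        for (String line : strip(source).split("\n")) {
            if (!line.trim().isEmpty()) count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String source = "/**/ some code\n/**/ //some code\nint x; /* a\nb */ int y;\n";
        System.out.println(countCodeLines(source));
    }
}
```

Preserving the newlines is a deliberate choice: without it, a block comment that spans several lines would merge the code before and after it into a single line and the count would come out too low.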

aioobe
@aioobe +1 good answer
c0mrade
Do you have to omit the braces in the while and for to feel you're doing it fast? ;)
OscarRyz
A: 

I am trying to read all lines of code from a .java file, excluding comments.

(Are you trying to extract the code, or simply count the lines of code?)

A simple line-by-line approach is probably not going to be entirely accurate. For example, consider this:

/*  The next line is wrong:
    res = 1 / 0;
 */

A line-by-line analysis will conclude that the second line is code, when it is actually part of a comment.

Another problem with trying to use regexes is that there are all sorts of edge cases. For example:

System.err.println("/* hello mum ");
System.err.println("*/");

Or

\u002f* This is a comment *\u002f

I'm not saying that regexes cannot be used. I'm just saying that your code will be simpler and probably less fragile if you use a proper Java parser.
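If a full parser is more than you need, a small hand-written scanner is a middle ground. The sketch below is only an illustration under stated assumptions, not a real Java lexer: the names (CodeLineScanner, scan, startInBlockComment) are invented here, it tracks string and character literals and multi-line block comments, and it does not handle Unicode escapes such as \u002f or text blocks. It returns one flag per line saying whether that line has code outside comments.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of any library.
class CodeLineScanner {

    private enum State { CODE, LINE_COMMENT, BLOCK_COMMENT, STRING, CHAR }

    // One flag per line: true if the line has non-whitespace text outside comments.
    // startInBlockComment covers input that begins inside an already opened /* ... */.
    static List<Boolean> scan(String source, boolean startInBlockComment) {
        List<Boolean> result = new ArrayList<>();
        State state = startInBlockComment ? State.BLOCK_COMMENT : State.CODE;
        boolean hasCode = false;
        for (int i = 0; i < source.length(); i++) {
            char c = source.charAt(i);
            char next = i + 1 < source.length() ? source.charAt(i + 1) : '\0';
            if (c == '\n') {
                result.add(hasCode);
                hasCode = false;
                // Only block comments carry over to the next line
                if (state != State.BLOCK_COMMENT) state = State.CODE;
                continue;
            }
            switch (state) {
                case CODE:
                    if (c == '/' && next == '/') { state = State.LINE_COMMENT; i++; }
                    else if (c == '/' && next == '*') { state = State.BLOCK_COMMENT; i++; }
                    else if (!Character.isWhitespace(c)) {
                        hasCode = true;
                        if (c == '"') state = State.STRING;
                        else if (c == '\'') state = State.CHAR;
                    }
                    break;
                case BLOCK_COMMENT:
                    if (c == '*' && next == '/') { state = State.CODE; i++; }
                    break;
                case STRING:
                case CHAR:
                    if (c == '\\' && next != '\n') i++; // skip the escaped character
                    else if (c == (state == State.STRING ? '"' : '\'')) state = State.CODE;
                    break;
                case LINE_COMMENT:
                    break; // ignore everything up to the newline
            }
        }
        // Flush the last line if the input does not end with a newline
        if (source.isEmpty() || source.charAt(source.length() - 1) != '\n') result.add(hasCode);
        return result;
    }
}
```

For instance, scan("*/ /***/ //somecode", true) reports the line as comment-only, while the two println lines above are both reported as code because the comment markers sit inside string literals.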

Stephen C