My goal is to analyze Java source files to find the line numbers that contain non-comment code. Since StreamTokenizer has slashStarComments() and slashSlashComments(), I figured I would use it to filter out the lines that contain only comments and no code.
The program below prints, for each line that has something other than a comment on it, the line number and any word tokens found on that line.
It works most of the time, but not always. For example, line numbers occasionally get skipped, beginning with the comment at line 144 of the following source file from log4j, Category.java: http://logging.apache.org/log4j/1.2/xref/org/apache/log4j/Category.html. StreamTokenizer sometimes just seems to skip some lines at the end of Javadoc comments.
Here's my code:
    import java.io.FileReader;
    import java.io.IOException;
    import java.io.Reader;
    import java.io.StreamTokenizer;

    public class LinesWithCodeFinder {
        public static void main(String[] args) throws IOException {
            String filePath = args[0];
            Reader reader = new FileReader(filePath);

            // Ask the tokenizer to skip both comment styles and to treat
            // line ends as ordinary whitespace.
            StreamTokenizer tokenizer = new StreamTokenizer(reader);
            tokenizer.slashStarComments(true);
            tokenizer.slashSlashComments(true);
            tokenizer.eolIsSignificant(false);

            int ttype = 0;
            int lastline = -1;
            String s = "";
            while (ttype != StreamTokenizer.TT_EOF) {
                ttype = tokenizer.nextToken();
                int lineno = tokenizer.lineno();
                String sval = ttype == StreamTokenizer.TT_WORD ? tokenizer.sval : "";
                if (lineno == lastline) {
                    // Same line as the previous token: accumulate its words.
                    s += " " + sval;
                } else {
                    // New line reached: print what was collected for the previous one.
                    if (lastline != -1)
                        System.out.println(lastline + "\t" + s);
                    s = sval;
                }
                lastline = lineno;
            }
        }
    }
Does anyone understand why StreamTokenizer behaves as it does?
Any alternative ideas on how to filter out the comments would be appreciated.
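To make concrete the kind of alternative I have in mind, here is a rough sketch of a hand-written scanner that does not use StreamTokenizer at all. The class name CodeLineScanner and the method codeLines are hypothetical names I made up for illustration. It assumes lines end in \n or \r\n, that string and character literals never span lines, and it makes no attempt to handle text blocks ("""), so it is a starting point rather than a complete solution.

```java
import java.util.ArrayList;
import java.util.List;

public class CodeLineScanner {
    private enum State { CODE, LINE_COMMENT, BLOCK_COMMENT, STRING, CHAR }

    /**
     * Returns the 1-based numbers of the lines that contain at least one
     * non-whitespace character outside of // and slash-star comments.
     */
    public static List<Integer> codeLines(String source) {
        List<Integer> result = new ArrayList<>();
        State state = State.CODE;
        int line = 1;
        boolean hasCode = false;
        int n = source.length();
        for (int i = 0; i < n; i++) {
            char c = source.charAt(i);
            char next = i + 1 < n ? source.charAt(i + 1) : '\0';
            if (c == '\n') {
                if (hasCode) result.add(line);
                line++;
                hasCode = false;
                // Only a block comment survives a line break (assumption:
                // literals never span lines).
                if (state != State.BLOCK_COMMENT) state = State.CODE;
                continue;
            }
            switch (state) {
                case CODE:
                    if (c == '/' && next == '/') {
                        state = State.LINE_COMMENT;
                        i++; // consume the second '/'
                    } else if (c == '/' && next == '*') {
                        state = State.BLOCK_COMMENT;
                        i++; // consume the '*' so that "/*/" does not close
                    } else if (!Character.isWhitespace(c)) {
                        hasCode = true;
                        if (c == '"') state = State.STRING;
                        else if (c == '\'') state = State.CHAR;
                    }
                    break;
                case BLOCK_COMMENT:
                    if (c == '*' && next == '/') {
                        state = State.CODE;
                        i++; // consume the closing '/'
                    }
                    break;
                case STRING:
                case CHAR:
                    hasCode = true;
                    if (c == '\\' && next != '\n') {
                        i++; // skip the escaped character
                    } else if (c == (state == State.STRING ? '"' : '\'')) {
                        state = State.CODE;
                    }
                    break;
                case LINE_COMMENT:
                    break; // ignore everything up to the line break
            }
        }
        if (hasCode) result.add(line); // last line without a trailing newline
        return result;
    }
}
```

The whole file would be read into a String first (for example with Files.readAllBytes) and passed to codeLines. Because the scanner tracks string and character literals, a // inside a string is not mistaken for the start of a comment.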