My goal is to find the package (as a string) of a Java source file, given as plain text and not already sorted into folders.

I can't just locate the first instance of the keyword package in the file, because it may appear inside a comment. So I was thinking about two alternatives:

  • Scan the file word-by-word, maintaining an "inside-a-comment" flag for the scanner. The first time the package keyword is encountered while not inside a comment, stop the scanning and report the result.
  • Use a regex - should be theoretically possible because block comments do not nest in Java, but I tried making such a regex and it turned out to be quite complicated - for me, at least.
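
A minimal sketch of the first alternative, under some stated assumptions. The Java grammar places the package declaration before any import or type declaration, so only whitespace and comments (and, in package-info.java, annotations) can precede it. That gives a simple stopping rule: if the first real token is not package, give up. The class and method names here (PackageScanner, findPackage, skipTrivia) are made up for illustration, and the sketch deliberately ignores annotated package declarations and Unicode escapes.

```java
import java.util.Optional;

public class PackageScanner {

    /** Returns the declared package name, or empty if the first real token is not "package". */
    public static Optional<String> findPackage(String source) {
        int n = source.length();
        int i = skipTrivia(source, 0);
        if (!source.startsWith("package", i)) {
            return Optional.empty(); // some other token comes first: no package declaration
        }
        i += "package".length();
        if (i >= n || Character.isJavaIdentifierPart(source.charAt(i))) {
            return Optional.empty(); // end of input, or a longer identifier such as "packageFoo"
        }
        StringBuilder name = new StringBuilder();
        while (true) {
            i = skipTrivia(source, i); // comments and whitespace may appear inside the name
            if (i >= n) {
                return Optional.empty(); // ran out of input before ';'
            }
            char c = source.charAt(i);
            if (c == ';') {
                return name.length() == 0 ? Optional.empty() : Optional.of(name.toString());
            } else if (c == '.' || Character.isJavaIdentifierPart(c)) {
                name.append(c);
                i++;
            } else {
                return Optional.empty(); // unexpected character: treat as malformed
            }
        }
    }

    /** Advances past whitespace, line comments, and block comments starting at index i. */
    private static int skipTrivia(String s, int i) {
        int n = s.length();
        while (i < n) {
            if (Character.isWhitespace(s.charAt(i))) {
                i++;
            } else if (s.startsWith("//", i)) {
                int eol = s.indexOf('\n', i);
                i = (eol < 0) ? n : eol + 1;
            } else if (s.startsWith("/*", i)) {
                int end = s.indexOf("*/", i + 2);
                i = (end < 0) ? n : end + 2;
            } else {
                break;
            }
        }
        return i;
    }
}
```

Calling PackageScanner.findPackage(text) on the file contents reads only as far as the end of the package declaration, or the first non-comment token, so the rest of the file is never examined.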

Another difference between the two approaches is that when scanning manually I can stop the scan when I can be certain the package keyword can no longer appear, saving some time... and I'm not sure I can do something similar with regexes. On the other hand, the decision "when it can no longer appear" is not necessarily simple, though I could use some heuristic for that.

I would like to hear any input on this problem, and would welcome any help with the regex. My solution is written in Java as well.
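
For the regex alternative, one possible shape is sketched below; it is not a complete treatment of Java's lexical rules. It assumes the declaration is preceded only by whitespace and comments, uses an atomic group so that a block comment can never be stretched past its first closing */ on backtracking, and relies on Matcher.lookingAt() to anchor the match at the start of the input. The name PackageRegex is hypothetical, [\w.]+ is a simplification of Java identifiers, and comments inside the package name are not handled.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PackageRegex {

    private static final Pattern PACKAGE_DECL = Pattern.compile(
            // Leading whitespace, line comments, and block comments (atomic: no backtracking into it).
            "(?>(?:\\s+|//[^\\n]*|/\\*.*?\\*/)*)"
            // The declaration itself; group 1 captures the dotted name.
            + "package\\s+([\\w.]+)\\s*;",
            Pattern.DOTALL);

    /** Returns the package name if the input starts (after comments) with a package declaration. */
    public static Optional<String> findPackage(String source) {
        Matcher m = PACKAGE_DECL.matcher(source);
        // lookingAt() matches only at the beginning of the input, so the rest of the file is not scanned.
        return m.lookingAt() ? Optional.of(m.group(1)) : Optional.empty();
    }
}
```

Because the match is anchored, a file whose first real token is import or class fails immediately instead of searching the rest of the file for a later package keyword.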

EDIT: to those suggesting actually parsing the file - it's definitely a viable option, thank you, but it feels like a bit of overkill to parse the whole file for just the package. I'll do it if there's no simpler alternative.

A: 

If you have compilable code, the name of the package is identical to the directory (relative to your root source folder) where the source file is located. So you probably don't need to parse the source code.
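
As a rough sketch of that idea, the package name can be derived from the file's location with java.nio.file. This only applies when the files already sit in the conventional directory layout and the source root is known; the class name PackageFromPath and the example paths are assumptions for illustration.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

public class PackageFromPath {

    /** Derives the package name from the file's directory relative to the source root. */
    public static String packageOf(Path sourceRoot, Path sourceFile) {
        // Directory containing the file, expressed relative to the source root.
        Path relativeDir = sourceRoot.relativize(sourceFile.getParent());
        StringBuilder name = new StringBuilder();
        for (Path part : relativeDir) {
            if (name.length() > 0) {
                name.append('.');
            }
            name.append(part.toString());
        }
        // An empty result means the file sits directly in the root (default package).
        return name.toString();
    }

    public static void main(String[] args) {
        Path root = Paths.get("src", "main", "java");
        Path file = Paths.get("src", "main", "java", "com", "example", "App.java");
        System.out.println(packageOf(root, file));
    }
}
```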

stacker
+1  A: 

You could use an actual Java source parser, like javaparser. It gives you a correctly parsed Java file without needing to reinvent a Java parser or resort to a poor man's parser (regex).

The only downside I see is that perhaps you want to stop parsing as soon as you've found the package, and avoid parsing the remainder of the file. There are various, somewhat hacky, ways you could achieve this, but I recommend that you measure whole-file performance before thinking about this.

mdma
+4  A: 

I solved this problem by using a Java parser. For my purpose javaparser was the best fit.

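CompilationUnit cu = JavaParser.parse(file);
// getPackage() returns null for a file in the default package, so guard before dereferencing.
String packageName = cu.getPackage() == null ? "" : cu.getPackage().getName().toString();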
tangens