My goal is to find the package (as a string) of a Java source file that is given as plain text and is not already sorted into folders.
I can't just locate the first instance of the keyword package in the file, because it may appear inside a comment. So I was thinking about two alternatives:
- Scan the file word by word, maintaining an "inside a comment" flag for the scanner. The first time the package keyword is encountered outside a comment, stop scanning and report the result.
- Use a regex. This should be possible in theory, because block comments do not nest in Java, but the regex I tried to write turned out to be quite complicated, for me at least.
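A minimal sketch of the first approach, assuming the whole file is available as a String. It relies on the grammar of a compilation unit in the Java Language Specification: the package declaration, if present, must come before any import or type declaration, so only whitespace and comments can precede it (plus package annotations, which are allowed only in package-info.java and are not handled here). The scanner therefore skips whitespace and comments and gives up at the first other token. The names PackageScanner and findPackage are hypothetical, and comments inside the declaration itself (such as package a./*x*/b;) and Unicode escapes are not handled.

```java
import java.util.Optional;

public class PackageScanner {

    /**
     * Hypothetical helper: returns the package declared in the given source,
     * or Optional.empty() for the default package (or malformed input).
     */
    public static Optional<String> findPackage(String src) {
        int i = 0;
        int n = src.length();
        // Skip leading whitespace and comments; stop at the first real token.
        while (i < n) {
            char c = src.charAt(i);
            if (Character.isWhitespace(c)) {
                i++;
            } else if (src.startsWith("//", i)) {
                // Line comment: skip to the end of the line.
                while (i < n && src.charAt(i) != '\n' && src.charAt(i) != '\r') {
                    i++;
                }
            } else if (src.startsWith("/*", i)) {
                // Block or Javadoc comment: jump past the closing delimiter.
                int end = src.indexOf("*/", i + 2);
                if (end < 0) {
                    return Optional.empty(); // unterminated comment
                }
                i = end + 2;
            } else {
                break; // the package declaration can only start here
            }
        }
        if (!src.startsWith("package", i)) {
            return Optional.empty(); // first token is not "package": default package
        }
        int afterKeyword = i + "package".length();
        // Reject identifiers such as "packageFoo" that merely start with "package".
        if (afterKeyword < n && Character.isJavaIdentifierPart(src.charAt(afterKeyword))) {
            return Optional.empty();
        }
        int semicolon = src.indexOf(';', afterKeyword);
        if (semicolon < 0) {
            return Optional.empty();
        }
        // Remove whitespace inside the name, e.g. "com . example" -> "com.example".
        return Optional.of(src.substring(afterKeyword, semicolon).replaceAll("\\s+", ""));
    }

    public static void main(String[] args) {
        String src = "/* package not.this; */\n"
                + "// package nor.this;\n"
                + "package com.example.app;\n"
                + "import java.util.List;\n";
        System.out.println(findPackage(src).orElse("<default package>"));
    }
}
```

Running main prints com.example.app: both decoy declarations sit inside comments and are skipped, and the scan never looks past the real declaration.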
Another difference between the two approaches is that when scanning manually, I can stop as soon as I am certain the package keyword can no longer appear, which saves some time, and I'm not sure I can do something similar with regexes. On the other hand, deciding when it can no longer appear is not necessarily simple, though I could use a heuristic for that.
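One possible regex, sketched under the same assumption that only whitespace and comments may precede the package declaration (package annotations again ignored): match a leading run of whitespace, line comments and block comments, followed by the declaration, and anchor it at the start of the input with Matcher.lookingAt(). Since the pattern only has to get past the leading comments, matching ends as soon as the engine reaches the declaration or any other token, which gives the same early exit as a manual scan. The possessive quantifier *+ stops the engine from backtracking into comments it has already matched. PackageRegex and findPackage are hypothetical names.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PackageRegex {

    // A possessive (*+) run of whitespace, line comments and block comments,
    // then "package <name>;". DOTALL lets '.' cross newlines in block comments.
    private static final Pattern PACKAGE = Pattern.compile(
            "(?:\\s|//[^\\r\\n]*|/\\*.*?\\*/)*+package\\s+([^;]+);",
            Pattern.DOTALL);

    /** Hypothetical helper: package name, or empty for the default package. */
    public static Optional<String> findPackage(String src) {
        Matcher m = PACKAGE.matcher(src);
        // lookingAt() anchors the match at the start of the input but does not
        // require it to consume the whole input.
        if (!m.lookingAt()) {
            return Optional.empty();
        }
        // Remove whitespace inside the name, e.g. "com . example" -> "com.example".
        return Optional.of(m.group(1).replaceAll("\\s+", ""));
    }

    public static void main(String[] args) {
        String src = "/* package not.this; */\n"
                + "// package nor.this;\n"
                + "package com.example.app;\n"
                + "import java.util.List;\n";
        System.out.println(findPackage(src).orElse("<default package>"));
    }
}
```

This also prints com.example.app. The alternatives inside the group start with either whitespace or a slash, never with "p", so matching the longest possible run of comments is always correct and the possessive quantifier loses nothing.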
I would like to hear any input on this problem, and would welcome any help with the regex. My solution is written in Java as well.
EDIT: To those suggesting actually parsing the file: it's definitely a viable option, thank you, but parsing the whole file just to get the package feels like overkill to me. I'll do it if there's no simpler alternative.