I have Java source code that I need to inspect and check against security policies (for example, CWE rules). I have a couple of ideas: for starters, building an AST and then traversing the tree; another is using regular expressions. Are there any options other than an AST or regex that I could use for this?
An AST is a good choice, much better than regular expressions.
There are numerous Java parsers available. ANTLR's Java grammar is one example.
You can also adapt the source code of the javac compiler from OpenJDK.
Some static analysis tools like PMD support user-defined rules that would allow you to perform many checks without a lot of work.
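To make the AST idea concrete, here is a minimal sketch that uses the Compiler Tree API shipped with the JDK (`com.sun.source`) rather than an adapted copy of javac's source. It only parses, then walks the tree looking for calls whose selector ends in `.exec` (the kind of thing you might check for a command-injection rule). The names `ExecCallFinder`, `StringSource` and `findExecCalls` are made up for illustration, and the check is purely textual on the selector, so treat it as a starting point and not a real rule.

```java
import com.sun.source.tree.CompilationUnitTree;
import com.sun.source.tree.MethodInvocationTree;
import com.sun.source.util.JavacTask;
import com.sun.source.util.TreeScanner;

import javax.tools.JavaCompiler;
import javax.tools.SimpleJavaFileObject;
import javax.tools.ToolProvider;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.util.ArrayList;
import java.util.List;

public class ExecCallFinder {

    /** Hypothetical helper: wraps a source string so javac can read it from memory. */
    static final class StringSource extends SimpleJavaFileObject {
        private final String code;

        StringSource(String className, String code) {
            super(URI.create("string:///" + className + ".java"), Kind.SOURCE);
            this.code = code;
        }

        @Override
        public CharSequence getCharContent(boolean ignoreEncodingErrors) {
            return code;
        }
    }

    /** Returns the text of every method call whose selector ends in ".exec". */
    static List<String> findExecCalls(String className, String code) {
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler(); // needs a JDK, not a bare JRE
        JavacTask task = (JavacTask) compiler.getTask(
                null, null, null, null, null, List.of(new StringSource(className, code)));
        List<String> hits = new ArrayList<>();
        try {
            // parse() builds the AST only; no symbols or types are resolved here.
            for (CompilationUnitTree unit : task.parse()) {
                new TreeScanner<Void, Void>() {
                    @Override
                    public Void visitMethodInvocation(MethodInvocationTree node, Void unused) {
                        // Assumption: matching on the selector text is good enough for a demo.
                        if (node.getMethodSelect().toString().endsWith(".exec")) {
                            hits.add(node.toString());
                        }
                        return super.visitMethodInvocation(node, unused);
                    }
                }.scan(unit, null);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return hits;
    }

    public static void main(String[] args) {
        String code = "class Demo { void run(String[] cmd) throws Exception {"
                + " Runtime.getRuntime().exec(cmd); } }";
        findExecCalls("Demo", code).forEach(System.out::println);
    }
}
```

Because the parser discards comments, an `exec` call that appears only inside a comment is not reported, which is already an improvement over a text search.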
Many static source code analysis (SCA) tools use a collection of regular expressions to detect code that may be vulnerable. There are many SCA tools for Java, and I don't know the best open source one offhand. I can tell you that Coverity makes the best Java SCA tool that I have used; it's much more advanced than just regular expressions, as it can also detect race conditions.
What I can tell you is that this approach is going to produce a lot of false positives and false negatives. The CWE system indexes HUNDREDS of different vulnerabilities and covering all of them is completely and totally impossible.
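To show what false positives and false negatives look like with a regex-based check, here is a small sketch. The pattern and the class name `RegexExecCheck` are invented for this example and are not taken from any real tool; the rule simply flags lines that look like `Runtime.getRuntime().exec(`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class RegexExecCheck {

    // Hypothetical rule: flag text that looks like Runtime.getRuntime().exec(
    private static final Pattern EXEC_CALL = Pattern.compile(
            "Runtime\\s*\\.\\s*getRuntime\\s*\\(\\s*\\)\\s*\\.\\s*exec\\s*\\(");

    /** Returns the 1-based numbers of the lines on which the pattern matches. */
    static List<Integer> flaggedLines(String source) {
        List<Integer> lines = new ArrayList<>();
        String[] split = source.split("\n");
        for (int i = 0; i < split.length; i++) {
            if (EXEC_CALL.matcher(split[i]).find()) {
                lines.add(i + 1);
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        String source = String.join("\n",
                "// Never call Runtime.getRuntime().exec(cmd) with user input", // comment only
                "Runtime rt = Runtime.getRuntime();",
                "rt.exec(cmd);",                       // real call, but via a local variable
                "Runtime.getRuntime().exec(cmd);");    // real call in the expected shape
        System.out.println(flaggedLines(source));      // prints [1, 4]
    }
}
```

Line 1 is flagged even though it is only a comment (a false positive), and the call on line 3 is missed because it goes through a local variable (a false negative).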
There are a number of pre-existing tools that do some or all of what you are asking for. Some work at the source code level, and some by parsing the bytecode.
Have a look at:
- CheckStyle
- FindBugs
- PMD
All of these are extensible in one way or another, so you can probably get them to check what you want to check in addition to the many standard checks they have.
You either want to get an existing static analysis tool that focuses on the vulnerabilities of interest to you, or you want to get a tool with strong foundations for building custom analyses.
Just parsing to ASTs doesn't get you a lot of support for doing analysis. You need to know what symbols mean where they are encountered (e.g., scopes, symbol tables, type resolution), and you often need to know how information flows (inheritance graphs, call graphs, control flows, data flows) across the software elements that make up the system. Tools like ANTLR don't provide this; they are parser generators.
A tool foundation having this information available for Java is our DMS Software Reengineering Toolkit and its Java Front End.
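To illustrate why symbol and type information matters, here is a rough sketch that uses the JDK's own Compiler Tree API (not DMS, and not ANTLR). It runs javac's attribution phase via `JavacTask.analyze()` and then asks which class actually declares each `exec` method being called, so that only calls resolving to `java.lang.Runtime` are reported. The names `ResolvedExecFinder`, `StringSource` and `findRuntimeExecCalls` are hypothetical, and this covers only name and type resolution, not control flow or data flow.

```java
import com.sun.source.tree.CompilationUnitTree;
import com.sun.source.tree.MethodInvocationTree;
import com.sun.source.util.JavacTask;
import com.sun.source.util.TreePath;
import com.sun.source.util.TreePathScanner;
import com.sun.source.util.Trees;

import javax.lang.model.element.Element;
import javax.lang.model.element.ExecutableElement;
import javax.lang.model.element.TypeElement;
import javax.tools.SimpleJavaFileObject;
import javax.tools.ToolProvider;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.util.ArrayList;
import java.util.List;

public class ResolvedExecFinder {

    /** Hypothetical helper: an in-memory source file for javac. */
    static final class StringSource extends SimpleJavaFileObject {
        private final String code;

        StringSource(String className, String code) {
            super(URI.create("string:///" + className + ".java"), Kind.SOURCE);
            this.code = code;
        }

        @Override
        public CharSequence getCharContent(boolean ignoreEncodingErrors) {
            return code;
        }
    }

    /** Returns calls to exec(...) whose resolved declaring class is java.lang.Runtime. */
    static List<String> findRuntimeExecCalls(String className, String code) {
        JavacTask task = (JavacTask) ToolProvider.getSystemJavaCompiler().getTask(
                null, null, null, null, null, List.of(new StringSource(className, code)));
        List<String> hits = new ArrayList<>();
        try {
            Iterable<? extends CompilationUnitTree> units = task.parse();
            task.analyze(); // attribution: fills in symbols and types on the parsed trees
            Trees trees = Trees.instance(task);
            for (CompilationUnitTree unit : units) {
                new TreePathScanner<Void, Void>() {
                    @Override
                    public Void visitMethodInvocation(MethodInvocationTree node, Void unused) {
                        // Look up the symbol behind the "x.exec" part of the call.
                        TreePath select = new TreePath(getCurrentPath(), node.getMethodSelect());
                        Element target = trees.getElement(select);
                        if (target instanceof ExecutableElement
                                && target.getSimpleName().contentEquals("exec")
                                && target.getEnclosingElement() instanceof TypeElement) {
                            TypeElement owner = (TypeElement) target.getEnclosingElement();
                            if (owner.getQualifiedName().contentEquals("java.lang.Runtime")) {
                                hits.add(node.toString());
                            }
                        }
                        return super.visitMethodInvocation(node, unused);
                    }
                }.scan(unit, null);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return hits;
    }

    public static void main(String[] args) {
        String code = "class Shell { void exec(String[] c) {} }\n"
                + "class Demo { void run(String[] cmd) throws Exception {"
                + " Runtime rt = Runtime.getRuntime(); rt.exec(cmd); new Shell().exec(cmd); } }";
        findRuntimeExecCalls("Demo", code).forEach(System.out::println);
    }
}
```

In the sample input, `rt.exec(cmd)` should be reported because `rt` resolves to `java.lang.Runtime`, while `new Shell().exec(cmd)` should not, even though both look identical at the level of method names. A parse-only or regex check cannot tell them apart.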