Hey,

I need to write a compiler. It's homework at university. The teacher told us that we can use any API we want to do the parsing of the code, as long as it is a good one. That way we can focus more on the JVM code we will generate.

So yes, I'll write a compiler in Java to generate Java.

Do you know any good API for this? Should I use regex? I normally write my own parsers by hand, though it is not advisable in this scenario.

Any help would be appreciated.

+8  A: 

Have a look at JavaCC, a parser generator for Java. It's very easy to use and get the hang of.

MrWiggles
+3  A: 

Go classic - Lex + Yacc. In Java, the equivalents are JAX and JavaCC. JavaCC even has some Java grammars ready for inspection.

gimel
+8  A: 

I would recommend ANTLR, primarily because of its output generation capabilities via StringTemplate.

Better still, Terence Parr's book on ANTLR is one of the better books on writing compilers with a parser generator.

Then you have ANTLRWorks, which enables you to study and debug your grammar on the fly.

To top it all, the ANTLR wiki and documentation (although not comprehensive enough for my liking) are a good place to start for any beginner. They helped me refresh my knowledge of compiler writing in a week.

Vineet Reynolds
+9  A: 

Regex is good to use in a compiler, but only for recognizing tokens. Regular expressions are only good for recognizing regular languages (i.e. languages with no recursive structures, such as nested parentheses).
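To illustrate the token-recognition role, here is a minimal sketch of a regex-driven lexer for arithmetic-style input. The class name `RegexLexer`, the token kinds `NUM`, `ID`, and `OP`, and the `KIND:text` output format are all made up for this example; a real lexer would probably use a proper token type.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical example class: a tiny regex-driven lexer.
final class RegexLexer {
    // Optional leading whitespace, then one named group per token kind.
    private static final Pattern TOKEN = Pattern.compile(
        "\\s*(?:(?<NUM>\\d+)|(?<ID>[A-Za-z_]\\w*)|(?<OP>[-+*/()=]))");

    // Splits the input into tokens, each rendered as "KIND:text".
    static List<String> tokenize(String input) {
        String src = input.trim();
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(src);
        int pos = 0;
        while (pos < src.length()) {
            m.region(pos, src.length());
            if (!m.lookingAt()) {
                throw new IllegalArgumentException("Unexpected character near index " + pos);
            }
            if (m.group("NUM") != null) {
                tokens.add("NUM:" + m.group("NUM"));
            } else if (m.group("ID") != null) {
                tokens.add("ID:" + m.group("ID"));
            } else {
                tokens.add("OP:" + m.group("OP"));
            }
            pos = m.end();
        }
        return tokens;
    }

    public static void main(String[] args) {
        // prints [ID:x, OP:=, NUM:3, OP:+, NUM:41]
        System.out.println(tokenize("x = 3 + 41"));
    }
}
```

Note that this lexer happily tokenizes unbalanced input such as "((3": checking that parentheses match is exactly the recursive part that has to be left to the parser.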

The classic way of writing a compiler is to have a lexer for recognizing tokens, a parser for recognizing structure, an intermediate code generator, an optimizer, and finally a target code generator. Any of those steps can be merged, and some skipped entirely, if that makes the compiler easier to write.
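As a rough sketch of merging those steps, the following hand-written recursive-descent compiler does lexing, parsing, and code generation in a single pass and skips the optimizer entirely. The class name `TinyCompiler` and the instruction names (`PUSH`, `ADD`, `SUB`, `MUL`, `DIV`) are invented for the example and target an imaginary stack machine, not real JVM bytecode.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical example: one-pass compiler for arithmetic expressions.
final class TinyCompiler {
    private final String src;
    private int pos = 0;
    private final List<String> code = new ArrayList<>();

    private TinyCompiler(String src) { this.src = src; }

    // Compiles an expression into instructions for an imaginary stack machine.
    static List<String> compile(String source) {
        TinyCompiler c = new TinyCompiler(source);
        c.expr();
        if (c.peek() != '\0') {
            throw new IllegalArgumentException("Unexpected input at index " + c.pos);
        }
        return c.code;
    }

    // expr := term (('+' | '-') term)*
    private void expr() {
        term();
        while (peek() == '+' || peek() == '-') {
            char op = next();
            term();
            code.add(op == '+' ? "ADD" : "SUB");
        }
    }

    // term := factor (('*' | '/') factor)*
    private void term() {
        factor();
        while (peek() == '*' || peek() == '/') {
            char op = next();
            factor();
            code.add(op == '*' ? "MUL" : "DIV");
        }
    }

    // factor := NUMBER | '(' expr ')'
    private void factor() {
        char c = peek();
        if (c == '(') {
            next();
            expr(); // the recursion that a regular expression cannot express
            if (peek() != ')') {
                throw new IllegalArgumentException("Expected ')' at index " + pos);
            }
            next();
        } else if (Character.isDigit(c)) {
            int start = pos;
            while (pos < src.length() && Character.isDigit(src.charAt(pos))) pos++;
            code.add("PUSH " + src.substring(start, pos));
        } else {
            throw new IllegalArgumentException("Expected number or '(' at index " + pos);
        }
    }

    // Skips whitespace and returns the next character, or '\0' at end of input.
    private char peek() {
        while (pos < src.length() && Character.isWhitespace(src.charAt(pos))) pos++;
        return pos < src.length() ? src.charAt(pos) : '\0';
    }

    // Consumes and returns the character that peek() would return.
    private char next() {
        char c = peek();
        pos++;
        return c;
    }

    public static void main(String[] args) {
        // prints [PUSH 1, PUSH 2, PUSH 3, PUSH 4, SUB, MUL, ADD]
        System.out.println(compile("1 + 2 * (3 - 4)"));
    }
}
```

Operator precedence is encoded here by the layering of the grammar rules (expr, term, factor), which is the same thing a parser generator would have you state in its grammar file.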

There have been many tools developed to help with this process. For Java, you can look at

MizardX
A: 

If you're going to go hardcore, throw in a bit of http://llvm.org in the mix :)

snemarch
+1  A: 

I've used SableCC in my compiler course, though not by choice.

I remember finding it very bulky and heavyweight, with more emphasis on cleanliness than convenience (no operator precedence or anything; you have to state that in the grammar).

I'd probably want to use something else if I had the choice. My experiences with yacc (for C) and happy (for Haskell) have both been pleasant.

Jonas Kölker
+1  A: 

Parser combinators are a good choice. A popular Java implementation is JParsec.

stepancheg
+2  A: 

JFlex is a scanner generator which, according to the manual, is designed to work with the parser generator CUP.

One of the main design goals of JFlex was to make interfacing with the free Java parser generator CUP as easy as possibly [sic].

It also has support for BYACC/J, which, as its name suggests, is a port of Berkeley YACC to generate Java code.

I have used JFlex itself and liked it. However, the project I was doing was simple enough that I wrote the parser by hand, so I don't know how good either CUP or BYACC/J is.

Michael Myers
+3  A: 

I'd recommend using either a metacompiler like ANTLR, or a simple parser combinator library. Functional Java has a parser combinator API. There's also JParsec. Both of these are based on the Parsec library for Haskell.
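To give a feel for the combinator style, here is a small hand-rolled sketch. It is NOT the JParsec or Functional Java API; every name in it (`Combinators`, `Parser`, `Result`, `satisfy`, `ch`, `then`, `or`, `many`) is made up for illustration, and real libraries add error reporting, backtracking control, and much more.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.function.Function;
import java.util.function.Predicate;

// Hypothetical hand-rolled combinators, not a real library's API.
final class Combinators {
    // A successful parse: the value produced and the unconsumed input.
    static final class Result<T> {
        final T value;
        final String rest;
        Result(T value, String rest) { this.value = value; this.rest = rest; }
    }

    // A parser maps input to an optional result; empty means failure.
    interface Parser<T> {
        Optional<Result<T>> parse(String input);

        // Transforms the parsed value.
        default <U> Parser<U> map(Function<T, U> f) {
            return in -> parse(in).map(r -> new Result<U>(f.apply(r.value), r.rest));
        }

        // Runs this parser, then the parser chosen from its value.
        default <U> Parser<U> then(Function<T, Parser<U>> next) {
            return in -> parse(in).flatMap(r -> next.apply(r.value).parse(r.rest));
        }

        // Tries this parser and falls back to the other one on failure.
        default Parser<T> or(Parser<T> other) {
            return in -> {
                Optional<Result<T>> r = parse(in);
                return r.isPresent() ? r : other.parse(in);
            };
        }

        // Zero or more repetitions; assumes this parser consumes input on success.
        default Parser<List<T>> many() {
            return in -> {
                List<T> out = new ArrayList<>();
                String rest = in;
                Optional<Result<T>> r;
                while ((r = parse(rest)).isPresent()) {
                    out.add(r.get().value);
                    rest = r.get().rest;
                }
                return Optional.of(new Result<List<T>>(out, rest));
            };
        }
    }

    // Matches one character that satisfies the predicate.
    static Parser<Character> satisfy(Predicate<Character> p) {
        return in -> {
            if (!in.isEmpty() && p.test(in.charAt(0))) {
                return Optional.of(new Result<Character>(in.charAt(0), in.substring(1)));
            }
            return Optional.empty();
        };
    }

    static Parser<Character> ch(char c) { return satisfy(x -> x == c); }

    // number := digit digit*
    static final Parser<Integer> NUMBER =
        satisfy(Character::isDigit).then(first ->
            satisfy(Character::isDigit).many().map(rest -> {
                StringBuilder sb = new StringBuilder();
                sb.append(first.charValue());
                for (char d : rest) sb.append(d);
                return Integer.parseInt(sb.toString());
            }));

    // sum := number ('+' number)*, evaluated while parsing
    static final Parser<Integer> SUM =
        NUMBER.then(first ->
            ch('+').then(plus -> NUMBER).many().map(rest -> {
                int total = first;
                for (int n : rest) total += n;
                return total;
            }));

    public static void main(String[] args) {
        // prints 50
        System.out.println(SUM.parse("12+30+8").get().value);
    }
}
```

The grammar is built from ordinary Java values, so there is no separate generator step; the trade-off is that you get less tooling than with something like ANTLR.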

Apocalisp
A: 

I suggest you look at the source for BeanShell. It has a compiler for Java and is fairly simple to read.

Peter Lawrey
A: 

http://java-source.net/open-source/parser-generators and http://catalog.compilertools.net/java.html contain catalogs of tools for this. See also the Stack Overflow question "Alternatives to Regular Expressions".

hstoerr