views:

205

answers:

5

In school we were assigned to design a language and then to implement it, (I'm having so much fun implementing it =)). My teacher told us to use yacc/lex, but i decided to go with java + regex API, here is how the the language I designed looks:

Program "my program"
var yourName = read()
if { equals("guy1" to yourName) }
  print("hello my friend")
else
 print("hello extranger")
end
Program End

Well, as you can see, its a pretty basic language =).

I thought I could implement it in a very OOP fashion, like make an abstract class Sentence and then have subclasses like VariableAssignment, IfSentence etc. and have a class Program which is only a bunch of sentences right? And then call an abstract method eval on all Sentences, so my initial approach to complie the language consisted only of two phases:

  1. Identify syntax of seach line
  2. Create the correspondig class for each line

of course, if something goes wrong on any phase Ii could raise an error.

My question is, am I doing it wrong? Should I go over all phases like the theory says (lexical, syntactical, semantical)? Should I continue with my naive two-phase compiler?

+4  A: 

A lot of smart people thought about this, and from your post I take, they came to the conclusion that all the phases are needed.

So if you want your compiler to work, go the way the theory dictates.

If you want to understand, why it dictates the phases, try the short cut. It will probably take a lot longer.


Disclaimer: I have no idea about compiler theory


Another note: You have a problem; You decide to solve it using regexps; Now you have two problems

Jens Schauder
@Jens Schauder, +1 for the "Now you have two problems" quote.
Kelly French
+3  A: 

I won't ask the obvious question of why you're not following the advice of your instructor and using yacc/lex because I know the answer. You wanted to go off and do something that you thought was cool and would help you learn. Unfortunately, that approach was recommended by your professor because as another posted stated, a lot of very smart people before you have explored multiple approaches and spent vast quantities of time trying to find a good solution.

You can make a two-phase compiler work, but you will need to accept that it will never be as good as going through the full process because it's harder to detect errors. A lot harder in fact. In some cases, you won't even be able to tell that there's an error until it's too late. ie: already compiled and attempting to run.

If you want to learn a lot more about it, go with the two phase approach and you will run into the same problems that the people before you ran into. Just be sure to understand that it will take you a lot longer to get to a final solution, you might be docked points on your project, and it might not work right.

That said, you're going to learn more about it than anyone else in the class. If you have the time to spare, I'd do it the way you are now. The knowledge might come in handy down the road. I would also talk to your professor and tell him that you're going to do it another way against his recommendations because you want to have a more thorough understanding. Perhaps he won't knock points off from your project for being ambitious, even if it turns out wrong.

After all, the point of doing projects in college is to learn.

Mike Taber
+1  A: 

If you use regexes to parse each line your language would have a very limited syntax.

You would not be able to parse each line using just a regular expression API if your syntax becomes more complex. Even the if { equals("guy1" to yourName) } would become impossible to parse with regexes if you start adding AND and OR operators, and what would happen if you start supporting escape characters like \n in your string literals?

The Java Regex API would be able to help you with the lexical analyzer, but you would have to write the parser from there. You could take one of several approaches:

  • If you're using Java, you could look at Antlr (which negates the need for writing a lexicall analyzer with Java's regex library), or
  • You could write a recursive descent parser by hand

among others

(also, "Statement" is a synonym for "Sentence" that is more common in compiler texts)

iWerner
A: 

If u really want to dirty ur hands code a recursive descent parser. If you want to understand compiler theory use antlr and concentrate on the principles leaving the implementation for the parser generator. BTW, why would wnat to complicate your life with regex?!

Radu
+1  A: 

If you want to use only regular expressions to parse your language, your language can only be regular. This is a big constriction, for example, arbitrarily deep nesting would be impossible, as you would have to teach your parser each nesting combination separately. I am not sure if building a Turing-complete regular language is even possible.

Svante