views:

489

answers:

9

Please, I need some resources to begin with (I am a CS student).

A: 

Lisp / Scheme was what we were tasked with using back in uni.

They lent themselves quite well to the task.
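One reason Lisp/Scheme suits the task is that its own syntax (S-expressions) is almost trivial to parse. As a rough illustration only, here is a minimal S-expression reader. It is written in Python rather than Lisp to keep to one language for the examples, and it treats integers as numbers and everything else as symbols, which is a simplifying assumption.

```python
def tokenize(src: str) -> list[str]:
    """Split an S-expression string into '(' and ')' tokens and atoms."""
    return src.replace("(", " ( ").replace(")", " ) ").split()


def read(tokens: list[str]):
    """Build nested Python lists from the token list, consuming it from the front."""
    if not tokens:
        raise SyntaxError("unexpected end of input")
    token = tokens.pop(0)
    if token == "(":
        expr = []
        while tokens and tokens[0] != ")":
            expr.append(read(tokens))
        if not tokens:
            raise SyntaxError("missing ')'")
        tokens.pop(0)  # drop the closing ')'
        return expr
    if token == ")":
        raise SyntaxError("unexpected ')'")
    # Atoms: integers become int, everything else stays a symbol string.
    try:
        return int(token)
    except ValueError:
        return token


def parse_sexpr(src: str):
    """Parse exactly one S-expression and reject any trailing input."""
    tokens = tokenize(src)
    expr = read(tokens)
    if tokens:
        raise SyntaxError("trailing input")
    return expr


print(parse_sexpr("(define (square x) (* x x))"))
# ['define', ['square', 'x'], ['*', 'x', 'x']]
```

The whole "parser" is one recursive function, because the nesting in the text maps directly onto nested lists.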

Dan

Daniel Elliott
+1  A: 

While I once had a textbook titled 'Modern Compiler Implementation in Java', I think the professionals still use C, other than to prove that their language can compile itself.

beggs
+6  A: 

The answer can be very subjective here, but I'd recommend using ANTLR if you want to write a parser. Currently ANTLR supports C, C#, ActionScript, JavaScript, and Java targets. In my experience the Java version is very stable, and it has been used in many powerful open-source projects, such as Drools and Hibernate.

jpartogi
+4  A: 

Does it need to be written in a programming language? Or can you use Flex and Bison?

chotchki
I was about to suggest Flex and Bison as well. :-)
Alan Haggai Alavi
Parser generators don't remove the need for a programming language. They generate a parser in a particular language. The traditional Flex/Bison (Lex/Yacc) language is C.
Brannon
@Brannon +1 for your comment; you still need a programming language. However, I'd add that Flex/Bison and Lex/Yacc are pretty outdated. If you've used something like Coco/R and then go to Lex/Yacc, you'll find yourself desperate for features, not least lookahead of more than one token.
Imagist
A: 

Do you want to write a parser for a general-purpose language? In that case, writing it (and bootstrapping it) in the target language is clearly recommended. You should eat your own dog food.

Mnementh
I would say, "recommended in many cases", not "clearly recommended". "General-purpose language" is kind of a misnomer, as there isn't any language that is appropriate for *all* tasks. I wouldn't write a web application in C and I wouldn't write an operating system in Python. It's not that you can't do either of these things; it's simply that it wouldn't be appropriate to do so. If you accept this logic, then it makes sense that one might be writing a general-purpose language that isn't appropriate for parsing and therefore shouldn't be used to parse itself.
Imagist
Here's a nice example: the *entire* JavaScript interpreter of the STEPS project is only 170 lines of OMeta source code, and took one person one afternoon to write. In the Narcissus JavaScript interpreter, the parser *alone* is more than 1000 lines of JavaScript, with another 1000 lines for the AST visitor. So, using JavaScript to implement JavaScript is 10 times more verbose than using OMeta.
Jörg W Mittag
But you flush out a mass of bugs if you really use your parser/compiler, and you really use it by implementing the compiler in the language itself.
Mnementh
A: 

If you are implementing a compiler from the ground up, most programming languages are up to the task. (I've even known of compilers/parsers written in Fortran IV and COBOL, though I wouldn't recommend trying that!)

But if the language you are trying to implement has a grammar that is at all non-trivial, you would do better to use a lexer generator and/or a parser generator for the front end. You'll get a much faster and more reliable parser.
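To make "lexer generator" concrete: a tool like Flex takes a list of regular-expression rules, each tagged with a token type, and produces a scanner from them. The sketch below hand-rolls that idea with Python's standard `re` module. The token names and rules are hypothetical, and there is one real difference from Flex: Python's regex alternation takes the first alternative that matches, whereas Flex prefers the longest match, so rule order matters more here.

```python
import re
from typing import Iterator, NamedTuple


class Token(NamedTuple):
    kind: str
    text: str


# Hypothetical token rules. Order matters: re alternation picks the first
# alternative that matches (Flex would instead prefer the longest match).
TOKEN_RULES = [
    ("NUMBER", r"\d+(?:\.\d+)?"),
    ("IDENT", r"[A-Za-z_]\w*"),
    ("OP", r"[+\-*/=()]"),
    ("SKIP", r"\s+"),
    ("MISMATCH", r"."),
]
MASTER_RE = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_RULES))


def lex(src: str) -> Iterator[Token]:
    """Yield tokens from src, skipping whitespace and rejecting unknown characters."""
    for m in MASTER_RE.finditer(src):
        kind = m.lastgroup  # name of the rule that matched
        if kind == "SKIP":
            continue
        if kind == "MISMATCH":
            raise SyntaxError(f"unexpected character {m.group()!r} at offset {m.start()}")
        yield Token(kind, m.group())


print([tok.kind for tok in lex("rate = base * 1.5")])
# ['IDENT', 'OP', 'IDENT', 'OP', 'NUMBER']
```

A generated scanner does essentially this, but compiles all the rules into a single state machine rather than relying on a regex engine's backtracking.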

So, on that basis, pick a programming language for which a decent parser generator is available. There is a page on Wikipedia that compares a large number of parser generators; I didn't realize there were so many!

Stephen C
A: 

If your aim is to learn the techniques behind parsers (and tokenizers), maybe it's better to write one yourself from scratch. You can do this in most programming languages, so you can pick one you're comfortable with.

A while ago, I wrote a series of blog posts showing how easy it is to write a parser for a small, fictional BASIC-like programming language in C#. I don't want to spam here, so I won't provide a direct link, but if you visit my blog (see my profile) and go to the bottom, you can find a link "Writing a parser" in the "my posts" section.
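Writing a parser from scratch usually means writing a recursive-descent parser, with one function per grammar rule. The sketch below is not the code from those posts. It is a minimal illustration in Python for a hypothetical grammar of integer arithmetic with `+ - * /`, parentheses and unary minus, and it evaluates as it parses instead of building a syntax tree.

```python
import re
from typing import Optional

# Grammar (illustrative):
#   expr   := term (('+' | '-') term)*
#   term   := factor (('*' | '/') factor)*
#   factor := NUMBER | '(' expr ')' | '-' factor
TOKEN_RE = re.compile(r"(?P<num>\d+)|(?P<op>[-+*/()])|(?P<bad>\S)")


def tokenize(src: str) -> list[str]:
    """Split src into integer and operator tokens; whitespace is skipped."""
    tokens = []
    for m in TOKEN_RE.finditer(src):
        if m.lastgroup == "bad":
            raise SyntaxError(f"unexpected character {m.group()!r}")
        tokens.append(m.group())
    return tokens


class Parser:
    def __init__(self, src: str) -> None:
        self.tokens = tokenize(src)
        self.pos = 0

    def peek(self) -> Optional[str]:
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def advance(self) -> str:
        tok = self.peek()
        if tok is None:
            raise SyntaxError("unexpected end of input")
        self.pos += 1
        return tok

    def parse(self) -> float:
        value = self.expr()
        if self.peek() is not None:
            raise SyntaxError(f"unexpected token {self.peek()!r}")
        return value

    def expr(self) -> float:
        # The loop makes '+' and '-' left-associative.
        value = self.term()
        while self.peek() in ("+", "-"):
            op = self.advance()
            rhs = self.term()
            value = value + rhs if op == "+" else value - rhs
        return value

    def term(self) -> float:
        # '*' and '/' bind tighter because term sits below expr in the grammar.
        value = self.factor()
        while self.peek() in ("*", "/"):
            op = self.advance()
            rhs = self.factor()
            value = value * rhs if op == "*" else value / rhs
        return value

    def factor(self) -> float:
        tok = self.advance()
        if tok == "-":
            return -self.factor()
        if tok == "(":
            value = self.expr()
            if self.advance() != ")":
                raise SyntaxError("expected ')'")
            return value
        if tok.isdigit():
            return float(tok)
        raise SyntaxError(f"unexpected token {tok!r}")


print(Parser("2 + 3 * (4 - 1)").parse())  # 11.0
```

Operator precedence falls out of which rule calls which. To support statements such as a BASIC-style `PRINT`, you would add rules above `expr` in the same style.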

Tommy Carlier
+2  A: 

You'll want to look into parser generators. If you're a CS student then you'll probably want to take a look at the Dragon Book: http://en.wikipedia.org/wiki/Compilers:_Principles,_Techniques,_and_Tools.

It would probably be easiest to build a parser using C# or Java, since you won't have to worry about things like memory management and can focus on the grammar.

A good C# parser generator is GPPG: http://plas.fit.qut.edu.au/gppg/.

Brannon
Can somebody explain why this has been downvoted? The Dragon Book is an excellent resource, and C#/Java answer the question as it is worded. Although I think the question itself is stupid.
gimpf
+1  A: 

Parsers and compilers are two separate problems. For example, I might write a compiler in C, but I would never write a parser in C (I would use a parser generator). For very simple parsers where speed isn't a high priority, I might hand-code the parser in Perl or Python, which have good text-manipulation facilities. But for anything beyond a very basic parser, I would use some sort of parser-generation tool. The most commonly used ones are ANTLR, Coco/R, and Lex/Yacc along with their free counterparts Flex/Bison. My personal preference is Coco/R, but ANTLR seems to be more popular these days.
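As an example of the "very simple parser hand-coded in Python" case, here is a sketch that parses a hypothetical INI-like format with `[section]` headers, `key = value` lines and `#` comments, using only the standard `re` module. For real INI files, Python's standard library already provides `configparser`. This example only shows how far a few regular expressions get you when the grammar is line-oriented.

```python
import re

SECTION_RE = re.compile(r"\[(?P<name>[^\]]+)\]")
PAIR_RE = re.compile(r"(?P<key>[A-Za-z_][\w.]*)\s*=\s*(?P<value>.*)")


def parse_config(text: str) -> dict[str, dict[str, str]]:
    """Parse [section] headers and key = value lines; '#' starts a comment."""
    result: dict[str, dict[str, str]] = {}
    section = result.setdefault("", {})  # keys before any header land in section ""
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if not line:
            continue
        if m := SECTION_RE.fullmatch(line):
            section = result.setdefault(m["name"].strip(), {})
        elif m := PAIR_RE.fullmatch(line):
            section[m["key"]] = m["value"].strip()
        else:
            raise SyntaxError(f"line {lineno}: cannot parse {raw!r}")
    return result


sample = """
# server settings
[server]
host = example.org
port = 8080

[paths]
root = /var/www  # trailing comment
"""
print(parse_config(sample))
# {'': {}, 'server': {'host': 'example.org', 'port': '8080'}, 'paths': {'root': '/var/www'}}
```

Once the input stops being one construct per line (nested blocks, expressions, quoting rules), this approach gets fragile quickly. That is the point at which a parser generator pays off.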

If you're writing a general-purpose programming language, you may want to consider writing it in itself. There are many benefits to this, including portability (people only have to port the first version of the language) and demonstration of capabilities (parsing is a hard problem, so if it can be done in your language that's a testament to your language). If your language is interpreted, this may not be appropriate for performance reasons.

Imagist