Since a compiler, unlike an interpreter, only needs to translate its input and not run it, its own performance should be less of a problem than it is for an interpreter. That's why you wouldn't write an interpreter in, let's say, Ruby or PHP: it would be far too slow.

However, what about compilers?

If you wrote a compiler in a scripting language, perhaps one geared toward rapid development, you could possibly cut the source code and initial development time in half, or at least I think so.

To be clear: by scripting language I mean interpreted languages with the typical features that make programming faster, easier and more enjoyable for the programmer (usually, at least). Examples: PHP, Ruby, Python, maybe JavaScript, though that may be an odd choice for a compiler.

  • What are compilers normally written in? I suppose you will answer with something low-level like C, C++ or even assembler; if so, why?

  • Are there compilers written in scripting languages?

  • What are the (dis)advantages of using low or high level programming languages for compiler writing?

A:

Most compilers are written in C or C++. Even today, the performance of a compiler matters. When you have to compile a 900-file project, it makes a hell of a difference if it takes 2 minutes or 20 minutes.

Some compilers are written in scripting languages (one example that comes to mind is Pyjamas, a compiler from Python to JavaScript, written in Python), but the vast majority of industrial-strength compilers are written in C & C++.
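As a toy illustration of the idea behind Pyjamas (this is not Pyjamas' code), here is a sketch of a compiler written in Python that translates a small subset of Python expressions into JavaScript. It assumes Python's standard `ast` module does the parsing; `compile_expr` and `python_to_js` are hypothetical names, and only numbers, names, `+ - * /`, unary minus and simple calls are handled.

```python
import ast

# Python AST operator classes mapped to JavaScript operators with the same
# meaning on numbers. `%` is left out on purpose (semantics differ for negatives).
BINOPS = {ast.Add: "+", ast.Sub: "-", ast.Mult: "*", ast.Div: "/"}


def compile_expr(node: ast.AST) -> str:
    """Translate a (small) Python expression AST into JavaScript source text."""
    if isinstance(node, ast.Expression):
        return compile_expr(node.body)
    if (isinstance(node, ast.Constant)
            and isinstance(node.value, (int, float))
            and not isinstance(node.value, bool)):
        return repr(node.value)
    if isinstance(node, ast.Name):
        return node.id
    if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
        return f"(-{compile_expr(node.operand)})"
    if isinstance(node, ast.BinOp) and type(node.op) in BINOPS:
        op = BINOPS[type(node.op)]
        return f"({compile_expr(node.left)} {op} {compile_expr(node.right)})"
    if isinstance(node, ast.Call) and isinstance(node.func, ast.Name) and not node.keywords:
        args = ", ".join(compile_expr(arg) for arg in node.args)
        return f"{node.func.id}({args})"
    raise NotImplementedError(f"unsupported construct: {ast.dump(node)}")


def python_to_js(source: str) -> str:
    # Front end: Python's own parser. Back end: the string emitter above.
    return compile_expr(ast.parse(source, mode="eval"))


print(python_to_js("2 * (x + 1) - f(y, 3)"))  # ((2 * (x + 1)) - f(y, 3))
```

Delegating parsing to the host language's own parser is one reason such compilers can be short; the trade-off is that anything outside the handled subset raises `NotImplementedError` instead of being translated. `%` is deliberately excluded because Python's and JavaScript's remainder operators disagree on negative operands (`-7 % 3` is `2` in Python but `-1` in JavaScript).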

Eli Bendersky
The silly thing is that it is all too easy to write terribly-performing string handling code in both C and C++. Of course this is not the fault of those languages really, but they're not magically faster either. Getting a high-speed compiler is principally about intelligent use of data structures and algorithms. (Funny, that's what almost all high-performance programming is about!)
Donal Fellows
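One common example of the "intelligent use of data structures" mentioned above, offered here as an illustration rather than anything from the thread, is interning identifiers in a symbol table so later phases compare small integers instead of strings. A minimal sketch, assuming a hash map is available; the `SymbolTable` class is hypothetical.

```python
class SymbolTable:
    """Maps each distinct identifier string to a small integer ID (interning)."""

    def __init__(self) -> None:
        self._ids: dict[str, int] = {}
        self._names: list[str] = []

    def intern(self, name: str) -> int:
        # A dict lookup is O(1) on average, so interning n tokens is O(n)
        # overall, instead of scanning a list of known names for each token.
        sym = self._ids.get(name)
        if sym is None:
            sym = len(self._names)
            self._ids[name] = sym
            self._names.append(name)
        return sym

    def name(self, sym: int) -> str:
        # Reverse mapping, e.g. for error messages.
        return self._names[sym]


table = SymbolTable()
tokens = ["count", "i", "count", "total", "i"]
ids = [table.intern(t) for t in tokens]
print(ids)  # [0, 1, 0, 2, 1]
```

The same idea applies in C or C++; the language matters less than avoiding repeated string comparisons and copies in the hot paths.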
A:

Most compilers are written in the language they target (bootstrapping).

There are of course numerous exceptions.

Aram Hăvărneanu
Why the negative votes when this is the objective, verifiable truth?
Aram Hăvărneanu
I didn't downvote, but it isn't true for (for example) FORTRAN and COBOL, to name but two. If you have stats to back up your assertion, please provide them.
anon
Another example would be the languages that come with GNU, like the Ada compiler or `gcj`, written in C like the rest of the GCC toolset.
Eli Bendersky
You must not have read the full post. There are numerous compilers written in a language other than the one they target, but the majority of compilers for mainstream, _compiled_ languages are not like this. Most mainstream implementations of C, C++, Haskell, Java, Erlang, OCaml, Oberon, Pascal, Delphi are written in those languages.
Aram Hăvărneanu
It isn't the 'objective truth'. It is in fact basically a self-contradiction: 'most' with 'numerous exceptions'? If you disagree, verify it. Provide some examples of Cobol compilers written in Cobol. PHP compilers written in PHP. RPG compilers written in RPG. Javascript compilers written in Javascript. JIT compilers written in Java, or JVM byte-code.
EJP
If 51% of compilers are written in the language they target, that is a majority, but the remaining 49% are still numerous exceptions; there is no contradiction. I have provided examples of compilers written in the language they target, for _compiled_, mainstream languages.
Aram Hăvărneanu
@Aram "If 51% ..." - yes, but you have given no evidence the figure isn't 49%.
anon
Erlang doesn't quite fit there. Much of Erlang's LIBRARY is written in Erlang, but the VM at the core is in C. From my Erlang root directory `find . -name "*.c" | wc -l` gives me 489 C source files. The same for `.erl` files gives 2506 files. That's not a perfect approach, naturally, but it gives a good feel. A quick glance at the distribution of said files puts most of those C files in the BEAM and HiPE implementations -- the very core of the language runtime.
JUST MY correct OPINION
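The same rough census can be taken portably from Python. A minimal sketch, assuming only the standard library; `count_by_suffix` is a hypothetical helper, and it counts only regular files (a bare `find -name "*.c"` would also match a directory with that name).

```python
from collections import Counter
from pathlib import Path


def count_by_suffix(root: Path, suffixes: tuple[str, ...]) -> Counter:
    """Recursively count regular files under `root` whose suffix is in `suffixes`.

    Roughly what `find . -name "*.c" | wc -l` reports, for several suffixes at once.
    """
    counts: Counter = Counter()
    for path in root.rglob("*"):
        if path.is_file() and path.suffix in suffixes:
            counts[path.suffix] += 1
    return counts


# Usage (the path is a placeholder for your own Erlang installation):
# print(count_by_suffix(Path("/path/to/erlang"), (".c", ".erl")))
```

As the commenter says, file counts are only a rough proxy; they say nothing about file sizes or which parts are the compiler versus the runtime.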
Bootstrapping is a relevant technique in computer science; the process is well defined and well established. I think this answer doesn't deserve the downvotes.
Eineki
@ttmrichter you are right: the Erlang VM that interprets the bytecode is C, just as the Java VM that interprets Java bytecode is C or the .NET VM that interprets MSIL bytecode is C, but the compiler that translates Erlang source code into Erlang bytecode is Erlang, just as the compiler that translates Java into Java bytecode is Java. That is what I meant, since in practice that is the compiler.
Aram Hăvărneanu
Ah, yes, I get your drift there. The scanner/parser/treewalker/optimizer/whatever is mostly written in Erlang (because Erlang is GOOD at that kind of stuff) while the runtime is written in C. Yeah, if you go that split, Erlang is written in Erlang for the most part.
JUST MY correct OPINION
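The phases listed above (scanner, parser, tree walker, optimizer) form the usual compiler front end. As a rough illustration of the first phase only, here is a minimal regular-expression scanner in Python; the token kinds and the toy grammar are assumptions made for this sketch, not anything taken from Erlang's compiler.

```python
import re
from typing import Iterator, NamedTuple


class Token(NamedTuple):
    kind: str
    text: str
    pos: int


# One named group per token kind; earlier alternatives win when several match.
TOKEN_RE = re.compile(
    r"(?P<NUMBER>\d+)"
    r"|(?P<IDENT>[A-Za-z_]\w*)"
    r"|(?P<OP>[-+*/=()])"
    r"|(?P<SKIP>\s+)"
    r"|(?P<ERROR>.)"
)


def scan(source: str) -> Iterator[Token]:
    """Split `source` into tokens, skipping whitespace and rejecting unknown characters."""
    for m in TOKEN_RE.finditer(source):
        kind = m.lastgroup  # name of the group that matched
        if kind == "SKIP":
            continue
        if kind == "ERROR":
            raise SyntaxError(f"unexpected character {m.group()!r} at {m.start()}")
        yield Token(kind, m.group(), m.start())


print([(t.kind, t.text) for t in scan("x = 42 + y1")])
```

A parser would then consume this token stream to build the tree that later phases walk.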
A majority isn't 'most', and you've provided no evidence. I consider Cobol a mainstream language. There are plenty of examples of compilers which are and which aren't bootstrapped. If you have facts about 'most' please provide them.
EJP
Given the number of compilers that support multiple input languages (e.g. the GCC suite has C, C++, Objective-C, Fortran, Ada, Java) yet are implemented in one language (C), it is reasonable to conclude that a compiler is implemented in some language that meets various criteria: suitably efficient (in execution time and development time), portable, and familiar to the implementers. Another example is LLVM/Clang, which supports C, C++ and Objective-C and is implemented in C++. Bootstrapping is relevant solely to the case of a single-language implementation without usable development tools on the target.
grrussel
@grrussel: FWIW, they could probably drop the support for Java and virtually nobody would miss it. The way that the Java community usage of the language has evolved is completely different from the way that gcj supports it.
Donal Fellows
@Aram: Java's core is implemented in C++, but that's only a very small part of things; the JIT engine doesn't go from Java to C++, but rather directly to native machine code.
Donal Fellows
A: 

They're mostly written in a reasonably high-level language (C/C++). However, with modern hardware it's perfectly fine to write a compiler in a managed language (C#/Java), a functional language (Haskell) or, better yet, a managed functional language (Nemerle).

Functional languages benefit from a technique called pattern matching, which makes handling parse trees/ASTs much simpler.

The real compiler-fu is writing a compiler for a language in that particular language (a process called bootstrapping).

Anton Gogolev
A: 

Compilation is one of the most computationally intensive things you can do on a computer, or, as Joel Spolsky puts it:

Writing code in a compiled language is one of the last things that still can't be done instantly on a garden variety home computer.

Hence you want the compiler to be as fast as possible, which makes C and C++ natural choices.

Andreas Brinck
Writing code or compiling it? I've seen some ultra-fast C compilers. (C++ is more challenging though.) The actual authoring of the code is slow though; it's a creative and human process and so not truly possible to convert to a simple algorithm. (The algorithms we have do transformation of one language into another, i.e., compilation.)
Donal Fellows
A: 

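There's a Python implementation written in Python called PyPy (strictly speaking, it is written in RPython, a restricted subset of Python, and it includes a JIT compiler).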

Christian
A: 

There are specialised programming languages for implementing compilers efficiently, e.g.:

http://www.meta-alternative.net/mbase.html

Also: Irony, JetBrains MPS, and some more.

Functional languages in general are quite efficient in this area, especially languages with algebraic data types, pattern matching and currying, for example Haskell, ML (F#, OCaml), Nemerle, Scala.

SK-logic
A: 

The javac compiler from the Sun/Oracle JVM is written in Java, as is the Java compiler used within the Eclipse IDE for background compilation as you edit. Compilers for many functional languages are often written in that language, as functional languages are typically quite suited to writing compilers. Compilers for restricted languages (e.g. GPU programming such as GLSL/OpenCL) will not be written in languages executable on GPUs.

One fundamental issue is that the language compiled by a given compiler may not be a good language for implementing a compiler; I don't know of anyone writing compilers for FORTRAN in FORTRAN.

In essence, the implementation language of a compiler may or may not be an input language to that compiler, depending on the suitability of the languages involved and a host of other criteria, such as development time, required runtime performance, tool availability and developer familiarity.

grrussel