I was discussing this with a friend and realized this might be the best place to ask.

How is a new language born? A new language NEW must be written in some old language OLD (e.g., C++ was written in C in its initial stages), or is it created some other way? And how can a program written in NEW run if there is no compiler for it? There must be some compiler for it, so who writes that compiler?

So how does all this work together: the new language, its compiler, and the relation of the new language to its old base language?

+4  A: 

You write the compiler in an implementation language until the compiler can compile enough of the new language to be used to implement the rest of the new language.

That's how it works.

Edit: Just to clarify, the commenters on this answer are also correct. The compiler doesn't HAVE to be written in the new language unless you want it to be. As they said, some don't go that route and stay with the original implementation language.

Robert Rouse
Yup, that's it in a nutshell. There are those who don't believe that it's a real language unless it can be used to write its own compiler :)
wcm
To add to Scyllinice's answer: Not all languages can go this route, of course - a lot of LISP variations are pure interpreters without the ability to create an executable per se, and a LISP compiler can be written in pretty much any language. The OLD and NEW languages don't really have to have any real connection to each other.
Mike
Though, you don't necessarily have to migrate off of the original implementation language. For example, Tcl and many (most?) scripting languages use C or C++ as the implementation language long after the new language has become mature and stable.
Bryan Oakley
+2  A: 

Bootstrapping is a term used in computer science to describe the techniques involved in writing a compiler (or assembler) in the target programming language which it is intended to compile. This technique is also called self-hosting.

kitchen
+1  A: 

You may want to read up on programming language design and compiler design:

http://dragonbook.stanford.edu/
Bootstrapping (compilers)
http://en.wikipedia.org/wiki/Programming_language
http://www.paulgraham.com/langdes.html

Or, take a course or three at your local university.

Ether
+1  A: 

The heart of any language implementation is the compiler and the linker. The compiler converts source code into intermediate code that is very close to machine code. From this point, linkers attach it to other binaries such as libraries. Once the binaries are linked to all their logical pieces, they become an executable file in machine code (or translatable intermediate code, as with .NET/Java).
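To make the compile-then-link pipeline concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the function names (`compile_expr`, `link`, `run`), the instruction names (`PUSH`, `ADD`, `MUL`, `CALL`), and the shortcut of borrowing Python's own `ast` module as the parser. Real compilers and linkers work on object files and machine code, not tuples.

```python
import ast
import math


def compile_expr(source):
    """Compile an arithmetic expression into stack-machine instructions.

    Function calls are emitted by *name* only; they stay unresolved
    until the link step, much like external symbols in an object file.
    """
    code = []

    def emit(node):
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            code.append(("PUSH", node.value))
        elif isinstance(node, ast.BinOp) and isinstance(node.op, (ast.Add, ast.Mult)):
            emit(node.left)
            emit(node.right)
            code.append(("ADD",) if isinstance(node.op, ast.Add) else ("MUL",))
        elif isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            for arg in node.args:
                emit(arg)
            code.append(("CALL", node.func.id, len(node.args)))
        else:
            raise SyntaxError("unsupported construct in toy language")

    emit(ast.parse(source, mode="eval").body)
    return code


def link(code, library):
    """Resolve every CALL-by-name against a library of real functions."""
    linked = []
    for instr in code:
        if instr[0] == "CALL":
            name, argc = instr[1], instr[2]
            if name not in library:
                raise NameError(f"unresolved symbol: {name}")
            linked.append(("CALL", library[name], argc))
        else:
            linked.append(instr)
    return linked


def run(linked):
    """Execute linked instructions on a simple stack machine."""
    stack = []
    for instr in linked:
        op = instr[0]
        if op == "PUSH":
            stack.append(instr[1])
        elif op in ("ADD", "MUL"):
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right if op == "ADD" else left * right)
        elif op == "CALL":
            func, argc = instr[1], instr[2]
            args = stack[len(stack) - argc:]
            del stack[len(stack) - argc:]
            stack.append(func(*args))
    return stack.pop()


unlinked = compile_expr("sqrt(16) + 2 * 3")
program = link(unlinked, {"sqrt": math.sqrt})
print(run(program))  # 10.0
```

The split mirrors the description above: `compile_expr` knows nothing about where `sqrt` lives, and `link` fails loudly if the library does not provide it.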

Most of the translation from "human" English happens in the compiler, and there are great articles on how this is done... but to most of us it is in the realm of the supernatural, as the organizational skills required to write a working compiler are immense.

You can see the surface-level translations and get a closer look at how compilers work by reading language definitions (Bjarne Stroustrup's "The C++ Programming Language", Microsoft Press's "The C# Programming Language"). Both in the appendixes and peppered throughout, they give the lexical pieces, or rules, that the compiler uses to translate your words into machine code in a very logical way.

I highly recommend reading the language definition of your favorite programming language if you wish to understand more. The Wikipedia article on compilers will also give you a broader understanding.

Eugarps
I don't agree with the statement "the organizational skills required to write a working compiler are immense". Back when I was just a handful of years out of college and had never taken a compiler class, I was able to create a special-purpose language using lex and yacc. While difficult, it was far from being immensely difficult. It was actually quite rewarding.
Bryan Oakley
@Bryan: It depends. With modern compiler-generation tools it can be reasonably easy to create a small language (like many domain specific languages), but @Sprague is assuming something meatier. Once you add in all the optimization, code generation, etc. needed for a major programming language like Python, Java, or C# the work can get very demanding. Then there are ancillary tasks like VM design, GC algorithms, standard libraries, ...
Jim Ferrans
A: 

A language is (generally) nothing but a specification. A compiler or interpreter of a language can be written in any language of your choosing. The first were written in machine code since that's all we had. Then came assembler, then other languages like C. Since that time C (and C++) have remained popular choices for the implementation of a language. C and C++ are not the only choices, however.
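As a small illustration of "a language is a specification", here is a sketch of an interpreter for a made-up postfix language, written in Python. The language itself is hypothetical and invented for this example: integers push themselves, `+` and `*` combine the top two values, `dup` copies the top value, and `print` pops and outputs it. The same few rules could just as well be implemented in C or anything else.

```python
def interpret(program):
    """Run a program in the toy postfix language; return the printed lines."""
    stack = []
    output = []
    for word in program.split():
        if word.lstrip("-").isdigit():
            # Integer literal: push its value.
            stack.append(int(word))
        elif word in ("+", "*"):
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right if word == "+" else left * right)
        elif word == "dup":
            stack.append(stack[-1])
        elif word == "print":
            output.append(str(stack.pop()))
        else:
            raise SyntaxError(f"unknown word: {word!r}")
    return output


print(interpret("2 3 + dup * print"))  # ['25']
```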

It's also worth pointing out that often a language can be implemented with a specialized language (or languages) such as yacc and lex. These are domain specific languages specifically designed to make it easy to create compilers based on a specification. This takes the drudgery out of hand-coding a lot of stuff that can easily be generated by a computer. You take the specification, run it through these tools and out pops the code to implement your language. Yacc stands for Yet Another Compiler-Compiler. It compiles the specifications for a compiler and generates a compiler.
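To give a feel for "specification in, code out", here is a rough Python sketch of the idea behind lex. It is not how lex or yacc actually work internally (they generate C source from their own input formats); the names `TOKEN_SPEC` and `build_lexer` and the token rules are assumptions made for this example.

```python
import re

# The "specification": token names paired with regular expressions.
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("IDENT", r"[A-Za-z_]\w*"),
    ("OP", r"[+\-*/=()]"),
    ("SKIP", r"\s+"),
]


def build_lexer(spec):
    """Turn a token specification into a tokenizer function."""
    pattern = re.compile("|".join(f"(?P<{name}>{regex})" for name, regex in spec))

    def lex(text):
        tokens = []
        pos = 0
        while pos < len(text):
            match = pattern.match(text, pos)
            if match is None:
                raise SyntaxError(f"unexpected character {text[pos]!r} at {pos}")
            if match.lastgroup != "SKIP":  # drop whitespace
                tokens.append((match.lastgroup, match.group()))
            pos = match.end()
        return tokens

    return lex


lex = build_lexer(TOKEN_SPEC)
print(lex("x = 3 + 42"))
# [('IDENT', 'x'), ('OP', '='), ('NUMBER', '3'), ('OP', '+'), ('NUMBER', '42')]
```

Changing the language's tokens means editing the table, not the tokenizer loop, which is the drudgery these tools remove.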

Other posters suggest that once a language is robust enough the compiler can be ported to itself, but this is not necessary. Many languages were written a decade or more ago in C and continue to be implemented in C today.

Bryan Oakley
+1  A: 

Great question!

  • Sometimes the compiler for the new language is written in an old language.

  • If the compiler for new language N is written in N, there are many strategies, all of which involve finding some way to run a program in language N when you don't yet have a compiler.

    1. Write an interpreter for language N, say in C (really the language of your choice), then use the interpreter to interpret the compiler compiling itself.

    2. Write a really horrendous compiler for N, say in C, then use that to compile the first version of the compiler.

    3. Compile the first version of the compiler into assembly code or C code, usually by hand.

My favorite is strategy #1, but they all work.
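For the curious, the staging can be sketched in a few lines of Python. This is a toy that loosely follows strategy #2, under loud assumptions: the language N is invented here (it is Python except that top-level functions start with `fn` instead of `def`), the "target" is Python source rather than machine code, and the names `stage0_compile` and `load_compiler` are made up for the example.

```python
# The "real" compiler, written in N itself. It translates N source to Python.
COMPILER_SRC_N = '''\
fn compile_n(src):
    out = []
    for line in src.splitlines():
        if line.startswith("fn" + " "):
            line = "def " + line[3:]
        out.append(line)
    return "\\n".join(out)
'''


def stage0_compile(src):
    """Throwaway bootstrap compiler written in the old language (Python).

    It is deliberately crude: a blind textual replace.
    """
    return src.replace("fn ", "def ")


def load_compiler(python_src):
    """Execute generated Python code and return its compile_n function."""
    namespace = {}
    exec(python_src, namespace)
    return namespace["compile_n"]


# Stage 1: the crude compiler compiles the real compiler.
stage1_py = stage0_compile(COMPILER_SRC_N)
compile1 = load_compiler(stage1_py)

# Stage 2: the real compiler (built by stage 0) compiles itself.
stage2_py = compile1(COMPILER_SRC_N)
compile2 = load_compiler(stage2_py)

# Stage 3: the self-compiled compiler compiles itself again.
stage3_py = compile2(COMPILER_SRC_N)

# Once the compiler is compiling itself, the output should stop changing.
assert stage2_py == stage3_py
print(compile2("fn double(x):\n    return 2 * x"))
```

After stage 2, `stage0_compile` is no longer needed; the compiler written in N can rebuild itself from its own source. Comparing stage 2 against stage 3 is a common sanity check that the bootstrap has settled.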

If you want to see a solution to this problem explained in depth, look at Andrew Appel's short paper Axiomatic Bootstrapping: A Guide for Compiler Hackers, which is free from a Princeton web site. This paper is very mathematical, but in the related-work section you will find references to older papers, including ones that show the bootstrapping process using T-diagrams, which many people find very intuitive.

Norman Ramsey