What language do they build other languages with?

+23 Q:

What language do they build other languages with?

What language is used to build low level languages like c++ and java?

How could you build the first language with no language?

+28 A:

In the context of compilers, this operation is often called bootstrapping. In particular, see the "Chicken and egg problem" section for a direct answer to your question.

The very first compiler would have been hand-written in assembly language. If your next question is "how was the first assembler written?" then the answer would be that the first assembler was hand-written in binary machine code, possibly with front panel toggle switches. This is undoubtedly a simplification of what really happened, but the concept is the same.

There is also an excellent article titled Reflections on Trusting Trust by Ken Thompson about the risks of using a compiler for a language to build the compiler for that language.

Greg Hewgill 2010-01-10 02:34:42

+1 for mentioning the Thompson paper alone. I believe that was his speech on receiving the Turing Award.

Nikolai N Fetissov 2010-01-10 03:03:30

Also note that for new architectures, a cross-compiler can be built using a compiler from an existing architecture. It runs on the existing architecture but generates code that runs on the new architecture. That code itself may be a compiler that then runs on the new architecture.

Clifford 2010-01-10 09:21:58

"undoubtedly a simplification" not much of one really. Loaders came before assemblers. Bootstrapping an assembler was the same deal as bootstrapping a compiler. Simple to complex; adding a few features each time. Mercifully, when I started 4 decades ago, most machines came with a sophisticated assembler pre-written by the vendor. Even the lowly PDP-8.

S.Lott 2010-01-10 12:38:55

"Bootstrapping an assembler was the same deal as bootstrapping a compiler. Simple to complex" Or, I suspect, just write the whole damn assembler in hand-constructed machine code. Which you could construct by hand-writing assembler, then hand-calculate the machine code one op at a time, then run it on its own source to make sure it gives the same result. I reckon an assembler can be simple enough to bootstrap in one step like this, especially if the assembly language doesn't have fripperies like named labels.

Steve Jessop 2010-01-10 15:03:35

@Steve Jessop: Assemblers are actually painfully complex. Most assemblers have fancy macro facilities. You don't want to write those in machine code. Early assemblers were 2-pass -- 1 pass to gather symbols and then a pass of code generation. That reflects the boot-strapping. Code generation is first generation. Symbol resolution is the add-on to code generation. Macros are yet another add-on.
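To make the two-pass idea concrete, here is a minimal sketch in Python. The machine is made up for illustration: the opcode table, the fixed two-byte instruction size, and the `label:` syntax are all assumptions, not any real assembler's format. Pass 1 only records label addresses; pass 2 emits bytes and resolves label operands, which is what lets a forward reference like `JMP end` work.

```python
# Hypothetical toy machine: every instruction is one opcode byte plus one operand byte.
OPCODES = {"LOAD": 0x01, "ADD": 0x02, "JMP": 0x03, "HALT": 0xFF}

def assemble(source):
    # Strip ';' comments and blank lines.
    lines = [ln.split(";")[0].strip() for ln in source.splitlines()]
    lines = [ln for ln in lines if ln]

    # Pass 1: gather symbols (record the address each label refers to).
    symbols = {}
    address = 0
    for ln in lines:
        if ln.endswith(":"):
            symbols[ln[:-1]] = address
        else:
            address += 2  # each instruction occupies two bytes

    # Pass 2: code generation, resolving label operands via the symbol table.
    code = []
    for ln in lines:
        if ln.endswith(":"):
            continue
        parts = ln.split()
        mnemonic = parts[0]
        operand = parts[1] if len(parts) > 1 else "0"
        value = symbols[operand] if operand in symbols else int(operand, 0)
        code += [OPCODES[mnemonic], value & 0xFF]
    return bytes(code)

program = """
start:
    LOAD 5
    ADD 1
    JMP end     ; forward reference, unknown until pass 1 finishes
    ADD 99
end:
    HALT
"""
machine_code = assemble(program)
print(machine_code.hex())  # 0105020103080263ff00
```

Dropping pass 1 and the `symbols` lookup leaves a pure code generator, which is roughly the "first generation" stage described above.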

S.Lott 2010-01-10 22:34:53

+4 A:

I think the key insight to your question is the notion of boot-strapping. The link will describe how a language can self-host.

It is relatively common in the Lisp community. e.g. Some university classes will use Scheme to write a language subset (this is not a compiler class activity).

That said, many compilers are written in other languages. For example, PUGS (Perl 6) is written in Haskell. Ruby is available in C or Java (as JRuby).

Michael Easter 2010-01-10 02:35:17

compiler dogfooding

kenny 2010-01-10 02:36:49

+3 A:

There are a couple of options. You can implement the entire language in a language available on the target host, like C or OCaml, whatever it may be. Once you have that implementation, you can write a compiler / interpreter in the language itself, build it, and now the language runs itself. This process is called 'bootstrapping'.

jspcal 2010-01-10 02:36:03

but the question is how was the first language written?

weng 2010-01-10 02:39:14

It was written directly in assembler, which is code the processor understands natively.

stealthdragon 2010-01-10 02:47:44

Processors do not understand assembly language, they understand machine code. You still need a program called an 'assembler' to create machine code from assembly code (although there is a one-to-one relationship between machine instruction and assembler mnemonic). Originally you would have had to set memory addresses and content using binary switches or a hex keypad, or burn them into a ROM. However, when bootstrapping a new architecture these days, you'd use a cross-compiler or assembler running on an existing architecture.

Clifford 2010-01-10 09:17:01

+6 A:

Much of this kind of thing is done in C.

The first C compiler was not written in C; it was PDP-11 assembler. Other early C compilers have been written in various assembler languages.

But all subsequent C compilers actually are written in C, based on an early "Portable C Compiler". Yes, it's circular. But the version x compiler can be used to build the version x+1 compiler.

S.Lott 2010-01-10 02:38:17

PCC is actually back in active development - http://pcc.ludd.ltu.se/

Nikolai N Fetissov 2010-01-10 03:05:12

Are you sure the first C compiler wasn't written in C? (Or I suppose strictly, in "C with a few features missing"). I can't quite tell from this article: http://cm.bell-labs.com/cm/cs/who/dmr/chist.html, but Dennis Ritchie says that B was already self-hosted before he started developing it into C, and he specifically mentions self-hosting as a desirable feature. If he went back to PDP-11 assembler, rather than developing B gradually into early C, do you know why? The innovation of PCC wasn't that it was written in C, but that it cross-compiled well.

Steve Jessop 2010-01-10 15:13:06

Good point. My understanding was that B was the template for C, not that the B compiler was modified to create the C compiler. However, it's entirely possible I have that completely wrong and the B compiler was morphed to create C.

S.Lott 2010-01-10 22:32:38

@Steve Jessop: "Expert C Programming" by Peter van der Linden gives an explanation: "A typeless language proved to be unworkable when development switched in 1970 to the newly introduced PDP-11. This processor featured hardware support for datatypes of several different sizes, and the B language had no way to express this. Performance was also a problem...Ritchie capitalized on the more powerful PDP-11 to create "New B," which solved both problems, multiple datatypes, and performance. "New B"—the name quickly evolved to "C"—was compiled rather than interpreted, and it introduced a type system..."

William Knight 2010-03-16 18:29:03

+11 A:

You don't build a language, but you build a compiler or an interpreter ... and for this you can choose any language even the language you want to compile ...

The first self-hosting compiler — capable of compiling its own source code in a high-level language — was created for Lisp ... Since the 1970s it has become common practice to implement a compiler in the language it compiles, although both Pascal and C have been popular choices for implementation language. http://en.wikipedia.org/wiki/Compiler

wj 2010-01-10 02:40:13

+1 for the most straightforward answer to the original question. the links are great.

Otaku 2010-01-10 06:37:35

This is even true for weird languages like BCX. (A BASIC to C compiler from way back) The source for the compiler was written in BCX.

George Edison 2010-01-10 08:23:00

+1 A:

There is no such thing as "no language". The central processing unit operates on a series of signals to which we refer as bits or ones and zeroes (technically, changes in the electrical current flow). In the 50s, coding was done directly in what the CPU could "understand" and the pace at which programming was done was up to around 20 assembler commands per day.

mingos 2010-01-10 07:15:22

Typically another machine or another language is used to write the first assembler and the first compiler.

As long as a working computer and a working language are available, even if rather different, the problem can be solved in two steps.

1. Write a translator for target language x for computer y in language z on computer (urk) a.
2. Write a translator for target language x for computer y in language x. Now a single compile on a will produce a translator that can run on y, and the second compile can then be on y with a fully-bootstrapped system.
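The two steps can be traced with a toy model. This is only a bookkeeping sketch: `Translator` and `build` are hypothetical names, and nothing is actually compiled. Each translator records the language it accepts, the machine it targets, the language its source is written in, and (once built) the machine its binary runs on.

```python
from typing import NamedTuple

class Translator(NamedTuple):
    source: str         # language it accepts
    target: str         # machine it generates code for
    written_in: str     # language its own source code is written in
    runs_on: str = ""   # machine the built binary runs on ("" = source only)

def build(compiler, program):
    """Compile `program` with a runnable `compiler` (hypothetical helper)."""
    assert compiler.runs_on, "the compiler must already be a runnable binary"
    assert program.written_in == compiler.source, "compiler cannot read this source"
    # The output is code for the compiler's target, so that is where it runs.
    return program._replace(runs_on=compiler.target)

# Starting point: a working compiler for language z on computer a.
z_on_a = Translator("z", "a", "machine code", runs_on="a")

# Step 1: write x-for-y in language z and build it on a -> a cross-compiler.
cross = build(z_on_a, Translator("x", "y", written_in="z"))

# Step 2: write x-for-y in language x itself.
x_in_x = Translator("x", "y", written_in="x")
native = build(cross, x_in_x)     # single compile on a; the result runs on y
rebuilt = build(native, x_in_x)   # second compile, this time on y

print(cross.runs_on, native.runs_on, rebuilt == native)  # a y True
```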

The problem becomes simpler if the languages or machines don't differ.

Bootstrapping can also be done incrementally, and perhaps this was more common 50 years ago.

1. Write a more powerful virtual machine (perhaps a stack machine or something with strings) in machine code.
2. Now, writing in the VM's bytecode, write something closer to the language.

Something like Forth might make a good intermediate step.
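For a feel of how little such an intermediate stack machine needs, here is a rough sketch in Python. The instruction names (`PUSH`, `ADD`, `MUL`, `DUP`, `PRINT`) are invented for the example, and a historical version would of course have been written in machine code rather than Python.

```python
def run(bytecode):
    """Interpret a list-based bytecode program for a hypothetical stack machine."""
    stack, pc, output = [], 0, []
    while pc < len(bytecode):
        op = bytecode[pc]
        pc += 1
        if op == "PUSH":            # next cell is a literal operand
            stack.append(bytecode[pc])
            pc += 1
        elif op == "ADD":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == "MUL":
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
        elif op == "DUP":           # duplicate the top of the stack
            stack.append(stack[-1])
        elif op == "PRINT":         # pop the top of the stack into the output
            output.append(stack.pop())
        else:
            raise ValueError(f"unknown instruction {op!r}")
    return output

# (2 + 3) squared, written directly in the VM's bytecode.
square_of_sum = ["PUSH", 2, "PUSH", 3, "ADD", "DUP", "MUL", "PRINT"]
print(run(square_of_sum))  # [25]
```

Once the interpreter loop exists, everything after it can be written in bytecode like `square_of_sum` instead of raw machine code, which is the incremental step described above.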

DigitalRoss 2010-01-10 19:29:49

+1 A:

As mentioned by the other posters, you can write a language in practically any language, and often one of the first programs written in a language is a compiler for the language itself.

However, there are some languages that were specially developed for writing computer languages, namely lex and yacc, and their updated versions flex and bison. These allow you to represent the lexical and grammatical specification of some languages (LALR(1) grammars, in the case of yacc and bison) in a form that can be compiled into an efficient language recognizer.

You do still have to write other parts of the language compiler/interpreter yourself, i.e. semantic analysis, code generation.
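As a rough illustration of the lexical half only, the sketch below hand-rolls what a lex-style specification describes: a table of named regular expressions turned into a tokenizer. It uses Python's standard `re` module rather than lex or flex themselves, and the token names and the tiny expression syntax are assumptions made up for the example.

```python
import re

# Rule table in the spirit of a lex specification: (token name, regular expression).
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("IDENT",  r"[A-Za-z_]\w*"),
    ("OP",     r"[+\-*/=()]"),
    ("SKIP",   r"\s+"),
]
# Combine the rules into one pattern with a named group per token type.
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))

def tokenize(text):
    tokens = []
    pos = 0
    while pos < len(text):
        match = MASTER.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at offset {pos}")
        if match.lastgroup != "SKIP":   # whitespace is recognized but discarded
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens

print(tokenize("x = 3 + 42"))
# [('IDENT', 'x'), ('OP', '='), ('NUMBER', '3'), ('OP', '+'), ('NUMBER', '42')]
```

The grammar half (what yacc and bison generate a parser for) would consume this token stream.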

See

http://dinosaur.compilertools.net/

Larry Watanabe 2010-01-10 22:22:05
