Please excuse my ignorance. I'm dabbling in PHP and getting my feet wet browsing SO, and feel compelled to ask a question that I've been wondering about for years:

When you write an entirely new programming language, what do you write it in?

This probably sounds really silly to all you programmers, for whom I have tremendous respect, but it's a perplexing chicken & egg thing to me. What do you do? Say to yourself, "Today I'm going to invent a new language!" and then fire up... Notepad? Are all compilers built on previously existing languages, such that, were one to bother, one could chart all programming languages ever devised onto one monstrous branching tree that eventually grounded out at... I dunno, something old?

With my feeble intellect, I find this fascinating... Please, educate me!

+2  A: 

Generally you can use just about whatever language you like. PHP was written in C, for example. If you have no access to any compiler whatsoever, you're going to have to resort to writing assembly language and translating it to machine code by hand.

Kaivosukeltaja
You don't have to compile machine code. It is the native language of the CPU by definition.
Stu Thompson
True. What I meant to say was "compile the machine code from assembly language or something similar by hand". I could be wrong, but I'm guessing few people just type in the code as binary/hex straight away.
Kaivosukeltaja
It is possible to edit your answer. :)
Stu Thompson
Edited. Thanks, Stu! :)
Kaivosukeltaja
+6  A: 

The most common answer is C. Most languages are implemented in C or in a hybrid of C callbacks and a lexer or parser generator such as Flex or Yacc. These tools are themselves small languages used for one purpose: to describe the syntax of another language. Sometimes, when it comes to compiled languages, they are first implemented in C. Then the first version of the language is used to create a new version, and so on. (Like Haskell.)

Amigable Clark Kant
Some languages are written in assembler, like picolisp. (http://blog.kowalczyk.info/article/picoLisp-Arc-before-Arc.html)
Amigable Clark Kant
What about the programs lex/yacc (flex/bison)? Are these considered supplements for creating languages in C?
Dave
Do you have anything to prove the most common answer is C?
RichardOD
I started to go through the list here: http://www.google.com/Top/Computers/Programming/Languages/Open_Source/ Then I accidentally closed my editor window at about language 10, and lost motivation to go through. Anyway, about half so far were implemented in C and the rest mostly bootstrapping to themselves.
Amigable Clark Kant
I think you have to mention Lex/Yacc (or alternatives). One does not generally start writing a language in C, but rather with a lexer and a parser which are then supported with C code.
Steve Rowe
+1  A: 

Actually you can write it in almost any language you like. There's nothing that prevents you from writing a C compiler in Ruby. "All" you have to do is parse the program and emit the corresponding machine code. If you can read/write files, your programming language will probably suffice.
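
To make "parse the program and emit the corresponding code" concrete, here is a minimal sketch in Python. It is not a real compiler: it handles only integer arithmetic, and the target is a hypothetical stack machine whose instruction names (PUSH, ADD, SUB, MUL, DIV) are made up for the example. The function names are also just illustrative.

```python
import re

def tokenize(src):
    # Split the source into integers, operators, and parentheses.
    return re.findall(r"\d+|[-+*/()]", src)

def compile_expr(src):
    """Compile an arithmetic expression into a list of instructions
    for a hypothetical stack machine (the instruction set is invented)."""
    tokens = tokenize(src)
    pos = 0
    out = []

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        return tok

    def factor():
        # factor := NUMBER | "(" expr ")"
        tok = advance()
        if tok == "(":
            expr()
            advance()  # consume the closing ")"
        else:
            out.append(f"PUSH {tok}")

    def term():
        # term := factor (("*" | "/") factor)*
        factor()
        while peek() in ("*", "/"):
            op = advance()
            factor()
            out.append("MUL" if op == "*" else "DIV")

    def expr():
        # expr := term (("+" | "-") term)*
        term()
        while peek() in ("+", "-"):
            op = advance()
            term()
            out.append("ADD" if op == "+" else "SUB")

    expr()
    return out
```

For example, `compile_expr("1 + 2 * (3 - 4)")` returns `["PUSH 1", "PUSH 2", "PUSH 3", "PUSH 4", "SUB", "MUL", "ADD"]`. A real compiler would emit actual machine instructions and report syntax errors, but the shape (read text, build structure, write output) is the same.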

If you're starting from scratch on a new platform, you can do cross-compiling: write a compiler that generates code for your new platform but runs on Java or natively on x86. Develop on your PC and then transfer the compiled program to your new target platform.

The most basic tools to start with are probably an assembler and a C compiler.

ziggystar
This "any" language should however support recursive calls. Otherwise implementing a syntax analyzer and a parser is going to be a real challenge.
Developer Art
If you select an unsuited language for a task, it's your own fault. This can happen for any project, not just compilers/interpreters.
ziggystar
+4  A: 

Pretty much any language, though using one suited to working with graphs and other complex data structures will make many things easier. Production compilers are often written in C or C++ for performance reasons, but languages such as OCaml, SML, Prolog, and Lisp are arguably better for prototyping the language.

There are also several "little languages" used in language design. Lex and yacc are used for specifying syntax and grammars, for example, and they compile to C. (There are ports for other languages, such as ocamllex / ocamlyacc, and many other similar tools.)

As a special case, new Lisp dialects are often built on existing Lisp implementations, since they can piggyback on most of the same infrastructure. Writing a Scheme interpreter can be done in Scheme in under a page of code, at which point one can easily add new features.
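
To give a feel for how small such an interpreter can be, here is a rough sketch. Note that it is written in Python rather than Scheme, and it covers only integers, a few arithmetic primitives, `if`, and `lambda`; the names `parse`, `evaluate`, and `GLOBAL_ENV` are my own choices for the example.

```python
import operator

def parse(src):
    """Turn an s-expression string into nested Python lists."""
    tokens = src.replace("(", " ( ").replace(")", " ) ").split()

    def read(pos):
        tok = tokens[pos]
        if tok == "(":
            items = []
            pos += 1
            while tokens[pos] != ")":
                item, pos = read(pos)
                items.append(item)
            return items, pos + 1  # skip the closing ")"
        try:
            return int(tok), pos + 1
        except ValueError:
            return tok, pos + 1  # anything else is a symbol

    expr, _ = read(0)
    return expr

# A handful of primitives; a real Scheme has many more.
GLOBAL_ENV = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "<": operator.lt,
}

def evaluate(x, env=GLOBAL_ENV):
    if isinstance(x, str):        # symbol: look it up
        return env[x]
    if not isinstance(x, list):   # number literal
        return x
    head = x[0]
    if head == "if":
        _, test, then, alt = x
        return evaluate(then if evaluate(test, env) else alt, env)
    if head == "lambda":
        _, params, body = x
        # The closure extends the defining environment with its arguments.
        return lambda *args: evaluate(body, {**env, **dict(zip(params, args))})
    fn = evaluate(head, env)      # ordinary function application
    return fn(*[evaluate(arg, env) for arg in x[1:]])
```

With this, `evaluate(parse("((lambda (x y) (+ x (* y 2))) 3 4)"))` gives `11`. Adding a new feature to the language is mostly a matter of adding another branch to `evaluate`, which is why this style is handy for experimenting.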

Fundamentally, compilers are just programs that read in something and translate it to something else: converting LaTeX source to DVI, converting C code to assembly and then to machine language, converting a grammar specification to C code for a parser, etc. The compiler's designer specifies the structure of the source format (parsing), what those structures mean, how to simplify the data (optimizing), and the kind of output to generate. Interpreters, by contrast, read the source and execute it directly. (Interpreters are typically simpler to write, but much slower.)
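
As a small illustration of the "simplify the data (optimizing)" step, here is a sketch of constant folding in Python. The tree representation is an assumption made for the example: a node is an integer, a variable name (a string), or a tuple `(op, left, right)` where `op` is `"+"` or `"*"`.

```python
def fold_constants(node):
    """Replace subexpressions made only of constants with their value.

    Assumed tree shape (invented for this sketch): an int, a variable
    name as a str, or a tuple (op, left, right) with op "+" or "*".
    """
    if not isinstance(node, tuple):
        return node  # a leaf: number or variable, nothing to simplify
    op, left, right = node
    left, right = fold_constants(left), fold_constants(right)
    if isinstance(left, int) and isinstance(right, int):
        # Both operands are known at compile time, so compute the result now.
        return left + right if op == "+" else left * right
    return (op, left, right)
```

For instance, `fold_constants(("+", "x", ("*", 2, 3)))` returns `("+", "x", 6)`: the multiplication is done once at compile time instead of every time the program runs.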

silentbicycle
+4  A: 

A lot of languages are bootstrapped, that is, written in themselves. As to why you would want to do this, it is often a good idea to eat your own dogfood.

The Wikipedia article I refer to discusses the chicken and egg issue. I think you will find it quite interesting.

RichardOD
Which is not possible when you're just starting out.
Michael Borgwardt
Yes, obviously. But a lot of languages are written this way once it is possible. I wanted to point this out as nobody else had, and I feel it is an important point.
RichardOD
+1 for using the term bootstrap. It is interesting that you have to compile your compiler twice. The first time is obviously with the bare-bones compiler you have and the 2nd time with the compiler you just built. Say you added optimization to your compiler. The compiler you built can produce code with those optimizations, but it is not itself running the optimized code until you compile it again with the optimizing compiler.
Les
@Les: Yes, bootstrapping is an interesting concept.
RichardOD
+1  A: 

Usually with a general-purpose programming language suitable for systems development, e.g. C, Haskell, ML, Lisp, etc., but the list of options is long. Also, usually with some domain-specific languages for language implementation, i.e. parser and lexical analyzer generators, intermediate languages like LLVM, etc. And probably some shell scripts, testing frameworks, and a build configuration system, e.g. autoconf.

james woodyatt