
I just finished reading "Coders at Work", a brilliant book by Peter Seibel with 15 interviews with some of the most interesting computer programmers alive today.
Well, many of the interviewees have (co)invented or implemented a new programming language.
Some examples:

  • Joe Armstrong: Inventor of Erlang
  • L. Peter Deutsch: implementer of Smalltalk-80
  • Brendan Eich: Inventor of JavaScript
  • Dan Ingalls: Smalltalk implementor and designer
  • Simon Peyton Jones: Coinventor of Haskell
  • Guy Steele: Coinventor of Scheme

There is no doubt that their minds have something special and unreachable, and I'm not crazy enough to think I will ever be able to create a new language; I'm just interested in this topic.

So, imagine a funny/grotesque scenario where your crazy boss comes to your desk one day and says "I want a new programming language with my name on it... take the time you need and do it". What is the right approach to studying this fascinating/intimidating/magical topic?

What kind of knowledge do you need to model, design and implement a brand new programming language?

+1  A: 

First of all, you need to know the theory of formal languages and grammars, so that you know what a context-free grammar is (most programming languages have a context-free grammar). Then you need to know something about compilers. It's good to know tools like Lex and Yacc and notations like Backus-Naur Form.

EDIT: I think an MSc degree in Computer Science is a good starting point ;)

el.pescado
I have a CS degree and I know it's a good starting point :); but to invent a new language, you need more. Probably a PhD on the subject, and time...
systempuntoout
@system: formal qualifications (like a piled higher and deeper) have *nothing* to do with it. Not even a bachelor's degree is necessary. What matters is deep knowledge of what is wrong with existing languages and what is possible for fixing it. The long study involved in getting an advanced degree is one route to obtaining what is needed, but not the only one.
dmckee
@dmckee Uhm, I'm talking about time. At university or during a PhD you have time to learn and focus. Now that I'm working 10 hours a day, IMHO it is more difficult to start a project like this.
systempuntoout
@dmckee: even so, a university compiler course will come in very helpful if you actually want it to be a working language and not just a specification.
Callum Rogers
@el.pescado: I don't think this answer is true. Many years ago I decided to create a domain specific language (though that term hadn't been invented then, I don't think). I knew nothing about formal language theory, grammars, etc. I just dug into the lex and yacc man pages and started hammering away. It was a fabulous experience and taught me a lot. All I really had was a burning desire to do it.
Bryan Oakley
+6  A: 

Knowing more than one existing programming language would be a good start. Even if some of them aren't programming languages per se, the different ways that they do things would be helpful for deciding what you do/do not want your language to do.

Blair McMillan
I'd suggest knowing a variety of existing programming languages, not just two. C, Lisp, Haskell, Prolog, and Perl should make a good mix to start with, although there are obviously many other excellent mixes.
David Thornley
Wasn't that what I was saying?
Blair McMillan
@Blair: More or less, although "more than one" didn't seem strong enough to me.
David Thornley
+2  A: 

Writing a programming language, by itself, is not all that out of reach. Creating a good one is much harder. I think the best thing to learn first would be a good understanding of the history of programming languages — what's been tried, what's worked, what's failed. Armed with that, you need to know how your language is meant to be used so you know what design suits that best.

Chuck
+13  A: 

For starters, you would need to know exactly what it should be able to do differently (or better) than all the languages available today, and why.

Some of our best tools available today arguably grew out of frustration with the tools at the time they were conceived / invented.

ChristopheD
@ChristopheD: I think "need" is a bit strong here. I think it's perfectly valid for someone to try to design a language out of curiosity or as a learning experience. It doesn't necessarily have to be better than all other languages out there.
Bryan Oakley
+5  A: 

You need to know that your language is almost certainly doomed to obscurity and failure. I would guess that 99% of programming languages are never used by anyone except their author. If you can live with this, developing one is fun. Speaking as an author of several (doomed, obscure) myself.

anon
Interesting, could you expand on which languages those are (if they are `googleable`)...
ChristopheD
@ChristopheD They included a FORTH-like language, a text adventure writing language, and a C-like language. All vanished without trace. My latest language that no-one is using is an attempt at declarative data generation, which also hasn't worked out too well - this is available for inspection though at http://code.google.com/p/csvtest/
anon
Brilliant answer, thank you! Could you elaborate, from your experience, on what the most important things to study are and what the best approach to this subject is?
systempuntoout
@systempuntoout "read a lot of books, be clever, and get lucky" would be my approach. I would also add "stop web surfing and start coding" as a codicil.
anon
@Neil Butterworth: Did your text adventure language at least get to the IF archive?
David Thornley
@David Afraid not. This was back in the 80s (it was the project I used to teach myself C++) and this interweb thing didn't really exist then.
anon
@Neil Butterworth: The question isn't "what must I do to create a successful language". It is more simply "what must I learn in order to be able to create one". Telling someone not to learn how to do something because the project will be doomed doesn't help answer that question.
Bryan Oakley
+1  A: 

Domain-specific knowledge could help you design a language that solves the problems arising in that domain: a big collection of use cases, problems and solutions, so that you have a good picture of the problem space. That kind of knowledge is what's needed; how to actually implement the language is of secondary priority IMHO.

Nick D
+29  A: 

I too think that writing programming languages shouldn't be something you take on as an attempt to create the next Erlang or JavaScript. Yet, I find it to be a great exercise for the mind and once you start thinking about languages a lot, you find that:

  • You start realizing what's wrong with existing languages.
  • You discover what's great about existing languages.
  • It becomes even easier to learn new ones.

For my own part I've implemented a subset of JavaScript with a few improvements and also another language with a single datatype (the bit) and a single operator (nand) for proving low-level ideas, as well as a couple of DSLs for templating and more specific code generation.
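The bit-and-NAND language isn't shown in the answer, so the following is only a hypothetical Python sketch of the idea: with one datatype (a bit, modelled here as the ints 0 and 1) and one primitive (`nand`), every other Boolean operator can be derived. The helper names `not_`, `and_`, `or_` and `xor` are illustrative assumptions, not taken from the author's language.

```python
# Hypothetical model of a language whose only datatype is the bit (0 or 1)
# and whose only primitive operator is NAND.


def nand(a: int, b: int) -> int:
    """The single primitive: 0 only when both inputs are 1."""
    return 0 if (a == 1 and b == 1) else 1


# Everything below is derived from nand alone.
def not_(a: int) -> int:
    return nand(a, a)


def and_(a: int, b: int) -> int:
    return not_(nand(a, b))


def or_(a: int, b: int) -> int:
    return nand(not_(a), not_(b))


def xor(a: int, b: int) -> int:
    # The classic four-NAND construction of XOR.
    t = nand(a, b)
    return nand(nand(a, t), nand(b, t))


if __name__ == "__main__":
    # Print a truth table: a, b, AND, OR, XOR
    for a in (0, 1):
        for b in (0, 1):
            print(a, b, and_(a, b), or_(a, b), xor(a, b))
```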

So, on to what I think you should read about:

  • Lexer: Transforms a stream of characters to a stream of tokens ("class", "int", "{" and "++" are typical tokens)
    • Requires knowledge in: Regular expressions.
    • Implementations: There are lexer generators in most languages; Lex for C, Alex for Haskell, ANTLR or JLex for Java, GPLEX for C#, etc.
  • Parser: Transforms a stream of tokens into an abstract syntax tree (AST).
    • Requires knowledge in: ASTs, finite state machines, context-free grammars, Backus-Naur Form.
    • Implementations: Just like with lexers, there are parser generators in most languages: Yacc for C, Happy for Haskell, ANTLR or CUP for Java, GPPG for C#, etc.
  • Generator: Transforms an AST describing your language to an AST of the target language.
    • Requires knowledge in: The target language and/or platform (assembler, C, JVM, CLI or another of your favourites).
    • Implementations: Compared to the other steps, this is where you'll have to do a lot yourself. Get started by looking at what other people did. CoffeeScript (http://jashkenas.github.com/coffee-script) would be my suggestion; it's both cool and quite simple.
  • Optimization: Transforms the AST you produced into a more optimized one.
    • Requires knowledge in: Once again, a fair amount of understanding of the target platform. There are lots of algorithms to wrap your head around, depending on how deep in you want to go.
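As a rough illustration of the last two stages, here is a minimal Python sketch that assumes a lexer and parser have already produced an AST. The tuple encoding of the AST, the constant-folding pass, and the stack-machine instruction names (`PUSH`, `LOAD`, `ADD`, `SUB`, `MUL`) are all hypothetical choices for the example. Two simplifications differ from the list: the optimization runs on the source AST before generation, and the "target" is a flat list of instructions rather than a second AST.

```python
import operator

# Hypothetical AST encoding: ("num", 3), ("var", "x"), or (op, left, right)
# where op is one of "+", "-", "*".
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}
INSTRUCTIONS = {"+": "ADD", "-": "SUB", "*": "MUL"}


def fold(node: tuple) -> tuple:
    """Optimization pass: replace an operation on two constants with its value."""
    if node[0] in OPS:
        op, left, right = node
        left, right = fold(left), fold(right)
        if left[0] == "num" and right[0] == "num":
            return ("num", OPS[op](left[1], right[1]))
        return (op, left, right)
    return node  # "num" and "var" leaves are already as simple as they get


def generate(node: tuple) -> list[str]:
    """Generator pass: emit code for a hypothetical stack machine (post-order)."""
    kind = node[0]
    if kind == "num":
        return [f"PUSH {node[1]}"]
    if kind == "var":
        return [f"LOAD {node[1]}"]
    op, left, right = node
    return generate(left) + generate(right) + [INSTRUCTIONS[op]]


ast = ("*", ("var", "x"), ("+", ("num", 2), ("num", 3)))  # x * (2 + 3)
print(generate(fold(ast)))  # ['LOAD x', 'PUSH 5', 'MUL']
```

Folding before generation means `2 + 3` is computed once at compile time instead of every time the program runs.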

There's a nice tool called BNFC (Backus-Naur Form Converter) developed at my university that can give you a kickstart into the lexer and parser parts. If you'd like to get something up and working quickly I'd recommend it very much. It's a little more limited than using lexer/parser generators directly, but very productive. You'll find it here: http://www.cse.chalmers.se/research/group/Language-technology/BNFC

All of this aside, you should of course learn as many radically different programming languages as you possibly can. All of C, Erlang, Haskell, Prolog, Lisp and Javascript have contributed greatly to widening my own ideas of programming languages. Unless you know them all, pick one and start hacking ;)

And oh, my main advice would probably be to focus on writing lots of imaginary code in your own not-yet-existing language before hacking away too much with an implementation. This works like a spec; it forces you to think about what the language should do, why and how it will feel to use it.

By the way, I too finished reading Coders at Work a couple of days ago. It was a really great read, I'd recommend it to anyone!

Jakob
Just for the record (once again) there is no need for any stage of the compilation process to create an explicit tree of any sort. This is exactly the kind of answer you would expect from an undergraduate student who has taken one course on compiler writing.
anon
Just to clarify; most of this answer is in regards to implementing a language with a compiler, as opposed to the designing of a language.
Graphics Noob
@Graphics Noob Could you elaborate a little bit?
systempuntoout
My answer was my two cents on where to get started in a practical sense: what the stages of a compiler can (but don't have to) look like, techniques to read about and tools to use. If it sounds like beginner stuff, then good, it should! It's written for beginners. Regarding design as opposed to implementation, yes, designing a language requires a different set of skills than my list. My basic suggestion is still to learn different language paradigms and to try expressing one's own language in some code. Of course there's more to it and I'd love to ramble on the design subject as well :)
Jakob
@Jakob Thanks for your answer and feel free to add something about design too ;)
systempuntoout
+1  A: 

Designing a good language is as much an art as a science. I am interested in the subject myself and I am taking every course on languages at my university. However, what I have learned in school is simply a set of tools. Knowing about type systems and operational semantics would not have helped Yukihiro Matsumoto design Ruby and I suspect that he did not have much formal training in language design when he began.

I think the best way to learn about language design is to learn as many different languages and paradigms as possible and to learn them well.

Disclaimer: I have written a couple of compilers, but I am still a novice when it comes to actually designing languages. And probably when it comes to writing compilers too ;-)

Jørgen Fogh
All praise to Matz, but the "design" of Ruby consisted of only two steps: (1) rip off Smalltalk and (2) throw in a few bits from Perl. It's good to steal from the best, of course...
Norman Ramsey
@Norman: A great artist knows exactly what to steal. Ripping off Scheme and throwing in a few bits from COBOL wouldn't have worked nearly as well.
David Thornley
@David: All too true! And ripping off Smalltalk was a truly brilliant stroke, because the Smalltalk people deliberately cut themselves off from the OS. Using Smalltalk is like this: you get to live in a really great house... but it's on Mars. Having a similar house on Earth is a huge win.
Norman Ramsey
+4  A: 

Well... I wonder why no one mentioned the Dragon Book.

mr.bio
Because it is pants?
anon
@Neil What does "it is pants" mean?
Josh Stodola
@Josh UK English for "not very good"
anon
@Neil Then I would have to disagree. It's a fabulous book and teaches you everything you need to create a programming language.
Josh Stodola
@Josh, Dragon book teaches you some basics you need to implement a programming language. But designing a language is completely out of its scope. Actually, there is no decent textbook on how to create a language.
SK-logic
@SK-logic: I suspect there never will be a good textbook on how to create a new general-purpose language. There are books, I believe, on how to create domain-specific languages, but that doesn't seem to be what the OP was looking for.
David Thornley
(off-topic) @Neil: So for example, if I don't like the pants you're wearing, I can just say "Neil, your pants are pants!"?
Cam
+8  A: 

A programming language is about a set of abstractions that you use to express a meaningful program. The question is: what abstractions should the language provide? I don't think it is about the compiler, lexer and parser.

The beauty of a language comes from "less is more": a relatively small set of abstractions should be combinable to program great libraries, frameworks and ultimately end-user programs. (Individual abstractions sometimes don't fit with each other, or may even conflict. You will need to decide carefully what goes in and what doesn't.)

As C.A.R. Hoare indicates in his famous paper "Hints on Programming Language Design" there are two views on language design:

  • Part of language design consists of innovation. This activity leads to new language features in isolation.
  • The most difficult part of language design lies in integration: selecting a limited set of language features and polishing them until the result is a consistent simple framework that has no more rough edges.

1. What will make your language special?

You need to have a vision for your programming language and what it should do. What are the strengths and weaknesses of your language? In which area do you want it to shine (there should be at least one that is the main driving force of your initiative)?

Here is a short list of driving forces to consider:

  • Simplicity - A language with few abstractions is easy to grasp, but may be limited, e.g. don't expect to do pattern matching in Smalltalk. On the other hand, too many abstractions kill it as well.
  • Modularity - How do you deal with modular development, name clashes, isolation of components, etc.?
  • Composability - How does the programming language favor/impede composition? E.g. pure functions can naturally be composed, while objects can't. Transactions (with transactional memory) can be composed easily, while locks can't.
  • Safety - How safe are the programming language's abstractions? Can you break encapsulation in some way, e.g. if you have meta-programming facilities? Can you provide safety guarantees, e.g. with a type system?
  • Expressiveness - How easy is it to use the abstractions to express solutions to problems? Does expressiveness conflict with readability?
  • etc.
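To make the composability point concrete, here is a small Python sketch (Python is chosen only for illustration and does not enforce purity). Because a pure function's result depends only on its arguments, gluing two of them together needs no coordination, so a generic composition helper is a one-liner. `compose`, `inc` and `double` are hypothetical names, not standard-library functions.

```python
from functools import reduce
from typing import Callable


def compose(*fns: Callable) -> Callable:
    """Right-to-left composition: compose(f, g)(x) == f(g(x))."""
    # Start from the identity function and wrap each function around the previous result.
    return reduce(lambda f, g: lambda x: f(g(x)), fns, lambda x: x)


def inc(x: int) -> int:
    return x + 1


def double(x: int) -> int:
    return x * 2


inc_then_double = compose(double, inc)
print(inc_then_double(3))  # 8
```

Nothing comparable works for locks: two correct lock-protected operations called one after the other are not automatically one correct atomic operation.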

2. Abstractions in your language

When you have a vision for your language, you can start shaping the abstractions that will support it. For instance, Scala's vision was "Let's blend functional and OO", so they designed abstractions such as case classes. Newspeak's vision was "Let's make modules a first-class abstraction", so they pushed the concept of nested classes to the extreme.

Many abstractions have been proposed for designing programming languages. To design a new language, you should know a lot of them and decide how your programming language will compare against others. (Read the ECOOP or OOPSLA papers of the last decade and you will get an overview :) Here are a few:

  • Object
  • Class
  • Function
  • Trait
  • Type system
  • Scoping/modularity abstractions
  • Extension mechanism (e.g. open class, extension method)
  • Security mechanism (e.g. class sealing, final)
  • State manipulation abstraction (mutation, freezing, immutability, transactions)
  • Pattern matching abstractions
  • Exception handling abstractions
  • Representation independence abstractions (e.g. properties/slot)
  • Meta-programming abstractions
  • and a lot more to come ...

3. What you need to design a programming language

To create a programming language, you probably need (1) a vision, (2) a set of tools to implement and experiment with your design, e.g. parser generators, and (3) a formal background in certain areas such as type systems.

But more important than everything else, I guess you need hard work and passion :)

EDIT: I've added C.A.R. Hoare quote at the beginning.

ewernli
Thanks for your detailed answer!
systempuntoout
+3  A: 

I think you should learn more mainstream programming languages before you make your own. You should try to understand code snippets written in programming languages that you have not learned. (If you learned C++, you should be able to understand Java code without learning Java.)

Programming language design knowledge is very important. You must know what the point of making (and using) a programming language is. (Left as an exercise to the reader; hint: why don't we program in assembly? Why aren't we Real Programmers?)

(Note: you should Google the key topics mentioned here for tutorials.)

After you have gathered ideas, learn how to parse. Regular languages and formal language theory are musts. Also, learn about lexers such as lex, and learn how to tokenize without a lexer. A tokenizer splits code into labeled chunks, from

function factorial(n) {
  if (n == 0) { return 1; }
  else { return factorial(n - 1) * n; }
}

to

[FUNCTION function] [IDENT factorial] [LEFT_PAREN (] [IDENT n] [RIGHT_PAREN )] [LEFT_BRACE {]
[IF if] [LEFT_PAREN (] [IDENT n] [EQ ==] [INT 0] ... and so on
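Tokenizing without a lexer generator can be as simple as a hand-written loop over the characters. The Python sketch below assumes the token names from the example above; the additional names (`ELSE`, `RETURN`, `RIGHT_BRACE`, `SEMICOLON`, `MINUS`, `STAR`, `PLUS`) are hypothetical extensions needed to cover the whole `factorial` snippet.

```python
# Token names follow the example above; ELSE, RETURN, RIGHT_BRACE, SEMICOLON,
# MINUS, STAR and PLUS are hypothetical additions.
KEYWORDS = {"function": "FUNCTION", "if": "IF", "else": "ELSE", "return": "RETURN"}
TWO_CHAR = {"==": "EQ"}
ONE_CHAR = {
    "(": "LEFT_PAREN", ")": "RIGHT_PAREN", "{": "LEFT_BRACE", "}": "RIGHT_BRACE",
    ";": "SEMICOLON", "-": "MINUS", "*": "STAR", "+": "PLUS",
}


def tokenize(source: str) -> list[tuple[str, str]]:
    """Split source text into (KIND, text) pairs without a lexer generator."""
    tokens: list[tuple[str, str]] = []
    i = 0
    while i < len(source):
        ch = source[i]
        if ch.isspace():                      # whitespace separates tokens
            i += 1
        elif ch.isdigit():                    # integer literal
            start = i
            while i < len(source) and source[i].isdigit():
                i += 1
            tokens.append(("INT", source[start:i]))
        elif ch.isalpha() or ch == "_":       # keyword or identifier
            start = i
            while i < len(source) and (source[i].isalnum() or source[i] == "_"):
                i += 1
            word = source[start:i]
            tokens.append((KEYWORDS.get(word, "IDENT"), word))
        elif source[i:i + 2] in TWO_CHAR:     # longer operators are tried first
            tokens.append((TWO_CHAR[source[i:i + 2]], source[i:i + 2]))
            i += 2
        elif ch in ONE_CHAR:
            tokens.append((ONE_CHAR[ch], ch))
            i += 1
        else:
            raise SyntaxError(f"unexpected character {ch!r} at offset {i}")
    return tokens


src = "function factorial(n) { if (n == 0) { return 1; } else { return factorial(n - 1) * n; } }"
print(" ".join(f"[{kind} {text}]" for kind, text in tokenize(src)[:5]))
```

This prints the first tokens in the same bracketed style: `[FUNCTION function] [IDENT factorial] [LEFT_PAREN (] [IDENT n] [RIGHT_PAREN )]`. Trying `==` before single characters is what keeps it from being split into two tokens.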

After that, learn about context-free grammars and parser generators such as yacc and JavaCC. A parser checks whether the tokens are placed properly according to a set of rules (a "grammar") and processes them.

For example, a while statement is defined as "a while keyword, a left paren, an expression, a right paren, a block." You must express that as a context-free grammar rule:

WhileStmt := WHILE LEFT_PAREN Expression RIGHT_PAREN Block

(Expression and Block are defined separately.) A parser generator then transforms such rules into source code that processes the tokens.
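To get a feel for what such code does with a rule, here is a hand-written recursive-descent parser in Python in which each grammar rule becomes one method and each element of the rule becomes one step. JavaCC generates parsers of this general recursive-descent shape, while yacc produces table-driven LALR parsers instead. Since `Expression` and `Block` are left undefined above, the tiny definitions in the docstring are assumptions made only so that the sketch runs.

```python
class Parser:
    """Recursive-descent parser for:
         WhileStmt  := WHILE LEFT_PAREN Expression RIGHT_PAREN Block
         Expression := IDENT | INT                      (assumed, kept tiny)
         Block      := LEFT_BRACE { WhileStmt } RIGHT_BRACE (assumed)
    Tokens are (KIND, text) pairs.
    """

    def __init__(self, tokens: list[tuple[str, str]]):
        self.tokens = tokens
        self.pos = 0

    def peek(self) -> str:
        """Kind of the next token, or "EOF" at the end of input."""
        return self.tokens[self.pos][0] if self.pos < len(self.tokens) else "EOF"

    def expect(self, kind: str) -> str:
        """Consume a token of the given kind or report a syntax error."""
        if self.peek() != kind:
            raise SyntaxError(f"expected {kind}, found {self.peek()}")
        text = self.tokens[self.pos][1]
        self.pos += 1
        return text

    def while_stmt(self):
        self.expect("WHILE")
        self.expect("LEFT_PAREN")
        cond = self.expression()
        self.expect("RIGHT_PAREN")
        body = self.block()
        return ("while", cond, body)

    def expression(self):
        if self.peek() == "IDENT":
            return ("var", self.expect("IDENT"))
        return ("num", int(self.expect("INT")))

    def block(self):
        self.expect("LEFT_BRACE")
        stmts = []
        while self.peek() == "WHILE":
            stmts.append(self.while_stmt())
        self.expect("RIGHT_BRACE")
        return stmts
```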

By this time, a good exercise for you is to write a calculator program.

Beyond that, you should learn about abstract syntax tree (AST) generation and the interpretation of ASTs. For JavaCC, the tree-generation tool is called JJTree. Make a formula calculator with your knowledge.
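For the calculator exercise, here is a minimal Python sketch that goes all the way from text to an AST to a value. It uses a hand-written recursive-descent parser rather than JavaCC/JJTree, and the grammar (numbers, `+ - * /`, unary minus, parentheses), the tuple AST encoding and all function names are assumptions for illustration.

```python
import operator
import re

# Assumed grammar (hypothetical):
#   expr   := term (("+" | "-") term)*
#   term   := factor (("*" | "/") factor)*
#   factor := NUMBER | "(" expr ")" | "-" factor
TOKEN = re.compile(r"\s*(\d+(?:\.\d+)?|[-+*/()])")
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul, "/": operator.truediv}


def tokenize_formula(text: str) -> list[str]:
    text = text.rstrip()
    tokens, pos = [], 0
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise SyntaxError(f"bad input at offset {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


class CalcParser:
    """Builds an AST: ("num", x), ("neg", node) or (op, left, right)."""

    def __init__(self, tokens: list[str]):
        self.tokens, self.pos = tokens, 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self) -> str:
        tok = self.peek()
        if tok is None:
            raise SyntaxError("unexpected end of input")
        self.pos += 1
        return tok

    def expr(self):
        node = self.term()
        while self.peek() in ("+", "-"):   # loop gives left associativity
            op = self.take()
            node = (op, node, self.term())
        return node

    def term(self):
        node = self.factor()
        while self.peek() in ("*", "/"):   # binds tighter than + and -
            op = self.take()
            node = (op, node, self.factor())
        return node

    def factor(self):
        tok = self.take()
        if tok == "(":
            node = self.expr()
            if self.take() != ")":
                raise SyntaxError("expected ')'")
            return node
        if tok == "-":
            return ("neg", self.factor())
        try:
            return ("num", float(tok))
        except ValueError:
            raise SyntaxError(f"unexpected token {tok!r}") from None


def evaluate(node) -> float:
    """Interpret the AST by walking it recursively."""
    if node[0] == "num":
        return node[1]
    if node[0] == "neg":
        return -evaluate(node[1])
    op, left, right = node
    return OPS[op](evaluate(left), evaluate(right))


def calculate(text: str) -> float:
    parser = CalcParser(tokenize_formula(text))
    tree = parser.expr()
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected token {parser.peek()!r}")
    return evaluate(tree)


print(calculate("(1 + 2) * 3"))  # 9.0
```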

Once you have mastered making interpreters, learn how to make compilers, and then the fun part, bootstrapping: learn how a Java compiler could be written in Java.

I made a LOGO ripoff as an example: http://github.com/SHiNKiROU/DesignScript

Also check my own calculator: http://github.com/SHiNKiROU/ExprParser

And a simple reverse polish notation calculator (Turing-complete) that I made without any effort: http://github.com/SHiNKiROU/Qwerty-RPN

I think you don't need a computer science degree, since I'm in grade 9 and I am still able to create a programming language. Google and self-study.

Sorry if my English is weird; as I said, I'm in grade 9 and I'm not a native English speaker.

Here are some links to some useful resources and examples:

SHiNKiROU
Thanks for sharing your experience :)
systempuntoout
@SHINKIROU: I think it would be better to study (not necessarily learn) a few non-mainstream languages in addition to the mainstream ones. Study LISP or Tcl, Erlang, languages like that. Python should probably be on that list since it has a non-traditional way of detecting blocks of code.
Bryan Oakley
+5  A: 
Norman Ramsey
@Norman thanks for your answer
systempuntoout
+1  A: 

A firm understanding of denotational semantics.
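Denotational semantics is a mathematical discipline, so code can only approximate it, but its core idea can be sketched in Python: a meaning function maps each piece of syntax to a mathematical object built only from the meanings of its parts. Here the assumed "mathematical object" for an expression is a function from environments to integers, and the tiny language (`num`, `var`, `+`, `let`) is hypothetical.

```python
from typing import Callable, Mapping

Env = Mapping[str, int]
Meaning = Callable[[Env], int]


def meaning(node) -> Meaning:
    """Map syntax to its denotation: a function from environments to values.
    The meaning of a compound node is built only from its parts' meanings."""
    kind = node[0]
    if kind == "num":                    # ("num", n)
        n = node[1]
        return lambda env: n
    if kind == "var":                    # ("var", name)
        name = node[1]
        return lambda env: env[name]
    if kind == "+":                      # ("+", left, right)
        left, right = meaning(node[1]), meaning(node[2])
        return lambda env: left(env) + right(env)
    if kind == "let":                    # ("let", name, bound_expr, body_expr)
        _, name, bound, body = node
        bound_m, body_m = meaning(bound), meaning(body)
        return lambda env: body_m({**env, name: bound_m(env)})
    raise ValueError(f"unknown node {kind!r}")


program = ("let", "x", ("num", 2), ("+", ("var", "x"), ("var", "y")))
print(meaning(program)({"y": 40}))  # 42
```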

leppie
+1  A: 

One skill that might help is being a Scandinavian ;)

Artium
I've read that beards are at least as important.
David Thornley
+1  A: 

I've had occasion to create perhaps as many as eight or ten little languages in my own professional career (in addition to various others for my own purposes/enjoyment). It's sometimes the best way to solve some domain-specific problem. Reference, for example.

It's not particularly miraculous or difficult. In general, you'll do it because there's no existing language that exactly fits the bill; without that motivation, you can't really expect to be able to design some awesome language, essentially in a vacuum.

So the next time you need one, write it. Nurture it, and let it grow. (Both the language, and your beard.)

Grumdrig
Thanks for book recommendation :)
systempuntoout
I finally shaved my beard two years ago, it made me look really old...
Stephane Rolland
+2  A: 

The site I think you absolutely must visit is LtU (Lambda the Ultimate).

It will help you confront several other paradigms, and you can read what other language inventors have to say.

Go ahead !

Stephane Rolland
+1  A: 

To actually invent a new programming language, the most important knowledge is about the field where that language will provide a more natural and/or better expression of problems and solutions.

Most of the people you listed, and many others credited with authoring a programming language that became popular, had very limited knowledge about actual parsers, let alone compiler writing techniques - as evidenced by a lot of awkward syntax, which happens when you know what you want but not how to do it well - so you do it the way you know at that moment.

What they did know, however, was "their" field: what was itching them, the things they needed to express in order to feel that it's more natural or easier to use. They mostly started just by ripping off whatever they found that was close enough to their ideas and easy enough to get into and tweak. Everything else came later.

You can invent something that is pretty much a new programming language without even writing a parser or compiler; take jQuery as an example. The reason you see so many functional-something languages is that they have virtually no parsing needs that aren't already provided. You could literally write your own sub-language in Haskell without ever knowing how a real parser works.
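As an illustration of the embedded-language trick, here is a Python version (the answer mentions Haskell; Python is swapped in purely for illustration). Operator overloading lets the host language's own parser build the syntax tree of a tiny expression sub-language, so no lexer or parser has to be written. The class names `Expr`, `Const`, `Var` and `BinOp` and the helper `lift` are hypothetical.

```python
class Expr:
    """Base class: Python's own parser builds our syntax tree via these operators."""

    def __add__(self, other):
        return BinOp("+", self, lift(other))

    def __radd__(self, other):          # handles e.g. 1 + x
        return BinOp("+", lift(other), self)

    def __mul__(self, other):
        return BinOp("*", self, lift(other))

    def __rmul__(self, other):          # handles e.g. 2 * x
        return BinOp("*", lift(other), self)


class Const(Expr):
    def __init__(self, value):
        self.value = value

    def eval(self, env):
        return self.value

    def __repr__(self):
        return repr(self.value)


class Var(Expr):
    def __init__(self, name):
        self.name = name

    def eval(self, env):
        return env[self.name]

    def __repr__(self):
        return self.name


class BinOp(Expr):
    def __init__(self, op, left, right):
        self.op, self.left, self.right = op, left, right

    def eval(self, env):
        a, b = self.left.eval(env), self.right.eval(env)
        return a + b if self.op == "+" else a * b

    def __repr__(self):
        return f"({self.left!r} {self.op} {self.right!r})"


def lift(value):
    """Wrap plain Python numbers so they become nodes of the sub-language."""
    return value if isinstance(value, Expr) else Const(value)


x = Var("x")
formula = 2 * x + 1          # a syntax tree, not a number
print(formula)               # ((2 * x) + 1)
print(formula.eval({"x": 20}))  # 41
```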

Bjarne Stroustrup has been quoted as saying that he wished he had used a recursive descent parser for C++, which is the lamest parsing technique in the universe. Why? Because it would have made his life easier and allowed him to spend most of his time on what he really wanted to do: make a new language :-)

ZXX
@ZXX interesting answer, thanks.
systempuntoout