



I started looking at the upcoming C++0x specification today, and it got me thinking about the right size for a language's vocabulary. Initially, it struck me as very annoying that new keywords were being introduced. I think the number of keywords in a language is, at some rough level, an estimate of its complexity. In the case of C++, then, adding new constructs would make the language even harder to master. That's one reason why the K&R book is so much smaller than its C++ equivalent.

After that, I thought about natural languages, whose vocabulary has been shown to grow linearly with time, regardless of the language (*). The only exception is, of course, Newspeak, which says a lot. In this case, vocabulary size is related to the expressive power of the language.

In programming languages, however, you can have very expressive languages with a small vocabulary (e.g., Lisp).

So, to phrase this as a question: in your opinion, should a language's vocabulary be big and verbose, or small and concise?

I'm not convinced there is a real answer here. Smaller is my preference, but I can't quantify what small really is. I'd prefer to see a lean set of operators with no redundancy in them. Things like `is` and `as` in C# annoy me; they are too close in functionality. `if` and `unless` in many languages are the same way: one can easily be constructed from the other.

I'm a big fan of Lisp, which is quite minimal, but even there syntactic sugar exists (like `'` instead of `quote`).

LISP minimal? It defines almost 1000 forms/functions! R5RS Scheme has about 250 in contrast.
Lisp has 18 reserved words.
That does not, however, make it useful yet. If it's about reserved keywords, Scheme has none :)
`return undef unless $var;` is better than `if( ! $var ){ return undef; }`
@Brad, I'm not convinced. It's a little more readable, but if it makes the language more complex, is that worth the tradeoff? It's no more expressive.
How big should a language vocabulary be?


So THAT was the question!
I like small languages with clear ways to extend them.

Well, let's compare two languages: C++ and Smalltalk.

  • C++ - large number of reserved words, complex syntax, huge standards document

  • Smalltalk - almost no reserved words, incredibly simple syntax, tiny standards document

Now look at the relative success of those languages. I think the conclusion is obvious: big is better.


I would say as few as possible while maximizing functionality. Exactly where to draw the line between concise and complex is very subjective.


As big as necessary, but no bigger?

Anyway, your question misses an obvious point: languages can be hideously complex without using a lot of keywords. As an example, look at the `static` keyword in C++. It has, what, three or four different meanings? Does that make the language less complex than if they'd used three or four different keywords?

Of course you're right. That's why I said it's a rough equivalence, at best. There's also the question of whether to count the standard library or just the core language, and many other issues...
Most languages overload meanings; the `*` symbol in C has at least three, for example.