I just don't know that, is there any technical reason for that? Is it more difficult to implement a compiler for a language with weak typing? What is it?
views:
1131answers:
9I'm guessing that languages with dynamic (duck) typing employ lazy evaluation, which is favored by lazy programmers, and lazy programmers don't like to write compilers ;-)
A guess:
In a compiled language, one system (the compiler) gets to see all the code required to do strong typing. Interpreters generally only see a tiny bit of the program at a time, and so can't do that sort of cross checking.
But this isn't a hard and fast rule - it would be quite possible to make a strongly typed interpreted language, but that would go against the sort of "loose" general feel of interpreted languages.
Because compiled languages need to take the amount of memory used in account when they are compiled.
When you see something like:
int a
in C++, the compiler spits out code that reserves four bytes of memory and then assigns the local symbol "a" to point to that memory. If you had a typeless scripting language like javascript, the interpreter, behind the scenes, allocates the memory required. You can do:
var a = 10; // a is probably a four byte int here
a = "hello world"; // now a is a 12 byte char array
There is a lot that happens between those two lines. The interpreter deletes the memory at a, allocates the new buffer for the chars, then assigns the a var to point to that new memory. In a strongly typed language, there is no interpreter that manages that for you and thus the compiler must write instructions that take into account type.
int a = 10; // we now have four bytes on the stack.
a = "hello world"; // wtf? we cant push 12 bytes into a four byte variable! Throw an error!
So the compilers stops that code from compiling so the CPU dosn't blindly write 12 bytes into a four byte buffer and cause misery.
The added overhead for a compiler writing extra instructions to take care of type would slow down the language significantly and remove the benefit of languages like C++.
:)
-nelson
EDIT in response to comment
I don't know much about Python, so I can't say much about that. But loosely typedness slows down runtime considerably. Each instruction that the interpreter (VM) calls has to evaulate, and if necessary, coerce the var into the expected type. If you have:
mov a, 10
mov b, "34"
div a, b
Then the interpreter has to make sure that a is a variable and a number, then it would have to coerce b into a number before processing the instruction. Add that overhead for every instruction that the VM executes and you have a mess on your hands :)
The reason that you do early binding (strong typing) is performance. With early binding, you find the location of the method at compile time, so that at run time it already knows where it lives.
However, with late binding, you have to go searching for a method that seems like the method that the client code called. And of course, withe many, many method calls in a program, that's what makes dynamic languages 'slow'.
But sure, you could create a statically compiled language that does late binding, which would negate many of the advantages of static compilation.
Languages with weak typing can be compiled, for example, Perl5 and most versions of Lisp are compiled languages. However, the performance benefits of compiling are often lost because much of the work that the language runtime has to perform is to do with determining what type a dynamic variable really has at a particular time.
Take for example the following code in Perl:
$x=1;
$x="hello";
print $x;
It is obviously pretty difficult for the compiler to determine what type $x really has at a given point in time. At the time of the print statement, work needs to be done to figure that out. In a statically typed language, the type is fully known so performance at runtime can be increased.
It's pretty much because people who write and use interpreted languages tend to prefer ducktyping, and people who develop and use compiled languages prefer strong explicit typing. (I think the concensus reason for this would be somewhere in the area of 90% for error prevention, and 10% for performance.) For most programs written today, the speed difference would be insignificant. Microsoft Word has run on p-code (uncompiled) for - what - 15 years now?
The best case in point I can think of. Classical Visual Basic (VB6/VBA/etc.) The same program could be written in VB and run with identical results and comparable speed either compiled or interpreted. FUrthermore, you have the option of type declarations (in fact variable declarations) or not. Most people preferred type declarations, usually for error prevention. I've never heard or read anywhere to use type declarations for speed. And this goes back at least as far as two powers of magnitude in hardware speed and capacity.
Google is getting a lot of attention lately because of their work on a JIT compiler for javascript - which will require no changes to the language, or require any extra consideration on the part of the programmer. In this case, the only benefit will be speed.
There are basically two reasons to use static typing over duck typing:
- Static error checking.
- Performance
If you have an interpreted language, then there's no compile time for static error checking to take place. There goes one advantage. Furthermore, if you already have the overhead of the interpreter, then the language is already not going to be used for anything performance critical, so the performance argument becomes irrelevant. This explains why statically typed interpreted languages are rare.
Going the other way, duck typing can be emulated to a large degree in statically typed languages, without totally giving up the benefits of static typing. This can be done via any of the following:
- Templates. In this case, if the type you instantiate your template with supports all the methods called from within the template, your code compiles and works. Otherwise it gives a compile time error. This is sort of like compile-time duck typing.
- Reflection. You try to invoke a method by name, and it either works or throws an exception.
- Tagged unions. These are basically container classes for other types that contain some memory space and a field describing the type currently contained. These are used for things like algebraic types. When a method is invoked, it either works or throws, depending on whether the type currently contained supports it.
This explains why there are few dynamically typed, compiled languages.
Some languages are meant to run perfect in non-exceptional conditions and that's sacrificed for horrible performance they run into during exceptional conditions, hence very strong typing. Others were just meant to balance it with additional processing.
At times, there's way more in play than just typing. Take ActionScript for instance. 3.0 introduced stronger typing, but then again ECMAScript enables you to modify classes as you see fit at runtime and ActionScript has support for dynamic classes. Very neat, but the fact they're stating that dynamic classes should not be used in "standard" builds means it's a no-no for when you need to play it safe.
The premises behind the question are a bit dodgy. It is not true that interpreted languages are mostly ducktyped. It is not true that compiled languages mostly have strong typing. The type system is a property of a language. Compiled versus interpreted is a property of an implementation.
Examples:
The programming language Scheme is dynamically typed (aka duck-typed), and it has many dozens of interpreted implementations, but also some fine native-code compilers including Larceny, Gambit, and PLT Scheme (which includes both an interpreter and a JIT compiler making seamless transitions).
The programming language Haskell is statically typed; the two most famous implementations are the interpreter HUGS and the compiler GHC. There are several other honorable implementations split about evenly between compiling to native code (yhc) and interpretation (Helium).
The programming language Standard ML is statically typed, and it has had many native-code compilers, of which one of the best and most actively maintained is MLton, but one of the most useful implementations is the interpreter Moscow ML
The programming language Objective Caml is statically typed. It comes with only one implementation (from INRIA in France) but this implementation includes both an interpreter and a native-code compiler.
The programming language Pascal is statically typed, but it became popular in the 1970s because of the excellent implementation built at UCSD, which was based on a P-code interpreter. In later years fine native-code compilers became available, such as the IBM Pascal/VS compiler for the 370 series of computers.
The programming language C is statically typed, and today almost all implementations are compiled, but in the 1980s those of us lucky enough to be using Saber C were using an interpreter.
Nevertheless there is some truth behind your question, so you deserve a more thoughtful answer. The truth is that dynamically typed languages do seem to be correlated with interpreted implementations. Why might that be?
Many new languages are defined by an implementation. It is easier to build an interpreter than to build a compiler. It is easier to check types dynamically than to check them statically. And if you are writing an interpreter, there is little performance benefit to static type-checking.
Unless you are creating or adapting a very flexible polymorphic type system, a static type system is likely to get in the programmer's way. But if you are writing an interpreter, one reason may be to create a small, lightweight implementation that stays out of the programmer's way.
In some interpreted languages, many fundamental operations are so expensive that the additional overhead of checking types at run time doesn't matter. A good example is PostScript: if you're going to run off and rasterize Bezier curves at the drop of a hat, you won't balk at checking a type tag here or there.
Incidentally, please be wary of the terms "strong" and "weak" typing, because they don't have a universally agreed on technical meaning. By contrast, static typing means that programs are checked before being executed, and a program might be rejected before it starts. Dynamic typing means that the types of values are checked during execution, and a poorly typed operation might cause the program to halt or otherwise signal an error at run time. A primary reason for static typing is to rule out programs that might have such "dynamic type errors". (This is another reason people who write interpreters are often less interested in static typing; execution happens immediately after type checking, so the distinction and the nature of the guarantee aren't as obvious.)
Strong typing generally means that there are no loopholes in the type system, whereas weak typing means the type system can be subverted (invalidating any guarantees). The terms are often used incorrectly to mean static and dynamic typing. To see the difference, think of C: the language is type-checked at compile time (static typing), but there are plenty of loopholes; you can pretty much cast a value of any type to another type of the same size---in particular, you can cast pointer types freely. Pascal was a language that was intended to be strongly typed but famously had an unforeseen loophole: a variant record with no tag.
Implementations of strongly typed languages often acquire loopholes over time, usually so that part of the run-time system can be implemented in the high-level language. For example, Objective Caml has a function called Obj.magic
which has the run-time effect of simply returning its argument, but at compile time it converts a value of any type to one of any other type. My favorite example is Modula-3, whose designers called their type-casting construct LOOPHOLE
.
In summary:
Static vs dynamic is the language.
Compiled vs interpreted is the implementation.
In principle the two choices can be and are made orthogonally, but for sound technical reasons dynamic typing frequently correlates with interpretation.