views:

1131

answers:

9

I just don't know: is there any technical reason for that? Is it more difficult to implement a compiler for a language with weak typing? What is it?

+4  A: 

I'm guessing that languages with dynamic (duck) typing employ lazy evaluation, which is favored by lazy programmers, and lazy programmers don't like to write compilers ;-)

Steven A. Lowe
drive-by downvoters have no sense of humor. fortunately, i do.
Steven A. Lowe
When you've got 10k (or nearly that, in my case) reputation, it's not too painful to sacrifice one or two for a joke, is it?
Paul Tomblin
A programmers best attribute is that he/she is lazy!
Malfist
Very good one! :) +1
Eduardo León
I appreciate the joke, but how do you handle that this gets higher than a real answer? Maybe as I do: keep reading all non-negative reputation answers :)
David Rodríguez - dribeas
I handle it by laughing. One funny answer - with a grain of truth to it! - does not prevent non-funny answers from appearing and/or being voted up.
Steven A. Lowe
Heh, it's a nice play on words, but what's the grain of truth? Most languages with dynamic (duck) typing *don't* do lazy evaluation (by default), and lazy evaluation is most famous in Haskell which has a static type system and an excellent compiler. :)
ShreevatsaR
@ShreevatsaR: i'm confused, i don't think it is possible to do duck typing without lazy evaluation (late binding/dynamic binding)...
Steven A. Lowe
Duck typing *is* associated with late binding, but lazy evaluation is something quite different: http://en.wikipedia.org/wiki/Lazy_evaluation http://is.gd/bZbT etc, think Unix pipes: http://is.gd/dAtP (and of course, the awesome "Why FP" paper http://is.gd/dAtY which is really about lazy evaluation)
ShreevatsaR
@ShreevatsaR: not to argue about a joke post, but "call-by-name and lazy evaluation are each a form of late binding" -- http://academic.udayton.edu/SaverioPerugini/courses/cps343/lecture_notes/lazyevaluation.html ;-)
Steven A. Lowe
That's interesting, thanks. :) Hadn't thought of lazy evaluation (or call-by-name) as a form of "late binding". (But in any case the converse is not true; not all late binding is lazy evaluation, and most dynamically typed languages do not have lazy evaluation...) It's all just words, in the end. :)
ShreevatsaR
A: 

A guess:

In a compiled language, one system (the compiler) gets to see all the code required to do strong typing. Interpreters generally only see a tiny bit of the program at a time, and so can't do that sort of cross checking.

But this isn't a hard and fast rule - it would be quite possible to make a strongly typed interpreted language, but that would go against the sort of "loose" general feel of interpreted languages.

Paul Tomblin
The question is, why would you do that? So that people writing C++ don't have to understand what their code actually does to the computer, and can write a bunch of inefficient code?
nlaq
Because some programmers like the freedom of late binding. Others like the comfort of the compiler enforcing some discipline. I like both in their place.
Paul Tomblin
Even in loosely typed languages it is commonly regarded as "bad practice" to use the same var for multiple types at runtime, or to write a function/method that returns variables of different types. So why is it a good thing?
nlaq
One thing I like about perl is that I can parse a string from a file, and if it looks like a number, use it as a number without conversion. Much quicker to write than Integer.parseInt/catch NumberFormatException.
Paul Tomblin
All great until you get a badly formed file and your program happily uses zero everywhere.
Zan Lynx
@Zan, That's why I use perl for some things, and Java for others.
Paul Tomblin
+1  A: 

Because compiled languages need to take the amount of memory used into account when they are compiled.

When you see something like:

int a

in C++, the compiler spits out code that reserves four bytes of memory and then assigns the local symbol "a" to point to that memory. If you had a typeless scripting language like javascript, the interpreter, behind the scenes, allocates the memory required. You can do:

var a = 10;  // a is probably a four byte int here
a = "hello world"; // now a is a 12 byte char array

There is a lot that happens between those two lines. The interpreter deletes the memory at a, allocates a new buffer for the chars, then points the a var at that new memory. In a strongly typed language, there is no interpreter that manages that for you, and thus the compiler must emit instructions that take the type into account.

int a = 10; // we now have four bytes on the stack.
a = "hello world"; // wtf? we can't push 12 bytes into a four-byte variable! Throw an error!

So the compiler stops that code from compiling, so the CPU doesn't blindly write 12 bytes into a four-byte buffer and cause misery.

The added overhead for a compiler writing extra instructions to take care of type would slow down the language significantly and remove the benefit of languages like C++.

:)

-nelson

EDIT in response to comment

I don't know much about Python, so I can't say much about that. But loose typing slows down runtime considerably. For each instruction that the interpreter (VM) executes, it has to evaluate, and if necessary coerce, the vars into the expected types. If you have:

mov a, 10
mov b, "34"
div a, b

Then the interpreter has to make sure that a is a variable and a number, and then it has to coerce b into a number before processing the instruction. Add that overhead for every instruction the VM executes and you have a mess on your hands :)

nlaq
There's no reason why that sort of memory management couldn't be done in an VM like Java or .NET, but it's not because there are other advantages of strong typing.
Paul Tomblin
They don't, because of performance. A better question is "why would you want a loosely typed language?" Call me old-fashioned or an elitist, but I don't feel comfortable when my variables' types are not enforced by the interpreter.
nlaq
Oh, and the CLR isn't really a VM because it JITs the IL into machine language before it runs.
nlaq
I've written a lot of perl, and a lot of C/C++/Java. They both have their uses. I wouldn't want to write a payroll system in a loosely typed language, but I also don't want to have to define interfaces and classes for my file parser.
Paul Tomblin
Could you please clarify: what does 'slow down the language' mean? Is it slowing down at compile time or at runtime? And, as far as I know, Python has a compiler. What's wrong with it, then?
snitko
+7  A: 

The reason that you do early binding (strong typing) is performance. With early binding, you find the location of the method at compile time, so that at run time it already knows where it lives.

However, with late binding, you have to go searching for a method that seems like the method that the client code called. And of course, with the many, many method calls in a program, that's what makes dynamic languages 'slow'.

But sure, you could create a statically compiled language that does late binding, which would negate many of the advantages of static compilation.

Travis
So what you're saying is that a compiled language with late binding will make the program slow at runtime? Would it be slower than an interpreter, then?
snitko
Travis is oversimplifying. C++ virtual member functions have late binding but a compiler finds lots of useful work to do. The Self language has *really* late binding, but thanks to Urs Hoelzle's brilliant dynamic compiler, it gets great performance. No wonder Urs is now a big shot at Google :-)
Norman Ramsey
Yer, it was a bit of an oversimplification, and yes, there's lots you can do to speed up late-bound methods, e.g. lookup caching etc. (hence the single quotes around 'slow').
Travis
Early/late binding is unrelated to strong/weak typing.
Wei Hu
+1  A: 

Languages with weak typing can be compiled, for example, Perl5 and most versions of Lisp are compiled languages. However, the performance benefits of compiling are often lost because much of the work that the language runtime has to perform is to do with determining what type a dynamic variable really has at a particular time.

Take for example the following code in Perl:

$x=1;
$x="hello";
print $x;

It is obviously pretty difficult for the compiler to determine what type $x really has at a given point in time. At the time of the print statement, work needs to be done to figure that out. In a statically typed language, the type is fully known so performance at runtime can be increased.

1800 INFORMATION
Compilers are perfectly capable of providing code to deal with whatever types show up at execution time. Visual Basic 6 is a very popular compilable language that requires no type declarations - but lets you use them if you like.
le dorfier
VB6 variables that aren't declared with a type are implicitly typed as Variants. These are roughly analogous to a scalar type in Perl and need the same kind of behind-the-scenes carry-on that all the other dynamically typed languages have.
1800 INFORMATION
+2  A: 

It's pretty much because people who write and use interpreted languages tend to prefer duck typing, and people who develop and use compiled languages prefer strong explicit typing. (I think the consensus reason for this would be somewhere in the area of 90% error prevention and 10% performance.) For most programs written today, the speed difference would be insignificant. Microsoft Word has run on p-code (uncompiled) for - what - 15 years now?

The best case in point I can think of is classic Visual Basic (VB6/VBA/etc.). The same program could be written in VB and run with identical results and comparable speed whether compiled or interpreted. Furthermore, you have the option of type declarations (in fact, variable declarations) or not. Most people preferred type declarations, usually for error prevention. I've never heard or read anywhere to use type declarations for speed. And this goes back at least two orders of magnitude in hardware speed and capacity.

Google is getting a lot of attention lately because of their work on a JIT compiler for JavaScript, which will require no changes to the language, nor any extra consideration on the part of the programmer. In this case, the only benefit will be speed.

le dorfier
+1  A: 

There are basically two reasons to use static typing over duck typing:

  1. Static error checking.
  2. Performance

If you have an interpreted language, then there's no compile time for static error checking to take place. There goes one advantage. Furthermore, if you already have the overhead of the interpreter, then the language is already not going to be used for anything performance critical, so the performance argument becomes irrelevant. This explains why statically typed interpreted languages are rare.

Going the other way, duck typing can be emulated to a large degree in statically typed languages, without totally giving up the benefits of static typing. This can be done via any of the following:

  1. Templates. In this case, if the type you instantiate your template with supports all the methods called from within the template, your code compiles and works. Otherwise it gives a compile time error. This is sort of like compile-time duck typing.
  2. Reflection. You try to invoke a method by name, and it either works or throws an exception.
  3. Tagged unions. These are basically container classes for other types that contain some memory space and a field describing the type currently contained. These are used for things like algebraic types. When a method is invoked, it either works or throws, depending on whether the type currently contained supports it.

This explains why there are few dynamically typed, compiled languages.

dsimcha
A: 

Some languages are meant to run perfectly in non-exceptional conditions, at the cost of horrible performance when they hit exceptional conditions; hence very strong typing. Others were just meant to balance it with additional processing.

At times, there's way more in play than just typing. Take ActionScript, for instance. 3.0 introduced stronger typing, but then again ECMAScript lets you modify classes as you see fit at runtime, and ActionScript has support for dynamic classes. Very neat, but the fact that they state dynamic classes should not be used in "standard" builds means it's a no-no for when you need to play it safe.

kRON
+62  A: 

The premises behind the question are a bit dodgy. It is not true that interpreted languages are mostly ducktyped. It is not true that compiled languages mostly have strong typing. The type system is a property of a language. Compiled versus interpreted is a property of an implementation.

Examples:

  • The programming language Scheme is dynamically typed (aka duck-typed), and it has many dozens of interpreted implementations, but also some fine native-code compilers including Larceny, Gambit, and PLT Scheme (which includes both an interpreter and a JIT compiler making seamless transitions).

  • The programming language Haskell is statically typed; the two most famous implementations are the interpreter HUGS and the compiler GHC. There are several other honorable implementations split about evenly between compiling to native code (yhc) and interpretation (Helium).

  • The programming language Standard ML is statically typed, and it has had many native-code compilers, of which one of the best and most actively maintained is MLton, but one of the most useful implementations is the interpreter Moscow ML.

  • The programming language Objective Caml is statically typed. It comes with only one implementation (from INRIA in France) but this implementation includes both an interpreter and a native-code compiler.

  • The programming language Pascal is statically typed, but it became popular in the 1970s because of the excellent implementation built at UCSD, which was based on a P-code interpreter. In later years fine native-code compilers became available, such as the IBM Pascal/VS compiler for the 370 series of computers.

  • The programming language C is statically typed, and today almost all implementations are compiled, but in the 1980s those of us lucky enough to be using Saber C were using an interpreter.

Nevertheless there is some truth behind your question, so you deserve a more thoughtful answer. The truth is that dynamically typed languages do seem to be correlated with interpreted implementations. Why might that be?

  • Many new languages are defined by an implementation. It is easier to build an interpreter than to build a compiler. It is easier to check types dynamically than to check them statically. And if you are writing an interpreter, there is little performance benefit to static type-checking.

  • Unless you are creating or adapting a very flexible polymorphic type system, a static type system is likely to get in the programmer's way. But if you are writing an interpreter, one reason may be to create a small, lightweight implementation that stays out of the programmer's way.

  • In some interpreted languages, many fundamental operations are so expensive that the additional overhead of checking types at run time doesn't matter. A good example is PostScript: if you're going to run off and rasterize Bezier curves at the drop of a hat, you won't balk at checking a type tag here or there.

Incidentally, please be wary of the terms "strong" and "weak" typing, because they don't have a universally agreed on technical meaning. By contrast, static typing means that programs are checked before being executed, and a program might be rejected before it starts. Dynamic typing means that the types of values are checked during execution, and a poorly typed operation might cause the program to halt or otherwise signal an error at run time. A primary reason for static typing is to rule out programs that might have such "dynamic type errors". (This is another reason people who write interpreters are often less interested in static typing; execution happens immediately after type checking, so the distinction and the nature of the guarantee aren't as obvious.)

Strong typing generally means that there are no loopholes in the type system, whereas weak typing means the type system can be subverted (invalidating any guarantees). The terms are often used incorrectly to mean static and dynamic typing. To see the difference, think of C: the language is type-checked at compile time (static typing), but there are plenty of loopholes; you can pretty much cast a value of any type to another type of the same size; in particular, you can cast pointer types freely. Pascal was a language that was intended to be strongly typed but famously had an unforeseen loophole: a variant record with no tag.

Implementations of strongly typed languages often acquire loopholes over time, usually so that part of the run-time system can be implemented in the high-level language. For example, Objective Caml has a function called Obj.magic which has the run-time effect of simply returning its argument, but at compile time it converts a value of any type to one of any other type. My favorite example is Modula-3, whose designers called their type-casting construct LOOPHOLE.

In summary:

  • Static vs dynamic is the language.

  • Compiled vs interpreted is the implementation.

  • In principle the two choices can be and are made orthogonally, but for sound technical reasons dynamic typing frequently correlates with interpretation.

Norman Ramsey
Wow--that's a spectacular answer. Wish I could give you +100! Cheers.
Drew Hall
Very nice answer!
Adam Franco
Very informative, thank you. But there's still one important question left: while I understood the pitfalls of implementing an interpreter for strongly typed languages, you said nothing about compiling dynamically typed languages.
snitko
I'm not an expert. There's a huge body of literature on compiling Scheme and Lisp to native machine code. Post a question and I'll try to answer :-)
Norman Ramsey
Excellent write-up.
Brian Rasmussen
There is a follow-up over here: http://stackoverflow.com/questions/430182/is-c-strongly-typed#430414
troelskn