views:

413

answers:

6

For my programming languages class, I'm writing a research paper on some papers by some important people in the history of language design. One by CAR Hoare struck me as odd because it speaks against independent compilation techniques used in C and later C++ before C even became popular.

Since this is primarily an optimization to speed up compilation times, what is it about Java and C# that make them able to avoid reliance on independent compilation? Is it a compiler technique or are there elements of the language that facilitate this? And are there any other compiled languages that used these techniques before them?

+4  A: 

IMO, one of the biggest factors here is that both java and .NET use intermediate languages; that means that the compiled unit (jar/assembly) contains, as a pre-requisite, a lot of expressive metadata about the types, methods, etc; meaning that it is already laid out conveniently for reference checking. The runtime still checks anyway, in case you are pulling a fast one ;-p

This isn't very far removed from the MIDL that underpins COM, although there the TLB is often a separate entity.

If I've misunderstood your meaning, please let me know...

Marc Gravell
I *believe* you understood me properly. I'm basically meaning the idea in C/C++ that each source file is its own individual compilation unit. This doesn't as much seem to be the case in C# or Java.
Jason Baker
+2  A: 

It requires some language support (otherwise, C/C++ compilers would do it too)

In particular, it requires that the compiler generates self-contained modules, which expose metadata that other modules can reference to call into them.

.NET assemblies are a straightforward example. All the files in a project are compiled together, generating one dll. This dll can be queried by .NET to determine which types it contains, so that other assemblies can call functions defined in it.

And to make use of this, it must be legal in the language to reference other modules.

In C++, what defines the boundary of a module? The language specifies that the compiler only considers data in its current compilation unit (.cpp file + included headers). There is no mechanism for specifying "I'd like to call function Foo in module Bar, even though I don't have the prototype or anything for it at compile-time". The only mechanism you have for sharing type information between files is with #includes.

There is a proposal to add a module system to C++, but it won't be in C++0x. Last I saw, the plan was to consider it for a TR1 after 0x is out.

(It's worth mentioning that the #include system in C/C++ was originally used because it'd speed up compilation. Back in the 70's, it allowed the compiler to process the code in a simple linear scan. It didn't have to build syntax trees or other such "advanced" features. Today, the tables have turned and it's become a huge bottleneck, both in terms of usability and compilation speed.)

jalf
+1  A: 

The object files generated by a C/C++ are ment to be read only by the linker, not by the compiler.

Maurice Perry
+1  A: 

As to other languages: IIRC Turbo Pascal had "units" which you could use without having any source code. I think the point is to create metadata along with compiled code which can then be used by the compiler to figure out the interface to the module (i.e. signatures of functions, class layout etc.)

One problem with C/C++ which prevents just replacing #include with some kind of #import is also the preprocessor, which can completely change the meaning/syntax etc of included/imported modules. This would be very difficult (if not impossible) with a Java-like module system.

MartinStettner
+4  A: 

Short answer: Java and C# don't avoid separate compilation; they make full use of it.

Where they differ is that they don't require the programmer to write a pair of separate header/implementation files when writing a reusable library. The user writes the definition of a class once, and the compiler extracts the information equivalent to the "header" from that single definition and includes it in the output file as "type metadata". So the output file (a .jar full of .class files in Java, or an .dll assembly in .NET-based languages) is a combination of binaries AND headers in a single package.

Then when another class is compiled and it depends on the first class, it can look at the metadata instead of having to find a separate include file.

It happens that they target a virtual machine rather than a specific chip architecture, but that's a separate issue; they could put x86 machine code in as the binary and still have the header-like metadata in the same file as well (this is in fact an option in .NET, albeit rarely used).

In C++ compilers it is common to try to speed up compilation by using "pre-compiled headers". The metadata in .NET .dll and .class files is much like a pre-compiled header - already parsed and indexed, ready for rapid look-ups.

The upshot is that in these modern languages, there is one way of doing modularization, and it has the characteristics of a perfectly organised and hand-optimised C++ modular build system - pretty nifty, speaking ASFAC++B.

Daniel Earwicker
+3  A: 

You could consider a java .class file to be similar to a precompiled header file in C/C++. Essentially the .class file is the intermediate form that a C/C++ linker would need as well as all of the information contained in the header (Java just doesn't have a separate header).

Form your comment in another post:

"I'm basically meaning the idea in C/C++ that each source file is its own individual compilation unit. This doesn't as much seem to be the case in C# or Java."

In Java (I cannot speak for C#, but I assume it is the same) each source file is its own individual compilation unit. I am not sure why you would think it is not... perhaps we have different definitions of compilation unit?

TofuBeer