Possible Duplicate:
Should C++ eliminate header files?

In languages like C# and Java there is no need to declare (for example) a class before using it. If I understand correctly, this is because the compiler makes two passes over the code: in the first it just "collects the information available", and in the second it checks that the code is correct.

In C and C++ the compiler makes only one pass, so everything needs to be available at the point where it is used.

So my question basically is: why isn't it done this way in C and C++? Wouldn't it eliminate the need for header files?
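
For illustration, here is a minimal C++ sketch of the declare-before-use rule the question is about (the function name is made up):

// main.cpp -- C++ is parsed top to bottom, so every name must be
// declared before it is used.
void greet();        // forward declaration; without this line the call
                     // below fails with "'greet' was not declared"

int main() {
    greet();
    return 0;
}

void greet() { }     // the definition may come later, but the declaration may not

In C# or Java the equivalent method could simply be defined after its caller, with no separate declaration, because the compiler gathers all declarations in a first pass before it checks the bodies.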

+4  A: 

Bottom line: there have been advances in compiler technology that make forward declarations unnecessary. Plus, computers are thousands of times faster, so they can afford the extra work needed to cope with the absence of forward declarations.

C and C++ are older and were standardized at a time when it was necessary to save every CPU cycle.

Randolpho
:-) In other words - C# is better than C++.
Franci Penov
You're missing the key words here: backwards compatibility. Your last line makes it sound like C and C++ have only one version of the standard from the stone ages. It should read "and were *first* standardized...and to maintain backwards compatibility, the method remains the same." @Franci: When you're done writing an OS in C#, come get me.
GMan
@Franci: No... in other words, modern language compilers have made forward declarations obsolete because they do not have to worry about backwards compatibility. It *could* be done in C++. Have fun writing hardware drivers in C# bud.
Ed Swangren
@GMan - Save the Unicorns: You have excellent points about "first standardized" and "backwards compatibility". Regarding an OS in C#: I give you [Singularity](http://research.microsoft.com/en-us/projects/singularity/). Now, granted, some performance-critical portions of the kernel were written in C, but given that some portions of the kernel are frequently written in assembly, I'd say they've stepped up a bit.
Randolpho
@Ed: obsolete? This way you need to store the metainformation in the assembly (for .NET), something which C++ can successfully avoid.
Vlad
@Vlad: Who cares? As a programmer I don't care how it is done, I care that I don't have to write forward declarations. When this becomes a performance issue let me know.
Ed Swangren
@Randolpho: Yeah, because Singularity is very widely used, and they still had to write parts of it in C.
Ed Swangren
@Ed: A good programmer cares about the quality of his output.
Vlad
@Ed Swangren: exactly! (re: performance) Performance isn't the end-all be-all of programming, it's just something that you might need in some cases. Productivity, stability, and maintainability are all infinitely more important.
Randolpho
@Gman: but can't that be implemented as just a switch in the compiler?
@Vlad: That's funny. Somehow storing metadata reduces the quality of your output? Please provide a real life example where this is an issue.
Ed Swangren
@Randolpho: "infinitely more important" is probably a *slight* exaggeration.
Matthew Crumley
@user: The compiler has nothing to do with the language. I'm sure there exists somewhere a branch of some compiler that implements modules in C++. But to make such a thing standardized is no trivial task. I think it was proposed for C++0x, but it wasn't accepted. Such a change requires old code to work as-is while allowing new syntax to specify new behavior. There are more pressing issues to worry about.
GMan
@Ed Swangren - it is possible to write hardware drivers in C#. In fact, there are user-level drivers written in C#. As for backward compatibility - that is a somewhat valid argument, but only if you insist that any C++ source should be compatible with any C++ compiler ever written. However, there's no reason why there couldn't be a C++ v2 that introduces new language extensions that are supported only by C++ v2 compilers.
Franci Penov
@Ed: the words "real-life example" suggest that you consider storing metadata in the output file a negligible issue. For some people it's an issue, of course. For example, you deploy to the user something that he strictly doesn't need to have. If you want some more "business-sounding" reasons, you expose your source code to reverse engineering.
Vlad
@Matthew Crumley: Ok, perhaps *infinitely* was a bit exaggerated. More like several hundred thousand orders of magnitude. Better?
Randolpho
@Randolpho: The approach of "don't care if it's fast and efficient, but it should be fun to code" drives me, as a user, crazy. I've seen too many programs written with this attitude.
Vlad
@Vlad: No. Find me this user who has an issue directly caused by storing metadata in an assembly. That user does not exist, and you cannot give me a real-life example, so yes, it is a non-issue. As for writing drivers in C#, it would be a huge pain to get the low-level access that you need and would just be silly. Many things are theoretically possible. I could pound in a nail with a screwdriver, but why in the world would I do such a thing when hammers exist?
Ed Swangren
GMan> in fact the "modules" feature in C++ was planned, but couldn't be included in the new standard because it was already a lot of work. So today it's planned to be discussed for the upcoming TR2 or the next major version. Would be nice if it took less than 5 years :P
Klaim
@(another at)Vlad: Again, show me where these things are causing performance issues. Show me numbers where it matters.
Ed Swangren
@Ed: Users don't complain directly about the metadata, because they don't know what it is. However, they complain about programs getting bigger and slower, and about the cost they have to pay for hardware in order to do simple things. IMHO at least part of these problems comes from the "don't care about efficiency" attitude.
Vlad
@Vlad: Did I say (write) anything about being "fun to code"? No, sir. If a program is unusable it is a failure, and performance from a UI perspective certainly falls in that category. But does that mean that you absolutely have to implement Han's algorithm because the framework's array sort only offers a derivative of Heapsort? No, it doesn't. Odds are you don't; even bubblesort might be fast enough. What is far more valuable than software that performs a half-second faster is software that can be written faster and (if necessary) modified faster.
Randolpho
@Ed: numbers? Very simple: how much RAM was needed to run a window manager on a client OS 10 years ago, and how much is needed now? Compare for yourself.
Vlad
@Randolpho: I disagree. Writing software faster gets the software company its invested money back faster. But good software requires time to be written. As a user, I personally prefer fast, well-debugged, and highly optimized software. [And yes, I can sometimes tell which software uses bubblesort.]
Vlad
@Randolpho: no, you didn't say "fun to code". I just understood you that way, sorry if that was wrong.
Vlad
@Vlad: Oh, I agree that putting software out there fastest isn't the most important, but it's highly important, particularly in a world as competitive as software. But what I value even more than either productivity (speed of release) or performance (speed of execution) is maintainability (speed of modification). Software lives a long time, and everyone loves to forget that big-ass elephant-in-the-room "maintenance" phase of a software project.
Randolpho
@Vlad: I suppose that would be a valid point if modern machines had the same amount of RAM in them as they did years ago. As they have much more and the price has declined, why not use the extra bits for something productive? That does not make your code "bloated and slow", there is a difference.
Ed Swangren
BTW, I should mention that I am a systems engineer who often works in environments with limited resources. However, I am also practical. Your basic GUI does not benefit from micro-optimization.
Ed Swangren
@Randolpho: You are right--however I try to look from the point of view of the user, not the business. Unfortunately in our Universe the software must be competitive (this includes being promptly developed and released, as well as cheap in production), so optimization down to the last bit has died out. Definitely maintenance is a big issue, but I believe that a good design can contribute more to it than any generic framework.
Vlad
@Ed: I suspect we are mixing up cause and effect now. The processor speed/memory requirements race was (partially) driven by declining software quality--which created an actual need for more resources. A typical program nowadays runs at approximately the same speed as 10 years ago (from the user's POV) and doesn't bring a substantial functionality gain, so the increased power of the hardware has been compensating for the increased sloppiness of development.
Vlad
@Vlad: Very true. But beware of premature optimization, or even micro-optimization. You have to think about the return on investment; is it really worth the extra time to shave 30 milliseconds off that method just to satisfy some notion that performance is king?
Randolpho
@Randolpho: Totally agree, "premature optimization is root of all evil". Still, I miss the times when programming was not a business but an art, so writing the program as optimal as possible was a kind of "point of honour".
Vlad
@Vlad: It still is an art, you just have to shift your focus. Look at the code. If some n00b off the street comes in and reads your code, can he understand it? Then you have written a work of art.
Randolpho
+1  A: 

This is because of the smaller compilation modules in C/C++. Each .c/.cpp file is compiled separately, producing its own .obj module, so the compiler needs information about the types and variables declared in other compilation modules. That information is supplied in the form of forward declarations, usually in header files.
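
As a rough sketch of what that looks like in practice (the file and type names are made up):

// point.h -- shared declarations; every compilation module that needs
// Point or distance() includes this file
#ifndef POINT_H
#define POINT_H
struct Point { double x, y; };
double distance(const Point& a, const Point& b);
#endif

// point.cpp -- compiled separately into its own .obj module
#include "point.h"
#include <cmath>
double distance(const Point& a, const Point& b) {
    double dx = a.x - b.x, dy = a.y - b.y;
    return std::sqrt(dx * dx + dy * dy);
}

// main.cpp -- also compiled separately; the header tells the compiler
// everything it needs to know about distance() without seeing its body
#include "point.h"
int main() {
    Point a = { 0, 0 }, b = { 3, 4 };
    return static_cast<int>(distance(a, b));   // the linker joins the two .obj modules
}

Each .cpp produces its own object module; the linker then joins them, which is why the compiler only ever needs the declarations from the header, never the definitions living in the other modules.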

C#, on the other hand, compiles several .cs files into one big compilation module at once.

In fact, when referencing other compiled modules from a C# program, the compiler needs to know their declarations (type names etc.) in the same way the C++ compiler does. It obtains this information directly from the compiled module. In C++ the same information is kept explicitly separate (which is why you cannot recover variable names from a C++-compiled DLL, but can from a .NET assembly).

Vlad
Unfortunately, that doesn't explain how C# manages our 100-assembly Solution containing thousands of source files and millions of lines of code better than C++ manages a single .h file (which may still need a forward declaration even though all the information it needs is in the one file).
Jason Williams
@Jason: the benefits of separate compilation are different: when you change only the implementation, recompilation is almost instant. (Of course, this makes the very first compilation slower.) I don't know why your C++ cannot manage a single header file; I have never encountered any problems with mine.
Vlad
@Vlad: You said "...compiler needs information about types ... declared in other compilation modules", but in fact even a single C++ class in a single header *can* require a predeclaration - i.e. even within one compilation module. That is, C++ parses the code in a linear way (thus requiring predeclaration of future types when they are referenced), while C# effectively builds a database for the codebase that gives it random access to all the types.
Jason Williams
@Jason: you are right. This is sometimes an advantage, and sometimes not. For example, C# needs to recompile all the .cs files as soon as any of them changes, because it cannot tell in advance which module requires which information. With C++, if no header has changed, only the changed .cpp files need to be recompiled. On the other hand, C++ requires the developer to understand which headers he needs to include.
Vlad
@Jason: predeclaration is sometimes needed because C++ reads the input in one pass, so in order to reference something declared later in the same file you need a forward declaration. However, I don't see this as a complicated problem for a decent developer.
Vlad
@Vlad: I agree with you - indeed, predeclaration is easy to code, and has some advantages - these are probably two of the main reasons nobody has bothered to write a 2-pass C++ compiler. (There simply isn't enough demand to justify the enormous cost)
Jason Williams
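
To make the single-compilation-module case discussed above concrete, here is the classic situation that forces a predeclaration even within one header (the type names are made up):

// widgets.h -- two types that refer to each other in the same file
class Manager;                // predeclaration: Manager is named below
                              // before its full definition has been read

class Widget {
    Manager* owner;           // fine: a pointer only needs the name to exist
};

class Manager {
    Widget widgets[8];        // needs the complete definition of Widget,
                              // which the single-pass compiler has seen by now
};

A compiler that first built a symbol table for the whole file, as C# does, could resolve both names without the extra line.
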
+17  A: 

The short answer is that computing power and resources advanced exponentially between the time C was defined and the time Java came along, more than two decades later.

The longer answer...

The maximum size of a compilation unit -- the block of code that a compiler processes in a single chunk -- is limited by the amount of memory the compiling computer has. In order to translate the symbols you type into machine code, the compiler needs to hold all of them in a lookup table and reference them as it comes across them in your code.

When C was created in 1972, computing resources were far scarcer and at a high premium -- the memory required to hold a complex program's entire symbol table at once simply wasn't available in most systems. Fixed storage was also expensive and extremely slow, so ideas like virtual memory or storing parts of the symbol table on disk simply wouldn't have allowed compilation in a reasonable timeframe.

The best solution to the problem was to chunk the code into smaller pieces by having a human sort out, ahead of time, which portions of the symbol table would be needed in which compilation units. Imposing on the programmer the fairly small task of declaring what he would use saved the tremendous effort of having the computer search the entire program for anything he might use.

It also saved the compiler from having to make two passes on every source file: the first one to index all the symbols inside, and the second to parse the references and look them up. When you're dealing with magnetic tape where seek times were measured in seconds and read throughput was measured in bytes per second (not kilobytes or megabytes), that was pretty meaningful.

C++, while created almost 17 years later, was defined as a superset of C, and therefore had to use the same mechanism.

By the time Java rolled around in 1995, average computers had enough memory that holding a symbol table, even for a complex project, was no longer a substantial burden. And Java wasn't designed to be backwards-compatible with C, so it had no need to adopt a legacy mechanism. C# was similarly unencumbered.

As a result, their designers chose to shift the burden of compartmentalizing symbolic declarations off the programmer and back onto the computer, since its cost in proportion to the total effort of compilation was minimal.

Dan Story
Excellent summary. Reminded me of the "good old days" of compiling a C program on a two-floppy-drive 640K PC - it took about 10 minutes with a half dozen or more floppy changes. All that for a program containing no more than a couple hundred statements! And I thought I was in heaven with all that power.
NealB
A: 

Forward declarations in C++ are a way to provide the compiler with metadata about the other pieces of code that might be used by the source currently being compiled, so that it can generate correct code.

That metadata can come from the author of the linked library/component. However, it can also be generated automatically (for example, there are tools that generate C++ header files for COM objects). In any case, the C++ way of expressing that metadata is through the header files you include in your source code.

C#/.NET also consumes similar metadata at compile time. However, that metadata is generated automatically when the assembly it applies to is built, and it is usually embedded into that assembly. Thus, when you reference an assembly in your C# project, you are essentially telling the compiler "please look for the metadata you need in this assembly as well".

In other words, the metadata generation and consumption in C# is more transparent to the developers, allowing them to focus on what really matters - writing their own code.

There are also other benefits to having the metadata about the code bundled with the assembly. Reflection, code emitting, on-the-fly serialization - they all depend on that metadata to generate the proper code at run-time.

The closest C++ analogue to this would be RTTI, although it's not widely adopted due to incompatible implementations.
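
For comparison, here is roughly what that limited RTTI metadata looks like in use - a sketch only, and nowhere near as rich as .NET reflection:

#include <iostream>
#include <typeinfo>

struct Shape  { virtual ~Shape() {} };
struct Circle : Shape { };

int main() {
    Shape* s = new Circle;
    // typeid and dynamic_cast are essentially all the run-time type
    // information standard C++ offers; there is no portable way to
    // enumerate members or invoke them by name.
    std::cout << typeid(*s).name() << '\n';     // implementation-defined name
    if (dynamic_cast<Circle*>(s))
        std::cout << "s actually points to a Circle\n";
    delete s;
    return 0;
}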

Franci Penov
A: 

From Eric Lippert, blogger of all things internal to C#: http://blogs.msdn.com/ericlippert/archive/2010/02/04/how-many-passes.aspx:

The C# language does not require that declarations occur before usages, which has two impacts, again, on the user and on the compiler writer. [...]

The impact on the compiler writer is that we have to have a “two pass” compiler. In the first pass, we look for declarations and ignore bodies. Once we have gleaned all the information from the declarations that we would have got from the headers in C++, we take a second pass over the code and generate the IL for the bodies.

To sum up, in C# a usage does not have to come after the declaration, whereas in C++ it does. That means that in C++ you need to declare things explicitly before using them, and it's more convenient and safer to do so with header files, so you don't violate the One Definition Rule.
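
A small sketch of why the header is the safer place for those declarations (the file and function names are made up):

// logger.h -- the single, authoritative declaration
#ifndef LOGGER_H
#define LOGGER_H
void log_message(const char* text, int severity);
#endif

// logger.cpp -- the one definition
#include "logger.h"
void log_message(const char* text, int severity) { /* write to a log file, say */ }

// client.cpp -- sees exactly the same declaration as logger.cpp.
// If it re-declared log_message by hand and got the signature wrong,
// the mismatch would surface only at link time in C++ (or not at all in C).
#include "logger.h"
void do_work() { log_message("working", 1); }

For class types the shared header matters even more: every translation unit that uses a class must contain an identical definition of it, and including one common header is what keeps those copies identical, as the One Definition Rule demands.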

MSN
+2  A: 

No, it would not obviate header files. It would only eliminate the need to declare classes/functions before using them in the same file. That was never the major reason for headers, though; their primary purpose is to declare things that are defined in other files.

For better or worse, the rules for the semantics of C (and C++) mandate the "single pass" style behavior. Just for example, consider code like this:

int i;

void f() {
    i = 1;      // assigns to the global i; the local i below is not yet in scope
    int i = 2;  // from this point on, the local i hides the global
}

The i = 1 assigns to the global i, not to the one defined inside f(). This is because at the point of the assignment the local definition of i hasn't been seen yet, so it isn't taken into account. You could still follow these rules with a two-pass compiler, but doing so could be non-trivial. I haven't checked their specs to know with certainty, but my immediate guess would be that Java and C# differ from C and C++ in this respect.

Edit: Since a comment said my guess was incorrect, I did a bit of checking. According to the Java Language Reference, §14.4.2, Java seems to follow pretty much the same rules as C++ (a little different, but not a whole lot).

At least as I read the C# language specification (warning: Word file), however, it is different. It says (§3.7.1): "The scope of a local variable declared in a local-variable-declaration (§8.5.1) is the block in which the declaration occurs."

This appears to say that in C#, the local variable should be visible throughout the entire block in which it is declared, so with code similar to the example I gave, the assignment would be to the local variable, not the global.

So, my guess was half right: Java follows (pretty much) the same rule as C++ in this respect, but C# does not.

Jerry Coffin
Your guess would be incorrect.
Dennis Zickefoose