Hello,

This is a brain-dead newbie question, but here goes:

What determines what files get included in a C/C++ project?

My understanding is that the compiler starts with the file that has main() in it and that file will contain #include's to get various h files which contain #include's to other h files and so on until everything is included in the project.

My questions:

What is the relationship between h files and cpp files of the same name? I mean, of course I understand that code-wise they need each other and the cpp file always (almost always?) #include's the h file, but from the compiler's point of view is it important for them to have the same names or is this all just a convention? Can I include extra cpp files without corresponding h files?

Also, when the project is built and linked, how does it know which cpp/h files to build object files for? Will it just start at the cpp file with "main()" in it and keep going through #include's until it has everything it needs and build all of that, or does it just build everything that a user specifies in the makefile or in the IDE project file?

Finally, when the linker finally comes around and links all the object code to make an executable, is there a special order it arranges everything in?

Any help, hints, explanations appreciated.. Thanks!

--R

A: 

A little hunting on the web will turn up a lot of your answer. Here are just two links: http://www.psgd.org/paul/docs/cstyle/cstyle02.htm

http://www.cs.utexas.edu/~lavender/courses/EE360C/lectures/lecture-02.pdf

The second one is pretty good.

I'd also recommend The C++ Programming Language, 3rd edition. It has a great section on file organization.

As for what the compiler does, that too is best explained in a separate article. In short, each cpp file, together with everything it #include's, forms a translation unit that is compiled into an object file; the linker then connects all the object files together into the final executable.

JoshD
+1  A: 

Think of files as just an easy way to split up your code to make it both more reusable and more maintainable.

You can just as easily put an entire application in one big honking source file but you may find that the file will get rather big, leading to the compiler complaining about it (or at least taking a long time to compile it).

Typically you would hive off a part of your application (such as a generic database access layer) into a separate source file such as db.cpp and create a db.h file with its API. This file isn't so much used by db.cpp as it is used by all the other files that need to call the functions in db.cpp. It can be included in db.cpp but it tends to be mostly published information about the db code.
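To make the db.h / db.cpp split concrete, here is a minimal single-file sketch. The function names (db_count_rows, rows_in_two_tables) and the report.cpp caller are made up for illustration, and the "row count" is just a placeholder; in a real project each marked section would live in its own file.

```cpp
// ---- db.h (hypothetical): the published API, seen by every caller ----
#ifndef DB_H
#define DB_H
#include <string>

// Declaration only: tells callers the function exists and how to call it.
int db_count_rows(const std::string& table);

#endif // DB_H

// ---- db.cpp (hypothetical): the implementation, compiled once ----
// In a real project this file would start with: #include "db.h"
#include <cassert>

int db_count_rows(const std::string& table) {
    assert(!table.empty());                 // precondition, for illustration
    return static_cast<int>(table.size()); // placeholder for a real row count
}

// ---- report.cpp (hypothetical): a caller that needs only db.h ----
// In a real project this file would start with: #include "db.h"
int rows_in_two_tables(const std::string& a, const std::string& b) {
    return db_count_rows(a) + db_count_rows(b);
}
```

The caller never sees the body of db_count_rows, only its declaration, which is the sense in which db.h is "published information" about the db code.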

As to how an environment figures out which things to compile/link: you tend to have a project of some sort (makefile, IDE project file, etc) which lists all the source files that you want to compile (usually not header files).

The environment will compile each source file that it has been told about to produce an object file. Part of this process is incorporating the included header files into each source file to make a compilation unit (or translation unit): basically the source file with each included header's contents inserted at the point where its #include was.
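As a rough illustration of what "incorporated at the point where the #include was" means, the sketch below shows a translation unit after the header text has been pasted in. The file names shape.h and area.cpp and the identifiers are invented for this example. With GCC or Clang you can inspect the real preprocessed output using the -E option.

```cpp
// What area.cpp (hypothetical) looks like on disk:
//
//     #include <cassert>
//     #include "shape.h"
//     double square_area(double side) { ... }
//
// What the compiler effectively sees once the preprocessor has run.
// Standard headers are expanded the same way; <cassert> is left as an
// #include here only to keep the sketch short.
#include <cassert>

// ---- begin text of shape.h (hypothetical), pasted in place of the #include ----
const double kUnitsPerSide = 2.0;
double square_area(double side);   // declaration from the header
// ---- end text of shape.h ----

// ---- rest of area.cpp ----
double square_area(double side) {
    assert(side >= 0.0);
    double scaled = side * kUnitsPerSide;
    return scaled * scaled;
}
```

Nothing about shape.h survives as a separate entity at this stage; the compiler proper only ever sees the combined text.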

The environment will then link all the object files to form an executable. Keep in mind there are variations on this process such as late (dynamic) linking. See here for a description of this.

paxdiablo
A: 

A header file essentially holds the declaration of a class and all its member attributes and functions, which makes your class more reusable and more accessible. Think of it as an interface free from implementation, so whoever is using it need not worry about the source for that particular class. Corresponding h and cpp files are usually given the same name, but that is only a convention; the compiler does not require it. A cpp file need not always have a corresponding h file: you can have all your source in one cpp file without any h files, and as long as everything is implemented and prototyped properly, everything should work fine.
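For instance, a cpp file with no header at all can still call a function before defining it, provided a prototype comes first. This is only a sketch with made-up function names:

```cpp
#include <cassert>

// Prototype: enough for the compiler to accept calls that appear
// before the definition further down in the same file.
int factorial(int n);

// Uses factorial() before its body has been seen.
int permutations(int n, int k) {
    assert(0 <= k && k <= n);
    return factorial(n) / factorial(n - k);
}

// The definition the prototype promised.
int factorial(int n) {
    return n <= 1 ? 1 : n * factorial(n - 1);
}
```

A header is just a convenient place to keep such prototypes once more than one cpp file needs them.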

Vapen
A: 

Your analysis is basically correct... all the included files are expanded in place and the resultant code - a translation unit - is compiled into an object, library or application.

Still, any non-trivial project relies on symbols (variables, functions) defined in other libraries, even if only for things like malloc(), socket(), fopen(), write() etc. provided by the language's or operating system's standard libraries. Even if you don't call them directly, they're needed by the implementation of things like new and iostream.
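A small sketch of that dependency: the code below only ever sees declarations of malloc, free, strlen and memcpy (from the standard headers), while their definitions typically come from the C library that gets linked in. The helper name copy_via_malloc is invented for this example.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>
#include <string>

// Copies a C string through a malloc'd buffer. The headers above supply
// only declarations; the function bodies are normally resolved at link
// time from the standard library.
std::string copy_via_malloc(const char* text) {
    assert(text != nullptr);
    std::size_t n = std::strlen(text) + 1;          // include the terminator
    char* buf = static_cast<char*>(std::malloc(n));
    if (buf == nullptr) {
        return std::string();                       // allocation failed
    }
    std::memcpy(buf, text, n);
    std::string result(buf);
    std::free(buf);
    return result;
}
```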

When your own project gets bigger, you'll also want to divide your functionality into different objects or libraries, as that makes the functionality more re-usable, independently testable, and means that after some code change you can recompile only those objects invalidated by the change then relink - which can be massively faster than recompiling every single bit of code in your entire project.

Your C++ compiler creates objects (which may or may not have the extra interfacing and code to make them libraries or applications) from translation units - which are the concatenations of includes and cpp file you've mentioned - possibly importing and combining that with symbols from existing static libraries or other objects you've mentioned on the compiler command line.

For each of these independent objects, the compiler needs to be able to tell new code how to access and use the contained symbols; the header files serve this purpose, advertising the available object content.

Implementation (cpp) files should almost always include their own header file first, because the compiler will then complain if there is some discrepancy between the object content it is building and the header-file-advertised content that code using the object will later expect. For some things, like classes, the class definition must be seen before the member function implementations can be specified, and given that the class definition is needed by client code and is therefore in the header, in practice the implementation needs to include the header too. (I say a cpp should include its header first because the compiler will then also complain if the header relies on some content that it doesn't include itself. Otherwise, if say the cpp includes the std::string header before its own header and the header uses std::string, everything compiles there, but when some other client code includes the header without having included string, that compilation will fail.)
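Here is a minimal sketch of a header that includes what it uses, with the implementation putting its own header first. The names greeter.h, greeter.cpp and greet are hypothetical, and both "files" are shown in one block for brevity.

```cpp
// ---- greeter.h (hypothetical) ----
#ifndef GREETER_H
#define GREETER_H
#include <string>   // the header uses std::string, so it includes <string> itself

std::string greet(const std::string& name);

#endif // GREETER_H

// ---- greeter.cpp (hypothetical) ----
// The first line of the real file would be: #include "greeter.h"
// Because nothing precedes it, a missing <string> in greeter.h would be
// caught right here instead of in some client's build.
#include <cassert>

std::string greet(const std::string& name) {
    assert(!name.empty());
    return "Hello, " + name + "!";
}
```

If greeter.cpp instead included <string> before its own header, a greeter.h that forgot its own #include <string> would still compile there and only fail later for clients.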

Implementation files could include other implementation files, but that wouldn't fit in with the general division of compilation described above, so would confuse people used to this convention.

Tony