I am designing a programming language, and one of the problems I was thinking about was why programming languages take so long to compile. I assumed C++ takes a long time because it needs to parse and compile a header every time it compiles a file. But I *heard* precompiled headers take just as long? I suspect C++ is not the only language that has this problem.
Compiling is a complicated process which involves quite a few steps:
- Scanning/lexing
- Parsing
- Intermediate code generation
- Intermediate code optimization (optional)
- Target machine code generation
- Machine-dependent code optimization (optional)
(Leaving aside linking.)
Naturally, this will take some time for longer programs.
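To make those stages concrete, here is a minimal, hypothetical sketch (invented names throughout) that runs a toy arithmetic expression through lexing, parsing, and stack-machine code generation. Real compilers do vastly more work at every stage, and the optimization steps are omitted entirely.

```cpp
#include <cctype>
#include <iostream>
#include <string>
#include <vector>

// --- Scanning/lexing: turn characters into tokens ---
struct Token { char kind; int value; };  // kind: 'n' = number, or '+', '*'

std::vector<Token> lex(const std::string& src) {
    std::vector<Token> toks;
    for (size_t i = 0; i < src.size(); ) {
        if (std::isdigit(static_cast<unsigned char>(src[i]))) {
            int v = 0;
            while (i < src.size() && std::isdigit(static_cast<unsigned char>(src[i])))
                v = v * 10 + (src[i++] - '0');
            toks.push_back({'n', v});
        } else {
            toks.push_back({src[i], 0});
            ++i;
        }
    }
    return toks;
}

// --- Parsing: tokens into an AST ---
// Grammar: expr -> term ('+' term)*, term -> num ('*' num)*
struct Node { char op; int value; Node* lhs; Node* rhs; };

struct Parser {
    const std::vector<Token>& t;
    size_t pos = 0;
    Node* num()  { return new Node{'n', t[pos++].value, nullptr, nullptr}; }
    Node* term() {
        Node* n = num();
        while (pos < t.size() && t[pos].kind == '*') { ++pos; n = new Node{'*', 0, n, num()}; }
        return n;
    }
    Node* expr() {
        Node* n = term();
        while (pos < t.size() && t[pos].kind == '+') { ++pos; n = new Node{'+', 0, n, term()}; }
        return n;
    }
};

// --- Code generation: walk the AST, emit stack-machine instructions ---
void gen(const Node* n) {
    if (n->op == 'n') { std::cout << "PUSH " << n->value << "\n"; return; }
    gen(n->lhs);
    gen(n->rhs);
    std::cout << (n->op == '+' ? "ADD" : "MUL") << "\n";
}

int main() {
    auto toks = lex("1+2*3");
    Parser p{toks};
    gen(p.expr());  // prints PUSH 1, PUSH 2, PUSH 3, MUL, ADD
    // (the AST is leaked; fine for a throwaway sketch)
}
```

Even in this toy you can see why input size dominates: each stage walks everything the previous stage produced.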
They take as long as they take, and that usually depends on how much extraneous stuff you inject into your compilation units. I'd like to see you hand-compile them any faster :-)
The first time you compile a file, you should have no headers at all. Then add them as you need them (and check when you're finished whether you still need them).
Other ways of reducing that time are to keep your compilation units small (even to the point of one function per file, in an extreme case) and to use a make-like tool to ensure you only rebuild what's needed.
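On the header point, one common trick (file names here are hypothetical) is to forward-declare types that a header only uses by reference or pointer, so the full #include moves into the single .cpp file that actually needs it:

```cpp
// widget.h -- no #include "engine.h" needed here
class Engine;                 // forward declaration suffices for a reference

class Widget {
public:
    explicit Widget(Engine& e);
    void draw();
private:
    Engine& engine_;          // references/pointers don't need the full type
};

// widget.cpp -- the full definition is pulled in only where it's used:
//   #include "engine.h"
//   #include "widget.h"
//   Widget::Widget(Engine& e) : engine_(e) {}
```

Every file that includes widget.h then avoids reprocessing engine.h and everything it drags in.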
Some compilers (IDEs, really) do incremental compilation in the background so that they're (almost) always close to fully compiled.
After you finish writing the compiler for that language you're designing, you'll know exactly why.
Language design does have an effect on compiler performance. C++ compilers are typically slower than C# compilers, which has a lot to do with the design of the language. (This also depends on the compiler implementer; Anders Hejlsberg, who implemented C#, is one of the best around.)
The simplistic "header file" structure of C++ contributes to its slower performance, although precompiled headers can often help. C++ is a much more complex language than C, and C compilers are therefore typically faster.
One C++ specific problem that makes it horribly slow is that, unlike almost any other language, you can't parse it independently of semantic analysis.
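The classic illustration is that the same token sequence `a * b;` parses as either a declaration or an expression, and only the symbol table can tell you which:

```cpp
// Two readings of the same tokens, depending on what 'a' names.
// The parser must consult semantic information to decide.

void declaration_reading() {
    struct a {};   // 'a' is a type here...
    a * b;         // ...so this declares 'b' as a pointer to 'a'
    (void)b;
}

void expression_reading() {
    int a = 6, b = 7;  // 'a' is a variable here...
    a * b;             // ...so this multiplies 'a' by 'b' (result discarded)
}

int main() {
    declaration_reading();
    expression_reading();
}
```

C has the same issue with typedef names, but C++ layers templates and overloading on top, which makes the interleaving of parsing and semantic analysis far more expensive.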
Precompiled headers are way faster, as has been known at least since 1988.
The usual reason for a C compiler or C++ compiler to take a long time is that it has to #include, preprocess, and then lex gazillions of tokens.
As an exercise, you might find out how long it takes just to run cpp over a typical collection of header files, then measure how long it takes to lex the output.
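A rough sketch of the second half of that exercise, assuming you have already saved preprocessed output to a file (for example with `gcc -E`); the token counter below is a crude stand-in for a real C lexer, just to show the shape of the measurement:

```cpp
#include <cctype>
#include <chrono>
#include <fstream>
#include <iostream>
#include <sstream>
#include <string>

// Very crude token counter: enough to get a feel for lexing cost,
// not a real C lexer (ignores comments, multi-char operators, etc.).
static long count_tokens(const std::string& text) {
    long n = 0;
    size_t i = 0;
    while (i < text.size()) {
        unsigned char c = text[i];
        if (std::isspace(c)) { ++i; continue; }
        if (std::isalnum(c) || c == '_') {
            while (i < text.size() &&
                   (std::isalnum(static_cast<unsigned char>(text[i])) || text[i] == '_'))
                ++i;
        } else {
            ++i;  // treat every other character as a one-char token
        }
        ++n;
    }
    return n;
}

int main(int argc, char** argv) {
    // Hypothetical usage: g++ -E big.cpp > out.i && ./a.out out.i
    std::ifstream in(argc > 1 ? argv[1] : "out.i");
    std::stringstream ss;
    ss << in.rdbuf();
    std::string text = ss.str();

    auto start = std::chrono::steady_clock::now();
    long tokens = count_tokens(text);
    auto stop = std::chrono::steady_clock::now();

    std::cout << tokens << " tokens in "
              << std::chrono::duration<double, std::milli>(stop - start).count()
              << " ms\n";
}
```

Comparing that number against the time cpp itself took makes it clear how much of the bill is just churning through included text.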
gcc -O uses a very effective but somewhat slow optimization technique developed by Chris Fraser and Jack Davidson. Most other optimizers can be slow because they involve repeated iteration over fairly large data structures.
Compilation does not need to take long: tcc compiles ANSI C fast enough to be useful as an interpreter.
Some things to think about:
- Complexity in the scanning and parsing passes. Presumably requiring long lookahead will hurt, as will context-sensitive (as opposed to context-free) languages.
- Internal representation. Building and working on a large and featureful AST will take some time. Presumably you should use the simplest internal representation that will support the features you want to implement.
- Optimization. Optimization is fussy. You need to check for a lot of different conditions. You probably want to make multiple passes. All of this is going to take time (a small example follows below).
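For a taste of that last point, here is a hypothetical constant-folding pass over a tiny expression tree; real optimizers run many such passes, repeatedly, over much larger structures.

```cpp
#include <iostream>
#include <memory>

// Tiny expression tree: either a constant or a binary '+' / '*'.
struct Expr {
    char op;       // 'n' = constant, '+' or '*'
    int value;     // used when op == 'n'
    std::unique_ptr<Expr> lhs, rhs;
};

std::unique_ptr<Expr> num(int v) {
    return std::make_unique<Expr>(Expr{'n', v, nullptr, nullptr});
}
std::unique_ptr<Expr> bin(char op, std::unique_ptr<Expr> l, std::unique_ptr<Expr> r) {
    return std::make_unique<Expr>(Expr{op, 0, std::move(l), std::move(r)});
}

// One pass of constant folding: if both children are constants, replace
// the node with the computed constant. Real passes iterate to a fixed point.
std::unique_ptr<Expr> fold(std::unique_ptr<Expr> e) {
    if (e->op == 'n') return e;
    e->lhs = fold(std::move(e->lhs));
    e->rhs = fold(std::move(e->rhs));
    if (e->lhs->op == 'n' && e->rhs->op == 'n') {
        int v = e->op == '+' ? e->lhs->value + e->rhs->value
                             : e->lhs->value * e->rhs->value;
        return num(v);
    }
    return e;
}

int main() {
    // (2 + 3) * 4 folds to the single constant 20.
    auto e = fold(bin('*', bin('+', num(2), num(3)), num(4)));
    std::cout << e->value << "\n";  // prints 20
}
```

Even this one simple transformation has to rebuild parts of the tree, which is why optimization levels have such an outsized effect on compile times.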