In order to write better code, is it worth knowing in depth what the compiler does?

Just how much would be enough? I'm not a bit scrubber, but I was thinking that knowing how the compiler operates would make me a better programmer. Am I wrong?

If so, what resources would you recommend?

+2  A: 

I think every programmer should have a basic understanding of how a compiler turns your high-level code into machine instructions, optimizations that can be performed, how memory works, and how your code executes on the hardware. I think it helps to be aware of this so you understand the performance of your program better and it may help you make better implementation choices.

It's probably not critical that you be able to actually write machine code or know exactly what virtual memory architecture your system uses, but a basic idea of these concepts I think is important.

EDIT

For example: C stores multidimensional arrays in row-major order, so you should iterate over them with the right-most index varying fastest (in the innermost loop) and the left-most index varying slowest (in the outermost loop). Fortran does exactly the opposite, storing arrays in column-major order, so in Fortran the left-most index should vary fastest. Matching the loop order to the memory layout improves the cache hit ratio of your code and can significantly improve performance for large multidimensional arrays.
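
To make the loop-order point concrete, here is a minimal C sketch. The array size and the function names are made up for illustration. Both functions return the same sum; only the order in which memory is touched differs, and whether you can measure a difference depends on your compiler, its options, and your hardware.

```c
#include <stddef.h>

#define ROWS 512   /* hypothetical sizes, chosen only for the example */
#define COLS 512

/* Row-major traversal: the right-most index varies fastest,
   so consecutive accesses touch adjacent memory. */
long sum_row_major(int grid[ROWS][COLS]) {
    long total = 0;
    for (size_t i = 0; i < ROWS; i++)
        for (size_t j = 0; j < COLS; j++)
            total += grid[i][j];
    return total;
}

/* Column-major traversal of the same array: each access lands
   COLS * sizeof(int) bytes after the previous one. */
long sum_column_major(int grid[ROWS][COLS]) {
    long total = 0;
    for (size_t j = 0; j < COLS; j++)
        for (size_t i = 0; i < ROWS; i++)
            total += grid[i][j];
    return total;
}
```

Timing the two functions on a large array, at different optimization levels, is a quick way to see what your own compiler does with them.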

tvanfosson
Nowadays there is no performance difference between the two iteration orders. Not sure why or how it happened, but if you actually try it out, either iteration takes the same time in practice. So the array example is not valid anymore.
Robert Gould
Probably fixed by the optimizer at the AST rewriting level...just guessing.
dmckee
This is a problem that can be fixed by an advanced compiler, but not all compilers do this. All the more reason to know how the compiler works and what it can do.
Jay Conrod
@Robert -- I'm old, what can I say.
tvanfosson
I remember back when loop unrolling was generally a good thing for performance, back before we had caches.
David Thornley
+18  A: 

Probably wouldn't hurt to know how the compiler will optimize your code, but don't write for the compiler, but write for people to read.

Writing code in a way that is more optimized for the compiler may make it more difficult for people to read, and these days the compiler probably knows better than you do how to optimize the code.

coobird
Some languages (Javascript, PHP) don't give you much in the way of optimizations, so it's necessary to find a compromise between readable and efficient.
too much php
+10  A: 

Without any proof of effectiveness at all, I feel better about understanding what happens to my code for knowing just a little about compilers and a bit of assembly. You can learn a lot by reading Jack Crenshaw's Let's Build a Compiler.

Then you might look into more sophisticated compiler methods if you find yourself interested.


Edit: It is also worth noting that a lot of problems that don't call for a "compiler" are still best served by compiler methods. Parsing any modestly complicated command language is a compiler problem, even if you are not writing an executable.


Edit2: Many of the usual texts take a fairly abstract, mathematical approach to the compiler problem, which can be intimidating or confusing at first. The Crenshaw tutorial takes a "start banging out code" approach that is informed by the author's more subtle understanding. Nice intro, but if you are serious you should follow up with a more formal study.
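
As a taste of what "compiler methods" look like on a small command language, here is a minimal recursive-descent evaluator in C. It is a sketch in the spirit of the "start banging out code" approach, not code taken from the tutorial. The grammar, the function names, and the complete lack of error reporting are assumptions made to keep it short.

```c
#include <ctype.h>

/* Grammar assumed for this sketch:
     expr   := term   (('+' | '-') term)*
     term   := factor (('*' | '/') factor)*
     factor := NUMBER | '(' expr ')'            */

static const char *cursor;   /* next unread character of the input */

static void skip_spaces(void) {
    while (isspace((unsigned char)*cursor)) cursor++;
}

static long parse_expr(void);

static long parse_factor(void) {
    skip_spaces();
    if (*cursor == '(') {
        cursor++;                         /* consume '(' */
        long value = parse_expr();
        skip_spaces();
        if (*cursor == ')') cursor++;     /* consume ')' if present */
        return value;
    }
    long value = 0;                       /* malformed input silently yields 0 */
    while (isdigit((unsigned char)*cursor))
        value = value * 10 + (*cursor++ - '0');
    return value;
}

static long parse_term(void) {
    long value = parse_factor();
    for (;;) {
        skip_spaces();
        if (*cursor == '*') {
            cursor++;
            value *= parse_factor();
        } else if (*cursor == '/') {
            cursor++;
            long divisor = parse_factor();
            if (divisor != 0) value /= divisor;   /* sketch: skip division by zero */
        } else {
            return value;
        }
    }
}

static long parse_expr(void) {
    long value = parse_term();
    for (;;) {
        skip_spaces();
        if (*cursor == '+')      { cursor++; value += parse_term(); }
        else if (*cursor == '-') { cursor++; value -= parse_term(); }
        else return value;
    }
}

/* Entry point: evaluate a whole expression string. */
long evaluate(const char *text) {
    cursor = text;
    return parse_expr();
}
```

Each grammar rule becomes one function, which is why operator precedence falls out of the call structure: evaluate("2 + 3 * 4") multiplies before it adds.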

dmckee
Thanks! I'm looking into Crenshaw's tutorial, and it really is a great starting point!
Decio Lira
+1  A: 

I think what the compiler does is the important thing here (it creates an application with x, y, z characteristics). That translates into knowing the platform you're targeting.

The way it does that task is irrelevant (unless you're writing compilers, of course).

The most important thing we should know about a compiler is the error messages it displays.

:) Seems obvious, but I'm astonished by the number of developers I've met who didn't even look at the compiler output.

OscarRyz
I disagree about not needing to know what the compiler does. 999 times out of 1000 you're right. But that last one is a real bear to figure out if you don't know something about what is happening behind the scenes.
dmckee
heheh ... and there's always THAT ONEEE time...
OscarRyz
+1  A: 

Don't learn compilers, learn the problems solved by them.

Bryan Watts
A: 

I think what is really important is to write an interpreter: it gives you more insight into programming languages, and that's what you use... (In Scheme it is actually not hard at all to write an interpreter! And I would greatly encourage reading parts of SICP for great enlightenment.)

Concerning compilers, it's more complex, as the focus there is on getting performance and targeting an actual machine. As a programmer, what is important is to know at least roughly what tasks they perform and when they run, rather than the details, because nowadays they have grown into really complex systems, especially with JITs etc.

Piotr Lesnicki
A: 

At a minimum, you should be familiar with the language features at an abstract level. If you don't know whether or not variable names are case-sensitive, or how numbers are converted to boolean, then you probably can't even write a simple 'if' clause reliably.

Mostly, I've found that any other knowledge about the inner workings of a compiler just helps me to write more efficient code.

too much php
+1  A: 

I don't think it's as necessary to know how a compiler works as it is to continually improve one's knowledge about programming. Now, it just so happens that learning to write a compiler (or the principles behind it) happens to be a great way to expand one's knowledge.

If you are interested, I would recommend getting the Dragon Book, also known as Compilers: Principles, Techniques and Tools. It may be a bit heavy going the first time, but it will certainly make you think. If you don't make it all the way through or get stuck on some parts, I would suggest shelving it for a bit and returning later - it's much easier to get through the second time around.

Travis
+3  A: 

I have taught both programming languages and advanced compilers. Here are what I think are the two most useful reasons to know what the compiler does:

  1. If you don't have any idea what the compiler is doing, you may inadvertently write code that is much more expensive than you intended. This is especially true if you are allocating memory without knowing it. A classic example is to concatenate strings in a loop, e.g., as in

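    answer = ""
    for i = 1, n do
      answer = answer .. strings[i]    -- .. is string concatenation
    end

    This code is quadratic: each iteration allocates a new string and copies everything accumulated so far, so the total amount of allocation and copying grows with the square of the output length. Bad news.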

  2. The other big reason to know something about compilers is that oftentimes a problem requires a little language. If you know something about compilers (interpreters are just as good here, probably better) then you can build a little language. If you have a choice about what the language looks like, it is often better to let somebody else build the language for you. Lua is a language that is particularly good at being used as a component by other programs.
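
The hidden allocation in point 1 is easier to see when spelled out in C, where nothing is allocated behind your back. This is only a sketch: the function names are hypothetical, and how a real language runtime implements concatenation may differ.

```c
#include <stdlib.h>
#include <string.h>

/* Quadratic: every iteration allocates a fresh buffer and copies
   everything accumulated so far, which is what the loop in point 1
   does implicitly when strings are immutable. */
char *join_quadratic(const char *const *strings, size_t n) {
    char *answer = calloc(1, 1);                 /* empty string */
    if (!answer) return NULL;
    for (size_t i = 0; i < n; i++) {
        size_t old_len = strlen(answer);
        size_t add_len = strlen(strings[i]);
        char *next = malloc(old_len + add_len + 1);
        if (!next) { free(answer); return NULL; }
        memcpy(next, answer, old_len);                     /* re-copy old data */
        memcpy(next + old_len, strings[i], add_len + 1);   /* append, with '\0' */
        free(answer);
        answer = next;
    }
    return answer;
}

/* Linear: measure first, allocate once, copy each piece exactly once. */
char *join_linear(const char *const *strings, size_t n) {
    size_t total = 0;
    for (size_t i = 0; i < n; i++)
        total += strlen(strings[i]);
    char *answer = malloc(total + 1);
    if (!answer) return NULL;
    char *out = answer;
    for (size_t i = 0; i < n; i++) {
        size_t len = strlen(strings[i]);
        memcpy(out, strings[i], len);
        out += len;
    }
    *out = '\0';
    return answer;
}
```

Both functions return a newly allocated string that the caller must free. Many languages offer a buffer or join facility that plays the role of the second function.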

Crenshaw's tutorial isn't bad. Another nice book if you can lay your hands on it is P. J. Brown's book on interactive compilers and interpreters. It is long out of print, but you might find it in a library.

I would avoid the many fat undergraduate textbooks on compilers. A fat undergrad text that may be more worthwhile for the compiler-curious is Michael Scott's Programming Language Pragmatics.

Norman Ramsey
+1  A: 
MadKeithV
+2  A: 

I think it will certainly make you a better programmer, in a subtle way.

A general understanding of how it works will help you be more aware of the code you are writing. I've seen lots of experienced developers who struggle to understand some fundamental concepts when learning a new language. If you know approximately how a compiler works and (maybe more important) how the code is executed, you will understand these concepts better. I'm talking about heap vs stack, pointers, etc.

It may also come in handy if you need to write code to analyze or translate some text. I once wrote a program to translate some SQL conditions to another custom format, and writing a small parser for it was the simplest and most elegant way to do it (or so I think :) )

Also, a deep understanding of a compiler may help you to optimize specifically for it, but that can be really hard and not always recommended, as coobird said.

Rafa G. Argente
+1  A: 

Do you have any interest in effectively using a debugger? Then yes. Do you have any interest in writing reliable or efficient code? Then yes.

Personally I care about the backend more than the frontend. I recommend compiling for ARM instead of x86. In this case you are not necessarily learning assembler (for that I recommend writing your own disassembler), so if you use gcc you can rely on the disassembler that comes with its toolchain (objdump). With it you can see both how changes to your high-level code change the end result and how much difference compiler options make. It is an eye-opening experience for most high-level language programmers to realize that the same code can have wildly different results based on the compiler and command-line options used.

For the middle of the compiler I recommend both lcc and sdcc. You may or may not want to buy the lcc book:

http://www.cs.princeton.edu/software/lcc/

You don't need to, though: the source is on the net (in many forms), as is that of sdcc (Small Device C Compiler, originally created for the 8051 and other 8-bit micros). My recommendation there is to go to the interface where the compiler meets the backend. You will find that your code has been turned into a series of atomic parts, sometimes in a reverse-Polish-like form. a = b + 7; might end up as: load the constant integer 7; read the variable b from memory into the next available register; add 7 to the register holding b and save the result in the next available register; store the value in that register to the location in memory for a.
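
The a = b + 7; breakdown can be mimicked in plain C by writing one operation per line. This is only an illustration: the "registers" r0, r1, r2 are ordinary local variables invented for the sketch, and the real intermediate form of lcc, sdcc, or gcc will look different.

```c
int b = 35;   /* example value, chosen arbitrarily */
int a;

/* The statement as the programmer writes it. */
void high_level(void) {
    a = b + 7;
}

/* The same statement split into single-operation steps. */
void lowered(void) {
    int r0 = 7;         /* load the constant integer 7 */
    int r1 = b;         /* read b from memory into the next "register" */
    int r2 = r1 + r0;   /* add, saving the result in the next "register" */
    a = r2;             /* store the register to the memory location of a */
}
```

To compare this with what a real compiler emits, gcc -S writes the generated assembly to a file instead of producing an object file. Trying it at different optimization levels shows how much the output can change.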

You can do this with gcc as well, but you may end up realizing that gcc isn't as great as you thought it was. Because of the number of languages, the number of backends, the number of hands in the mix, etc., it is huge and complicated. It does get by, though, and it does work for the languages and platforms that have experts maintaining them. What gcc may teach you that the others cannot is that the various languages on the frontend boil down into a common middle language that the backend turns into specific instructions for each platform.

Lastly, the frontend. For the most part folks use bison/yacc: you write a description of your high-level language, and the tool generates a parser that reads the user's input according to that description and turns it into this middle language, if you will.

If you plan on your hobby or career having to do with writing software, I would say you must go through this exercise once, if not many times. The overall quality of your code, its reliability, its performance, and your efficiency in writing it will all be affected by this knowledge.

I would be careful with the statement "don't write for the compiler, but write for people to read." There is a lot of bad code out there because that kind of statement is misused. Writing code for maintainability results in bad code that has to be maintained. Maintainability is mutually exclusive with reliability and performance. I would personally rather have reliability and performance than bad code that any college grad can maintain.

You will learn, over time, not to try too hard to write for the compiler. Just don't be wasteful with your code, and don't use gee-whiz features of the language. If you had to do extra research in order to figure out some compiler feature, you can be sure that most of the world doesn't understand it, including the person who is supposed to implement it in the compilers. Therefore you can expect that feature not to work consistently across compilers, and therefore you should not use it in the first place. This also means don't try to write your code for one specific compiler: don't get too attached to gcc and its features; try sdcc and lcc and Microsoft and Borland and Keil and others. Make your code clean, simple, readable, and portable.

Bottom line, if you are serious about writing software, then you absolutely need to know how the compiler works. gcc, sdcc, lcc, (and vbcc if you can find it) are all free, open source, and provide a learning experience that will improve your coding skills.

dwelch
+1  A: 

In a blog post, Steve Yegge asserted that all programmers should know how compilers work. He goes so far as to say:

Gentle, yet insistent executive summary: If you don't know how compilers work, then you don't know how computers work. If you're not 100% sure whether you know how compilers work, then you don't know how they work.

In the article, he makes a compelling argument for needing to know compilers. He also provides a list of real-world examples where knowing how to parse and analyze would be useful.

epotter
Nice. I read that blog post, and it's a great one! What resources did you use to learn more about compilers (aside from college classes)?
Decio Lira
Honestly, I hadn't done much with compilers since college. Yegge's post motivated me to get back into it. I've read that F# is good for writing parsers, so I figure I'll read Let's Build a Compiler, by Jack Crenshaw (http://compilers.iecc.com/crenshaw/), and then try to write a DSL compiler in F#.
epotter