I'm really sorry if this sounds kinda dumb. I just finished reading K&R and I worked on some of the exercises. This summer, for my project, I'm thinking of re-implementing a linux utility to expand my understanding of C further so I downloaded the source for GNU tar and sed as they both seem interesting. However, I'm having trouble understanding where it starts, where's the main implementation, where all the weird macros came from, etc.

I have a lot of time so that's not really an issue. Am I supposed to familiarize myself with the GNU toolchain (i.e. make, binutils, ...) first in order to understand the programs? Or maybe I should start with something a bit smaller (if there's such a thing)?

I have a little bit of experience with Java, C++ and Python, if that matters.


+3  A: 

GNU Hello is probably the smallest, simplest GNU program and is easy to understand.

There must be a joke hiding in the fact that the latest version of GNU Hello (2.4.90) is a 566 KB download, as a tar.gz archive. That's just ... scary.
@unwind GNU Hello is more than just a "Hello World" program: it prints "Hello World" in many languages, makes coffee and loans you money when you need it.
Paolo Bonzini
+1  A: 

I know it's sometimes a mess to navigate through C code, especially if you're not familiar with it. I suggest you use a tool that will help you browse through the functions, symbols, macros, etc. Then look for the main() function.

You need to familiarize yourself with the tools, of course, but you don't need to become an expert.

+1 I'm very glad that source navigator has a new release. I used it back in 2006 and it looked like a non-active project. Anyway, it's a very good tool.
Iulian Şerbănoiu
+5  A: 

The problem with programs like tar and sed is twofold (this is just my opinion, of course!). First of all, they're both really old. That means they've had multiple people maintain them over the years, with different coding styles and different personalities. For GNU utilities, it's usually pretty good, because they usually enforce a reasonably consistent coding style, but it's still an issue. The other problem is that they're unbelievably portable. Usually "portability" is seen as a good thing, but when taken to extremes, it means your codebase ends up full of little hacks and tricks to work around obscure bugs and corner cases in particular pieces of hardware and systems. And for programs as widely ported as tar and sed, that means there's a lot of corner cases and obscure hardware/compilers/OSes to take into account.
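To give an idea of what those little hacks tend to look like, here is a minimal sketch of a guarded fallback. HAVE_STRNLEN follows the HAVE_* naming that autoconf-generated config headers use, but the macro and the my_strnlen name are assumptions made for illustration, not code taken from tar or sed.

```c
#include <stddef.h>
#include <string.h>

/* HAVE_STRNLEN is assumed to come from a configure-generated header.
   If the platform lacks strnlen, a replacement is compiled instead.  */
#ifndef HAVE_STRNLEN
/* Hypothetical fallback: length of S, but never more than MAXLEN.  */
static size_t
my_strnlen (const char *s, size_t maxlen)
{
  const char *end = memchr (s, '\0', maxlen);
  return end ? (size_t) (end - s) : maxlen;
}
#else
# define my_strnlen strnlen
#endif
```

Multiply this by every function, header and compiler quirk a widely ported program has to care about, and you get a lot of the macros that look weird on first read.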

If you want to learn C, then I would say the best place to start is not trying to study code that others have written. Rather, try to write code yourself. If you really want to start with an existing codebase, choose one that's being actively maintained where you can see the changes that other people are making as they make them, follow along in the discussions on the mailing lists and so on.

With well-established programs like tar and sed, you see the result of the discussions that would've happened, but you can't see how software design decisions and changes are being made in real-time. That can only happen with actively-maintained software.

That's just my opinion of course, and you can take it with a grain of salt if you like :)

Dean Harding
I agree that the best way to learn C is by programming. However, once you have mastered the syntax and nuances of the language, it always helps to walk through well-written code, which will give you an idea of new ways in which you can practically apply the syntax/data structures of the language.
@itisravi: I still believe it's better to learn that by watching development as it happens, rather than after the fact, though. For example, if you see a piece of code and you wonder "why did they do it like that, why didn't they do it this (other) way?" If you can post a message to a mailing list and *ask* then you're going to learn a lot more than if you just accept whatever has been written.
Dean Harding
+1  A: 

Learn how to use grep if you don't know it already and use it to search for the main function and everything else that interests you. You might also want to use code browsing tools like ctags or cscope which can also integrate with vim and emacs or use an IDE if you like that better.

+6  A: 

The GNU programs are big and complicated. The size of GNU Hello World shows that even the simplest GNU project needs a lot of code and configuration around it.

The autotools are hard to understand for a beginner, but you don't need to understand them to read the code. Even if you modify the code, most of the time you can simply run make to compile your changes.

To read code, you need a good editor (Vim, Emacs) or IDE (Eclipse) and some tools to navigate through the source. The tar project contains a src directory, which is a good place to start. A program always starts with the main function, so do

grep main *.c

or use your IDE to search for this function. It is in tar.c. Now, skip all the initialization stuff, until

/* Main command execution.  */

There, you see a switch for subcommands. If you pass -x it does this, if you pass -c it does that, etc. This is the branching structure for those commands. If you want to know what these macros are, grep for them as well: you will see that they are listed in common.h.
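A rough sketch of that branching structure, not the actual tar code: only EXTRACT_SUBCOMMAND is taken from the text above, while the other names, the enum layout and both helper functions are assumptions made for illustration.

```c
/* Hypothetical, trimmed-down stand-in for the subcommand list.  */
enum subcommand
{
  UNKNOWN_SUBCOMMAND,
  CREATE_SUBCOMMAND,
  EXTRACT_SUBCOMMAND,
  LIST_SUBCOMMAND
};

/* Map one option letter to a subcommand.  A real program would use
   a proper option parser; this only shows the idea.  */
static enum subcommand
subcommand_from_option (char opt)
{
  switch (opt)
    {
    case 'c': return CREATE_SUBCOMMAND;
    case 'x': return EXTRACT_SUBCOMMAND;
    case 't': return LIST_SUBCOMMAND;
    default:  return UNKNOWN_SUBCOMMAND;
    }
}

/* The "main command execution" step: one branch per subcommand.
   Here each branch just returns a description of what would run.  */
static const char *
describe_subcommand (enum subcommand cmd)
{
  switch (cmd)
    {
    case CREATE_SUBCOMMAND:  return "create archive";
    case EXTRACT_SUBCOMMAND: return "extract archive";
    case LIST_SUBCOMMAND:    return "list archive";
    default:                 return "unknown subcommand";
    }
}
```

Calling describe_subcommand (subcommand_from_option ('x')) takes the EXTRACT_SUBCOMMAND branch, which is the same shape as the switch in tar.c, just with the real work replaced by a string.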

Below EXTRACT_SUBCOMMAND you see something funny:

read_and (extract_archive);

The definition of read_and() (again obtained with grep):

read_and (void (*do_something) (void))

The single parameter is a function pointer like a callback, so read_and will supposedly read something and then call the function extract_archive. Again, grep on it and you will see this:

  if (prepare_to_extract (current_stat_info.file_name, typeflag, &fun))
    {
      if (fun && (*fun) (current_stat_info.file_name, typeflag)
          && backup_option)
        undo_last_backup ();
    }
  else
    skip_member ();

Note that the real work happens when calling fun. fun is again a function pointer, which is set in prepare_to_extract. fun may point to extract_file, which does the actual writing.
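If the function pointers are the confusing part, here is a small self-contained sketch of the same pattern. Every name in it (for_each_member, handle_member, the *_sketch functions, the counters) is made up for illustration and the logic is heavily simplified compared to tar; the type flags '0' for a regular file and '5' for a directory follow the ustar convention.

```c
#include <stddef.h>

/* Hypothetical extractor type: returns nonzero on success.  */
typedef int (*extractor_fn) (const char *file_name);

struct member { const char *name; char typeflag; };

static int files_written;    /* counts calls to the extractor */
static int members_skipped;  /* counts members nobody handles */

/* Stand-in for the function that would really write a file.  */
static int
extract_file_sketch (const char *file_name)
{
  (void) file_name;
  files_written++;
  return 1;
}

/* Pick an extractor for TYPEFLAG and hand it back through FUN.
   Returns 0 when the member should be skipped.  */
static int
prepare_to_extract_sketch (char typeflag, extractor_fn *fun)
{
  *fun = NULL;
  if (typeflag == '0')
    {
      *fun = extract_file_sketch;
      return 1;
    }
  return 0;
}

/* Per-member callback: call whatever extractor was chosen.  */
static void
handle_member (const char *file_name, char typeflag)
{
  extractor_fn fun;
  if (prepare_to_extract_sketch (typeflag, &fun))
    {
      if (fun)
        (*fun) (file_name);
    }
  else
    members_skipped++;
}

/* Same idea as read_and: walk the members, call DO_SOMETHING on each.  */
static void
for_each_member (const struct member *m, size_t n,
                 void (*do_something) (const char *, char))
{
  for (size_t i = 0; i < n; i++)
    do_something (m[i].name, m[i].typeflag);
}
```

Passing handle_member to for_each_member plays the role of read_and (extract_archive): the loop does not know what happens to each member, it only knows which function to call.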

I hope this walked you through a good part of it and showed you how I navigate through source code. Feel free to contact me if you have related questions.

+1 for detailed example. BTW, this is also an example of how simple things can be made complicated. For some reason, some people think it is fun :-/
+3  A: 

Why not download the source of the coreutils (http://ftp.gnu.org/gnu/coreutils/) and take a look at tools like yes? Less than 100 lines of C code and a fully functional, useful and really basic piece of GNU software.
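To show how little is going on at the core of such a tool, here is a sketch of the central loop. It is not the coreutils source: repeat_line is a made-up helper, and unlike the real yes, which keeps printing until it is killed or the write fails, this version stops after count lines so it is easy to try out.

```c
#include <stdio.h>

/* Write TEXT followed by a newline to OUT, COUNT times.
   Returns the number of complete lines written, which is less
   than COUNT if a write fails.  */
static unsigned long
repeat_line (FILE *out, const char *text, unsigned long count)
{
  unsigned long written = 0;
  while (written < count)
    {
      if (fputs (text, out) == EOF || fputc ('\n', out) == EOF)
        break;
      written++;
    }
  return written;
}
```

Most of what the real program adds on top of a loop like this is option handling, error reporting and portability glue, which makes it a nice first file to read.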

Greg S

I suggest using ctags or cscope for browsing. You can use them with vim/emacs. They are widely used in the open-source world.

They should be in the repository of every major linux distribution.

Iulian Şerbănoiu