views:

1370

answers:

12

What techniques can anyone suggest for understanding a new piece of code?

Here are some of the things I find sort-of work:

1) Print the code out, on a colour printer, with syntax highlighting, single sided. Spread it out on a desk. Then go through it: a) Highlighting start of functions and 'blocking off' large statements. b) Highlighting significant calls throughout e.g. calls out to other code or functions that interact with the real world (e.g. move camera lens). c) Flagging bits that really don't make sense. d) Adding handwritten comments in the margin explaining in general terms what a block of code does.

2) Simultaneously, have an editor open with the code available and a good grep tool to hand. When something needs explaining in the hardcopy, search for it here.

3) Simultaneously, have the code running in a debugger (if you are lucky enough to have compiled and working code) and make use of the call stack to see what's going on.

Problems with my approach:

1) I get distracted going through the code from start to finish and spend too long looking at trivia rather than finding the core functions.

2) I spend too long writing down long function names on pieces of paper and end up with a scrawled mess. For any badly structured piece of code the paper is a bit useless.

3) I feel like an idiot with my desk covered in highlighted pieces of paper.

Any tips or useful tools for analysing the code structure? I quite like paper since you don't spend so long mucking about with 'making the boxes line up pretty' but I wonder if I couldn't do better.

I strike this problem on average twice a year when I get handed a new piece of code to modify or maintain. It must be pretty common out there - after all, how often do we get the luxury of writing code from scratch?

+11  A: 

See "How do you find your way around a new codebase" and "Good ways to get up to speed on a complex project".

Galwegian
Sorry... I did search for this first, but failed to find it
CressNZ
+1  A: 

If the code is one of the supported languages, run it through Doxygen. That's saved my life several times recently on coming up to speed with legacy code.

Peter K.
+5  A: 

What I do.

Write unit tests for pieces of it. You start with trivial detail you can unit test and then summarize. Eventually, you work your way up from small things to bigger and bigger things.
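A minimal sketch of what such tests might look like in Python. The function `legacy_price` is a hypothetical stand-in for whatever inherited code is being studied; the point is that each test records what the code does today, as observed while reading it, not what it ought to do.

```python
import unittest


def legacy_price(quantity, unit_cost):
    """Hypothetical stand-in for an inherited function with undocumented rules."""
    total = quantity * unit_cost
    if quantity >= 10:
        total = total * 0.9
    return round(total, 2)


class LegacyPriceCharacterization(unittest.TestCase):
    """Each test pins down one observation made while reading the code."""

    def test_small_order_has_no_discount(self):
        self.assertEqual(legacy_price(2, 5.0), 10.0)

    def test_ten_or_more_items_get_ten_percent_off(self):
        self.assertEqual(legacy_price(10, 5.0), 45.0)

    def test_zero_quantity_costs_nothing(self):
        self.assertEqual(legacy_price(0, 5.0), 0.0)
```

Saved in a file, this can be run with `python -m unittest`. Once the trivial cases pass, the test names double as a summary of what has been learned so far.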

S.Lott
+3  A: 

The first thing I always do is start reading test cases. If they don't have any tests I write some (after I yell at them for not having any tests written).

Bill the Lizard
+10  A: 

I seem to remember reading in Martin Fowler's Refactoring that he would comprehend code by refactoring it as he read. I think that fits the general principle that one becomes most familiar with code by working on it, with the additional benefit of improving the quality of the code base.

marijne
I just posted something similar, so of course this comment must be genius. :-) Upvoted.
T.E.D.
Even if it doesn't need to be refactored you can add comments about blocks of code to help the next guy out.
Bryan Anderson
+13  A: 

When working with production code the first thing that I always do is to get the current code built and working - and verify that it is as per the production system.

Always aim to understand the minimum possible, and assume that the rest of the code is working as it should (even if it isn't - it's the mental process that is important).

Try to understand the big stuff first - how each part of the code hangs together and what is being achieved at a high level, then work downwards.

A large sheet of paper and a pack of 10cm^2 post-it notes always help me along. Each post-it is a component of the system, and I can move them to aid understanding.

Richard Harrison
A: 

A colleague here also suggested using a tool called ctags (http://ctags.sourceforge.net/) to speed up navigating the codebase.

CressNZ
A: 

I did this once with a very old Bridge game written in an old dialect of C. I started by writing a few shell scripts to build the call tree, with a line count for each function (I didn't have any commercial tool at the time), identified the 'leaf' functions, and worked up from there until I found the "big" one.

I printed that on continuous paper and covered a wall. (Yes: 2 m high, 12 columns of paper, and most of that was a single function.)

Then, pencil in hand, I looked at it, speaking aloud and drawing on top. A couple of weeks later it was working on a new compiler, and I went on to write a new environment surrounding it. I threw everything else away; it was easier to rewrite once I had figured out the call interface that function and a few others assumed.
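For anyone wanting to try the call-tree step today, here is a rough sketch in Python rather than shell. It is not the original scripts: it uses a regex and brace counting instead of a real C parser, so it will be fooled by macros, braces inside strings or comments, and K&R-style parameter declarations. All function names and the sample source are made up for illustration.

```python
import re

# Heuristic only: a line at brace depth 0 that looks like "type name(" and does
# not end in ";" is treated as the start of a function definition.
DEFINITION = re.compile(r"^\s*(?:[A-Za-z_]\w*[\s\*]+)*([A-Za-z_]\w*)\s*\(")
CALL = re.compile(r"\b([A-Za-z_]\w*)\s*\(")


def split_functions(source):
    """Return {function name: list of source lines} using brace counting."""
    functions, current, depth = {}, None, 0
    for line in source.splitlines():
        if depth == 0 and current is None:
            match = DEFINITION.match(line)
            if match and not line.rstrip().endswith(";"):
                current = match.group(1)
                functions[current] = []
        if current is not None:
            functions[current].append(line)
        depth += line.count("{") - line.count("}")
        if current is not None and depth == 0 and "}" in line:
            current = None
    return functions


def call_graph(source):
    """Return {function: (line count, callees defined in the same source)}."""
    functions = split_functions(source)
    graph = {}
    for name, lines in functions.items():
        # Look only at the text after the opening brace so the signature
        # itself is not mistaken for a call.
        body = "\n".join(lines).partition("{")[2]
        callees = {c for c in CALL.findall(body) if c in functions and c != name}
        graph[name] = (len(lines), callees)
    return graph


def leaves(graph):
    """Functions that call nothing else defined in the source."""
    return sorted(name for name, (_, callees) in graph.items() if not callees)


SAMPLE = """
int score(int tricks) {
    return tricks * 10;
}

int bid(int hand)
{
    return score(hand) + 1;
}

int main(void) {
    return bid(3);
}
"""

graph = call_graph(SAMPLE)
print(leaves(graph))  # ['score']
```

Sorting the graph by line count is a quick way to spot the "big" function; the leaves are the natural place to start reading.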

Javier
+3  A: 

I guess it depends on the quality of the code - if the code is really low quality, sometimes it seems like nothing you do can help you understand it better. But most of the time, if it is code that I'm "taking over", I'll start by adding comments to it as I go through it. If I'm lucky, the file has enough comments in it that understanding what it is doing is simple. But I figure by adding comments as I go, I'm at least helping out whatever poor sap has to take it over after me. And what better time is there to add comments anyway?

After I've commented each individual section, I find it's easier to see how they all piece together.

Sam Schutte
A: 

What I usually do if it is particularly difficult is load the various source files into emacs, make sure colorization is turned on, and postscript print the buffers on a color printer. If you need to see large chunks of code at once, and quickly flip around, nothing beats good old-fashioned paper.

For particularly gnarly code, sometimes I will refactor it. Even if you don't actually check in your changes, the act of refactoring code forces you to understand what is really going on at a depth that can't be matched any other way.

Just start by saying to yourself, "OK, this bit is confusing. How would I make it clearer? Does it still do everything it did before if I do that? If not, is the difference important to the program somehow?".
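The "does it still do everything it did before?" question can be checked mechanically, at least for the inputs you think of. A small sketch in Python, where `original` is a made-up example of confusing code and `refactored` is the attempted clearer version; both are hypothetical and only illustrate the comparison.

```python
def original(values):
    """Hypothetical confusing code, kept exactly as found."""
    r = []
    i = 0
    while i < len(values):
        if values[i] % 2 == 0:
            if values[i] > 0:
                r.append(values[i] * 2)
        i = i + 1
    return r


def refactored(values):
    """Clearer rewrite: double every positive even number."""
    return [v * 2 for v in values if v > 0 and v % 2 == 0]


def first_difference(old, new, cases):
    """Return the first input on which the two versions disagree, or None."""
    for case in cases:
        if old(case) != new(case):
            return case
    return None


cases = [[], [1, 2, 3, 4], [-2, 0, 2], [10, 11, 12]]
print(first_difference(original, refactored, cases))  # None
```

If an input does come back, that is the interesting part: either the rewrite is wrong, or the original had a behaviour you had not noticed yet.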

T.E.D.
+1  A: 

I found Diomidis Spinellis's book "Code Reading: The Open Source Perspective" (Addison Wesley, 2003, ISBN 0-201-79940-5; http://www.spinellis.gr/codereading/) helpful for providing techniques and practice. The website contains additional material and links to other resources.

mas
+1  A: 

When I jump into a new software ecosystem, I find it helpful to list waypoints. These serve as guides as I learn more about the source code. Before attempting to understand what all of the code does, I skim the source by searching for connections.

  1. Choose N sufficiently sized for your analysis. I like N=10.

  2. Compile a list of the top N referenced "things" that matter to you (functions, global variables, structs, classes, etc).

  3. Compile a list of the top N included files.

  4. Compile a list of the top N most-revised files. Hopefully you have the revision history.

This nets you:

  1. The most-used "things". If the project uses a custom number class, for example, this list will find it.

  2. A list of files comprising the low layer interfaces. If you don't get low layer interfaces, then the source is likely spaghetti code.

  3. A list of which source files are the most mature.

Mature files generally should have fewer bugs, as they have been reviewed or read by more developers (or at least the same developer multiple times). A file that was written once and never modified is more likely to contain bugs. This isn't to say that mature files can't contain bugs - just that it becomes less likely.

The lists for #1 and #2 serve as a map of the system and help in navigating and finding errors. Debugging is also extremely useful, but it can be hard to traverse all of the code paths - building a list takes much less time.
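The three lists can be built with a few counters. The sketch below is in Python and makes several assumptions not stated above: the sources use C-style `#include` lines, the revision history lives in git, and you supply the names of the "things" you care about yourself. All function names are illustrative.

```python
import re
import subprocess
from collections import Counter

INCLUDE = re.compile(r'^\s*#\s*include\s*[<"]([^>"]+)[>"]', re.MULTILINE)
IDENTIFIER = re.compile(r"\b[A-Za-z_]\w*\b")


def top_includes(sources, n=10):
    """Count #include targets across an iterable of source texts."""
    counts = Counter()
    for text in sources:
        counts.update(INCLUDE.findall(text))
    return counts.most_common(n)


def top_references(sources, names, n=10):
    """Count how often each name of interest appears across the sources."""
    wanted = set(names)
    counts = Counter()
    for text in sources:
        counts.update(w for w in IDENTIFIER.findall(text) if w in wanted)
    return counts.most_common(n)


def top_revised(log_output, n=10):
    """Count file names in the output of `git log --name-only --pretty=format:`."""
    files = (line.strip() for line in log_output.splitlines())
    return Counter(f for f in files if f).most_common(n)


def git_log_names(repo_path="."):
    """Assumes the project is a git repository and git is on PATH."""
    result = subprocess.run(
        ["git", "log", "--name-only", "--pretty=format:"],
        cwd=repo_path, capture_output=True, text=True, check=True,
    )
    return result.stdout
```

Typical use would be `top_revised(git_log_names("path/to/repo"))`, plus the two source-based functions fed with the contents of each file. Counting identifiers with a regex also counts mentions in comments and strings, so treat the numbers as a rough ranking only.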

Will Bickford