I am looking for some good tips for navigating and understanding a huge codebase. I usually start at the top and, after a while, end up lost in the nitty-gritty details of some function. By then I am many levels deep, so backing up and getting back on track is tiresome and exhausting. How do you keep track of the trail when you are trying to understand a huge codebase?

I usually have a notepad open and try to track the steps. But switching between understanding the code and taking notes is not really effective for me. Any tips?

EDIT: I am looking at a situation where I want to fix a bug. I am skeptical that if I limit my understanding to the function/class where the bug is present, I will not be confident about my fix.

A: 

I usually avoid trying to understand the code from the top down. Instead, I look around all the classes and packages in the code and see which ones stand out as something I might want to look at further. Then I focus on understanding how that small piece works by itself.

I then move on to another piece of code, and so on. Hopefully, after enough time, I understand how all the pieces work, which makes understanding the big picture much easier.

Shynthriir
A: 

I start with the concepts and logic in the code. Take one fundamental concept and follow it through to see how the developer implemented it. In the process I always find associated details, which I study later.

Once you have a basic model of the code and some idea of how the developer thought about it, you can take it from there. Always worked for me :)

EDIT: Also, if the code is very large, modeling at a higher level helps. Divide the code into modules and understand how they are connected to each other. Later, dive into the modules individually and follow the trick I mentioned above.
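One rough way to get that module-level picture, sketched here under the assumption that the codebase is Python (the answer does not name a language), is to collect each module's imports with the standard `ast` module. The three-module project below is made up purely so the sketch runs on its own.

```python
import ast
from collections import defaultdict


def import_map(sources):
    """Map each module name to the set of top-level modules it imports.

    `sources` is a dict of {module_name: source_text}; how you collect it
    (for example by walking a directory) is up to you.
    """
    graph = defaultdict(set)
    for name, text in sources.items():
        graph[name]  # touch the key so modules without imports still appear
        for node in ast.walk(ast.parse(text)):
            if isinstance(node, ast.Import):
                for alias in node.names:
                    graph[name].add(alias.name.split(".")[0])
            elif isinstance(node, ast.ImportFrom) and node.module:
                graph[name].add(node.module.split(".")[0])
    return dict(graph)


# Hypothetical three-module project, inlined so the sketch runs as-is.
demo = {
    "billing": "import orders\nfrom storage import save\n",
    "orders": "import storage\n",
    "storage": "import json\n",
}
graph = import_map(demo)
for module in sorted(graph):
    print(module, "->", sorted(graph[module]))
```

Modules that nothing else imports tend to be entry points, and modules that everything imports tend to be the shared core, which gives a first guess at where to dive in.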

Faheem
+3  A: 

First answer the question: What do you want to do?

Possible questions:

  1. Do you want to evaluate the Design/Architecture?

  2. Do you want to fix a bug?

  3. Implement a new feature?

Possible approaches:

  1. Get hold of some static analysis tools: Sonar and Structure101 are examples. Use those to get an overview of the architecture.

  2. Start with a test that reproduces the bug (ideally a unit test, but a session in the debugger will do). Follow the execution in the debugger. Don't go too deep. Check the variables for unexpected values.

  3. Look for related feature, search for those by name and see how they are implemented. Ignore all the details that don't relate to the task at hand.

---- addition in response to the edit of the question ----

Fixing a bug in a code base you don't know (and which probably doesn't have extensive automated tests) is always a risky business.

Still I think the general approach presented above is advisable. Of course it should be 'protected' by tests:

  • Once you have identified the area where you have to make a change, check who is using this code and in what way. Carefully adding log statements and running the application might do the trick.
  • Write tests that document the current behavior (those should be green and stay so).
  • Write tests that document the behavior you expect after your change (those start red).
  • Make your change. This should turn the red tests green while the earlier ones stay green.
  • Run manual tests to make sure the application works as intended.

As usual the amount of testing depends on the risk that comes with missing a bug.

Jens Schauder
Thanks Jens. Edited the question for my specific scenario.
Ravi Gummadi
I like this answer. Get in and get out - do analysis but don't get bogged down in the detail.
Neil Trodden
A: 

I am skeptical that if I limit my understanding to the function/class where the bug is present, I will not be confident about my fix.

Fix the bug, and if your fix breaks something else or isn't enough to fix it, blame the code author for not writing maintainable code.

You shouldn't have to understand everything to fix one piece.

Beth
Blaming just isn't going to work when the original authors are long gone and you shoulder the responsibility. :)
Ravi Gummadi
No, but if you have to understand the whole thing to fix it, you might as well rewrite it. Start by fixing it where it is, and then spend more effort on understanding more of the code once it's proven that you need to.
Beth
@Beth I've heard that argument. About a 350,000 line code base. With a six month deadline.
David Lively
+2  A: 

There is an interesting SE Radio interview with "Pragmatic" Dave Thomas about code archaeology, which covers just this topic.

Some ideas, some from that talk, some not:

Do you have access to the version control repository? Where are the hot spots with lots of changes? They give you a hint about where most of the development time was spent.
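As a rough sketch of that hot-spot count, assuming the repository is Git: `git log --name-only --pretty=format:` prints only the paths touched by each commit, so counting repeated lines gives changes per file. The log text below is a made-up sample so the snippet runs without a repository.

```python
from collections import Counter


def hot_spots(log_text, top=3):
    """Count how often each path appears in `git log --name-only` output."""
    paths = (line.strip() for line in log_text.splitlines())
    return Counter(p for p in paths if p).most_common(top)


# Made-up sample of what `git log --name-only --pretty=format:` prints:
# one block of changed paths per commit, separated by blank lines.
sample_log = """\
src/parser.c
src/lexer.c

src/parser.c

src/parser.c
README
"""

for path, changes in hot_spots(sample_log):
    print(changes, path)
```

On a real checkout, the text could come from `subprocess.run(["git", "log", "--name-only", "--pretty=format:"], capture_output=True, text=True, check=True).stdout`.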

What is the biggest file? Unfortunately, code tends to accumulate where it's used, and unless someone does the work to split it up again, it stays there. The biggest file is often the most important one too.
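A minimal sketch for ranking files by line count follows. The directory and file names are invented and created in a temporary folder so the snippet runs anywhere; on a real tree you would probably also want to skip version control folders and generated files.

```python
import os
import tempfile


def largest_files(root, top=5):
    """Return (line_count, relative_path) pairs, biggest first."""
    sizes = []
    for folder, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(folder, name)
            # Read as bytes so files with unusual encodings don't break it.
            with open(path, "rb") as handle:
                lines = sum(1 for _ in handle)
            sizes.append((lines, os.path.relpath(path, root)))
    return sorted(sizes, reverse=True)[:top]


# Throwaway directory with made-up files so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as demo_root:
    for name, count in [("core.py", 40), ("util.py", 5)]:
        with open(os.path.join(demo_root, name), "w") as out:
            out.write("x = 1\n" * count)
    ranking = largest_files(demo_root)

for lines, path in ranking:
    print(lines, path)
```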

Is there a bug tracker? Which components have the most bugs? This also tells you where problems occur (and probably where development has been concentrated, because that logic is important).

A good IDE makes tracking a lot easier as you can jump to definitions and back again.

A documentation generator, even if there aren't any comments, can often produce good graphical representations of classes or function calls that guide you to the right place.

Paul Rubel
+1  A: 

There's a non-linear (sort of hackerish, sideways and admittedly unprofessional) way to do this, a kind of follow-the-breadcrumbs approach:

  • choose any line of code and read on until you find some (say) function or class that grabs your attention;

  • copy its name and mark the block with a comment ('found: [name of thing]', incrementally adding each thing you follow);

  • then search for every instance of this word throughout the code;

  • you'll find the actual 'thing' on the way, so make a note of the line where it appears, and what it does.

After you've done this for a while (if the method works for you), the thinking behind the code becomes apparent and you'll hopefully locate all the main connections quite quickly.
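The "search for every instance" step can be done with any editor or grep; purely as an illustration, here is a small Python sketch that records file, line number and text for each whole-word match. The two source files and the name `apply_discount` are invented for the example.

```python
import re


def find_mentions(name, sources):
    """Yield (file, line_number, text) for each whole-word match of `name`."""
    pattern = re.compile(r"\b" + re.escape(name) + r"\b")
    for filename, text in sources.items():
        for number, line in enumerate(text.splitlines(), start=1):
            if pattern.search(line):
                yield filename, number, line.strip()


# Made-up sources, inlined so the sketch runs without touching the disk.
demo_sources = {
    "main.py": "from pricing import apply_discount\ntotal = apply_discount(100)\n",
    "pricing.py": "def apply_discount(amount):\n    return amount * 0.9\n",
}
trail = list(find_mentions("apply_discount", demo_sources))
for entry in trail:
    print(*entry, sep=": ")
```

Keeping the resulting list as your notes gives you the trail back: each entry is a breadcrumb with a place to return to.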

In the worst cases, I've also searched & replaced all instances of poorly-named vars, subroutines, etc. to something that makes more descriptive sense (then run the code again).

Of course (like Paul says) if you use an editor or IDE that can list defined stuff, you're already halfway there :-)

Dave Everitt