I understand the value of reading source code, and I am trying my best to read as much as I can. However, every time I try getting into a 'large' (i.e. complete) project of sorts, I am overwhelmed.

For example, I use Anki a lot when revising languages. Also, I'm interested in getting to know how an audio player works (because I have some project ideas), hence quodlibet on Google Code.

But whenever I open the source code folders for the above programs, there are just so many files that I don't know where or what to begin with. I think that I should start with the files named __init__.py, but I can't see the logical structure of the programs, or what reasoning the original writer applied when dividing the modules the way he did.

Hence, my questions:

  1. How/where should I begin reading source? Any general tips or ideas?
  2. How does a programmer keep in mind the overall structure and logic of the program, especially for large projects, and is it common not to document that structure?
  3. As an open source reader, must I look through all of the code and get a bird's eye view of the code and libraries, before even being able to proceed?
  4. Would an IDE like Eclipse SDK (with PyDev) help with code-reading?

Thanks for the help; I really appreciate your helping me.

+9  A: 

I think it is easy to get overwhelmed by the amount of source code that an open source project has. But that may happen even with a large product from some company. The way to start reading and understanding such a large code base, in my opinion, is to accept that you will never know everything in detail, but that you can gain an overall understanding of the project and then dig into the details of the specific area you have to work with. If you have a project that is composed of multiple modules, this is how I would start:

  1. If your project is composed of several modules, try to understand what each module is designed for;
  2. Pick one of the modules, and understand what the main packages inside that module are responsible for;
  3. When you find what you are looking for, dig into that package and learn or do what you want.

You have to divide and conquer. Have an objective: if you look at the Android source code, for example, you may want to understand how it deals with the gravity sensor. You will have to learn the general structure of the program, drill down to the specific module that is responsible for that, and then drill further from there.

I don't think any human soul is capable of understanding every bit and piece of a very large product. And you don't need to, as long as you have a bird's eye view of what the major pieces are.

Ravi Wallau
"I don't think any human soul is capable of understanding every bit and piece of a very large product." True. This is true even if you wrote the large project yourself - you can't keep it all in your head at once. A lot of good programming practices are intended to manage complexity for this reason. Write an object or module well, and you can forget how it works for a while and just use the interface. Toaster.toast(bread) - I don't have to remember exactly what the toaster object is doing once it works. If I need to revisit it, I'll figure it out again (and hopefully I wrote it clearly).
Nathan Long
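To make the Toaster.toast(bread) point concrete, here is a minimal sketch. The Toaster class and its internals are made up for illustration; the only thing a caller (or a reader of the calling code) needs to know is the one public method.

```python
class Toaster:
    """Hypothetical example: callers only need to know toast()."""

    def __init__(self, seconds=90):
        # Internal detail; code that uses the toaster never touches this.
        self._seconds = seconds

    def toast(self, bread):
        # The body could be rewritten completely without changing any caller,
        # as long as the method name, argument and return value stay the same.
        return f"toasted {bread}"


# A reader of the calling code can stop here and treat toast() as a black box.
breakfast = Toaster().toast("rye")
```

When reading an unfamiliar project, the same idea lets you skip a module's internals at first and read only its public methods.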
+1  A: 

Start from the code's documentation, i.e. module design documents such as data flow diagrams, flow charts, sequence diagrams or UML diagrams, scenarios, use cases, etc., if they are available. In most cases, they are not, so you'll end up reading the code. That is why documenting the different phases of your development is important: others who do not know anything about what you did will be able to grasp the essence of your implementation more easily.

ultrajohn
+2  A: 

How/where should I begin reading source? Any general tips or ideas?

Start with something simple: a tutorial, example code with lots of documentation, etc. You will have to learn how to read the code first. After that, you will have to learn how to read the structure.

How does a programmer keep in mind the overall structure and logic of the program, especially for large projects, and is it common not to document that structure?

That, unfortunately, varies a great deal per programmer. With large projects there usually would be a diagram displaying the connections between the modules. However, developers could also choose not to create a diagram, but simply document the structure.

Or... the developers might be very lazy and/or short on time and will not document anything.

As an open source reader, must I look through all of the code and get a bird's eye view of the code and libraries, before even being able to proceed?

It would depend on what your goal is. Most of the time you do not need to understand the entire program structure simply to add a function to a program. For example, if you want to change how the random algorithm in quodlibet works, you could simply search for random in the source and go from there.

Compare it to a road map: do you need to see the map of the entire city before you can go from point A to point B?
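As a rough sketch of the "search for random in the source" idea, something like the following walks a source tree and reports every line that mentions a keyword. The function name find_keyword and the directory name in the usage comment are assumptions for illustration, not anything taken from quodlibet itself; a tool like grep does the same job from the command line.

```python
from pathlib import Path


def find_keyword(root, keyword, pattern="*.py"):
    """Yield (path, line_number, line) for every line that contains keyword.

    The match is case-insensitive, and only files matching pattern are read.
    """
    needle = keyword.lower()
    for path in sorted(Path(root).rglob(pattern)):
        try:
            text = path.read_text(encoding="utf-8", errors="replace")
        except OSError:
            continue  # unreadable file: skip it rather than abort the search
        for number, line in enumerate(text.splitlines(), start=1):
            if needle in line.lower():
                yield path, number, line.strip()


# Usage (the directory name is an assumption; point it at your checkout):
# for path, number, line in find_keyword("quodlibet", "random"):
#     print(f"{path}:{number}: {line}")
```

The hits give you a handful of starting points, which is usually enough to find the one function you actually want to change.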

Would an IDE like Eclipse SDK (with PyDev) help with code-reading?

It could make browsing the code a lot easier. With PyDev you can simply click on the module/function you want to open. That could help you analyze the code a lot faster.

WoLpH
+1  A: 

I usually like to pick one feature of the code. What is something the code does that you're curious about and want to know how it works? I'll go through the code until I find the answer to that question. Usually by the time I find the answer I've dug through enough code to where I have other questions. Rinse and repeat :)

Eric
+3  A: 

You should begin to read code with a debugger. It is extremely powerful to be able to see all declared variables and watch them change as you step through the code. Master your debugger, and you will become a better programmer.
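For Python, a minimal way to try this is the built-in pdb debugger. The sketch below uses a made-up function, running_total, purely as something to step through; the debug flag is only there so the function also runs normally.

```python
def running_total(values, debug=False):
    """Sum values; with debug=True, stop in the debugger on every iteration."""
    total = 0
    for value in values:
        if debug:
            # Drops into pdb at this point. Useful commands once inside:
            #   p total   print a variable
            #   n         run the next line
            #   s         step into a function call
            #   c         continue to the next breakpoint
            breakpoint()
        total += value
    return total


# running_total([1, 2, 3], debug=True)  # uncomment to step through the loop
```

You can also start an existing script under the debugger without editing it by running python -m pdb followed by the script name.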

Rook
I'm still a bit clueless about how to use a debugger; I don't know where to start yet. But I'll definitely try it out.
anonnoir
@anonnoir They can be a bit intimidating at first, but they make life A LOT easier, and I wish I had learned how to use one at the very beginning.
Rook
+5  A: 

This is a question that I have been thinking about a lot recently (I was thinking about asking a very similar question), for I have recently started working (first job after uni) and have been spending a lot of time reading code to work out how the code works.

I often find it best to focus on one particular piece of functionality in a program and work out what the program does to achieve it. First you need to find the function in the program that does it, and then work out the stack trace (having a debugger here is handy).

Keeping notes about what you learn is very helpful. You might want to note down the stack traces that you make, as well as a few details on important classes, and anything else that seems useful.
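If you don't have a debugger to hand, Python's standard traceback module can record the chain of calls that led to a given point, which is handy for the kind of notes described above. In this sketch play and load_track are hypothetical function names standing in for whatever the real program calls.

```python
import traceback


def call_chain():
    """Return the names of the functions on the call stack, outermost first.

    The last entry of extract_stack() is this helper itself, so it is dropped.
    """
    return [frame.name for frame in traceback.extract_stack()[:-1]]


def load_track():
    # Imagine this is the function you found; ask how execution got here.
    return call_chain()


def play():
    return load_track()


chain = play()
# The end of the chain shows that play() called load_track().
```

Pasting a call to traceback.print_stack() into the function you are curious about gives the same information printed directly to the terminal.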

An IDE can be very helpful as it can allow you to navigate the code very quickly (however, I'm not familiar with PyDev). For example, you can generally find methods that are being called by simply clicking on the method and pushing F3 (or something). If you don't have a powerful IDE, using something like grep or ack is a convenient way of finding functions or variable declarations etc.
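As a small alternative to grep or ack for Python code specifically, the standard ast module can list where functions and classes are defined, without running the code. The helper name list_definitions is my own invention for this sketch.

```python
import ast


def list_definitions(source):
    """Return (kind, name, line) for each function and class in Python source."""
    found = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            found.append(("def", node.name, node.lineno))
        elif isinstance(node, ast.ClassDef):
            found.append(("class", node.name, node.lineno))
    # ast.walk gives no useful ordering, so sort by line number.
    return sorted(found, key=lambda item: item[2])


# Usage: pass in the text of any file you are curious about, e.g.
# list_definitions(open("some_module.py", encoding="utf-8").read())
```

Unlike a plain text search, this ignores comments and strings, so every hit is a real definition.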

In my limited experience, the structure of a program is generally not documented. This, in my opinion, is quite unfortunate.

I don't have a complete answer, but I hope this helps.

David Johnstone
+1 for ack. It's awesome for finding things without an IDE
prestomation
+1  A: 

There'll likely be a lot of opinions on this, but here's my take. Right off the bat, downloading a project's source and trying to read it is going to overwhelm a lot of developers. What languages do you write in? And more important than the language (because if you can read a for loop in Java you can likely read it in C#, Python, JavaScript, PHP etc.), what toolkit/framework is the project using? Take ASP.NET Webforms for example. Without having a general idea of how Webforms work, the code that runs on it won't make much sense.

One of the things I would really suggest is using the program extensively so that you know where everything is and what it does.

Now let me take a shot at your questions.

1) While most programs begin with a "Main" method, if you start from there you'll instantly get lost. I think it makes sense to start with examples for an addin/plugin framework. Those are usually easy for new developers to start with. Failing that, pick your favourite button and look at the code that handles that event.

2) I don't know many programmers who can keep the whole thing in their head. One of the things I've heard about Mercurial is that developers can keep the whole structure in their head, but you still have to be fairly experienced to do that.

3) Good documentation sometimes describes the structure, but don't bank on it.

4) I like using an IDE. They highlight syntax, which is great when learning a language, they offer tooltips, and sometimes right clicking a method brings you to the definition.

You've got the right attitude. Stick with it.

pnewhook
Thank you for the encouragement!
anonnoir
+1  A: 

I do this all of the time. I work primarily in Java, and very often I have to open up a 3rd party library to see what is going on: whether I'm doing something wrong or there is a bug in their code. I also have to jump into many different code bases across a few different project teams, so I believe I have gotten pretty good at it. My approach is a bit different depending on the project, but usually it starts with a question: "How is this used?" If it is a web application, my first goal is to get it up and running locally and take it for a spin. If it is a library, then I take a look at its API (JavaDocs or whatever, samples etc.). I also try to separate out functionality. For instance, most of the projects I work on use Spring or another MVC framework, so identifying where everything is comes next. I try to avoid getting down into the nitty-gritty details until the end. This is my approach, and it works well for the type of work I do. Being a new developer, you will simply have to find what works best for you.

If it's a standalone application, start with the entry point and work from there. As the previous posters have said, use a debugger to step through and see the flow.

Last piece of advice: don't worry about it. Dive in and just spend time with it. It's not all going to make sense right away and it will take time, so just get started.

Casey
+1  A: 

While it's a tedious task, it can be a lot easier if you follow this advice:

  1. Understand the architecture. This often entails understanding the framework (for example, ASP.NET MVC), patterns (for example, IoC), practices (for example, SOLID) as well as the development platform. MVC has a specific approach, ASP.NET applications follow a particular workflow and Java applets have a fixed entry point.

  2. Have a look at the commit logs if you have access to them. Look at the commits for a feature implementation or major bug fix. I often point developers to a commit log so they can implement a related feature or extend an existing one. It never fails them.

  3. Take a top-down approach (as opposed to bottom-up). If you see a SAVE button, check out what event it raises and what function handles the event. Then drill down into the function and observe the application logic, hooks, database calls and so on. Find a good debugger to step through the code and use a good IDE to check cross-references in code.

aleemb
+1  A: 

One of the most important things to remember is that with a large and complex piece of code you will never understand every aspect of it.

  1. How/where should I begin reading source? Any general tips or ideas?

    Small tutorials and examples are definitely a good start. Even taking a large open source project and focusing on a particular module is a good start once you understand the basics behind the code.

  2. How does a programmer keep in mind the overall structure and logic of the program, especially for large projects, and is it common not to document that structure?

    This can be really difficult to track, especially if you did not develop the application from the start. Typically there will be some documentation, especially with collaborative projects, but a lot of the programmers I have worked with just sort of expect you to understand why the code is designed the way it is. In terms of large projects, eventually you reach a stage where you may not know where something is, but you know how to find it or where it should go.

  3. As an open source reader, must I look through all of the code and get a bird's eye view of the code and libraries, before even being able to proceed?

    I would say no. I am a firm believer in the 'Black Box' method of coding when it comes to big projects. A lot of the time trying to understand all of the code will slow you down and confuse you. Generally if you know what a function takes and what it returns, you can start coding on something useful, and with time you figure out what it is doing in the functions.

  4. Would an IDE like Eclipse SDK (with PyDev) help with code-reading?

    IDEs can help with things like syntax highlighting, and something like Visual Studio is great with the options you can use to find definitions and references (I'm not sure if Eclipse can do this; I haven't touched it since university).
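On the 'Black Box' point in answer 3 above: in Python you can often find out what a function takes and what it is for without opening its source at all. This is a minimal sketch using the standard inspect module; the helper name describe is made up for the example.

```python
import inspect


def describe(func):
    """Return a one-line summary: the function's signature and docstring headline."""
    signature = inspect.signature(func)
    doc = inspect.getdoc(func) or "(no docstring)"
    summary = doc.splitlines()[0]
    return f"{func.__name__}{signature}: {summary}"


# Usage: describe(some_module.some_function) for any function you plan to call.
# The built-in help(some_function) shows the same information interactively.
```

If the one-line summary is enough to use the function, you can put off reading its body until you actually need to change it.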

Ryan French
+2  A: 

Start with the documentation, if any. If there is none, ask the coder. If neither of those is available, then look for tools which will analyse the code for you and show you the structure (function call tree, for instance, or class tree).

I recommend running DoxyGen over it http://www.stack.nl/~dimitri/doxygen/diagrams.html
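For a Python project, a very rough class tree can also be pulled out with the standard library alone. This is only a sketch of the idea, not a replacement for DoxyGen; the function name class_bases is my own, and ast.unparse needs Python 3.9 or newer.

```python
import ast


def class_bases(source):
    """Map each class name in Python source to the list of its base classes."""
    tree = ast.parse(source)
    return {
        node.name: [ast.unparse(base) for base in node.bases]
        for node in ast.walk(tree)
        if isinstance(node, ast.ClassDef)
    }


# Usage: run it over each file in a package and print the result, e.g.
# class_bases(open("some_module.py", encoding="utf-8").read())
```

Classes with an empty list are the roots of the tree, so they are often a good place to start reading.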

Mawg
A: 

Use a decent IDE with reference browsing and syntax highlighting.

sasayins
+1  A: 

I like to start with a bird's eye view of the project before drilling down to the actual code. This helps me keep everything in perspective and see how the pieces fit together.

For example, there are tools that will generate UML from Python.

The first example I found is commercial (and not cheap). If you're making your living maintaining code written by someone else, it's well worth the cost. Otherwise, there are other tools available cheaper or even free to do the same thing, maybe just not as pretty.

Bruce McGee