Let's say you want to start contributing to an open source project with thousands of lines of code. I am interested in suggestions on how you would start learning and hacking on the new system.
Skim through the code, looking for places where you think you know what is occurring. Read through those sections to see if your initial thought holds up, and then try modifying the code to see if you can make it do something else.
I'd suggest letting the project roadmap guide this a little, so you can move on to spotting where new features and bug fixes are planned and contribute in a meaningful way.
Also, read the documentation and any spin-up documents that may exist (functional specs, requirement specs, etc). This can further help you learn the code ropes quicker :)
Find a small spot where you feel comfortable and change it. Ignore the rest; you'll get to learn about it when you need it. Don't feel overwhelmed by the size; every program started with the first line and if it wasn't organized in small, independent bits, it would be impossible to maintain. So there is always a niche for you.
If you want books that I found useful for learning how to learn from the code, I would check out Diomidis Spinellis's books Code Reading: The Open Source Perspective and Code Quality: The Open Source Perspective.
For your specific question, I would start with Code Reading. But both are good books.
Learn with a purpose. I think you'll learn best if you have in mind some goal -- fix this or add that functionality. Start looking for the likely places where you will need to make your changes. Follow threads of control backwards to find out how to get to that point in the code. Also, take a "bird's eye" view of the code -- look at the layout and structure. Good code will have descriptive names that tell you what each class and method is for. See if you can recognize implementation patterns and see where/how they are used. Don't put too much stock in the documentation -- documents can be excellent, but they are often out of sync with what the code really does. Let the code itself be the best documentation.
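To make "follow threads of control backwards" concrete, here is a minimal sketch, assuming the project is written in Python. It uses the standard `ast` module to list which functions call a given function. The name `find_callers` and the sample source are made up for illustration, and the match is by name only, so it will over-report when two unrelated functions share a name.

```python
import ast

# Hypothetical sample source; in practice you would read a real file.
SAMPLE = '''
def save(record):
    write_to_disk(record)

def handle_request(req):
    save(req.body)

def main():
    handle_request(get_request())
'''


def find_callers(source, target):
    """Return (enclosing function name, line number) for each call to `target`."""
    callers = []

    class Visitor(ast.NodeVisitor):
        def __init__(self):
            # Names of the functions we are currently nested inside.
            self.stack = ["<module>"]

        def visit_FunctionDef(self, node):
            self.stack.append(node.name)
            self.generic_visit(node)
            self.stack.pop()

        visit_AsyncFunctionDef = visit_FunctionDef

        def visit_Call(self, node):
            func = node.func
            # Plain calls are ast.Name; method calls are ast.Attribute.
            if isinstance(func, ast.Name):
                name = func.id
            else:
                name = getattr(func, "attr", None)
            if name == target:
                callers.append((self.stack[-1], node.lineno))
            self.generic_visit(node)

    Visitor().visit(ast.parse(source))
    return callers


if __name__ == "__main__":
    # Walk one step backwards from save() to whoever calls it.
    print(find_callers(SAMPLE, "save"))  # [('handle_request', 6)]
```

Calling it again with `"handle_request"` takes you one more step back, to `main`. For other languages, cscope or an IDE's "find usages" does the same job.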
If you're familiar with the project and have some idea of what you want to contribute, then get the latest version of the code and make the change that you want to make. Don't try to digest the whole body of code at one time. Look for patterns and conventions that are being used and try to adhere to them as much as possible.
I prefer to "step" through the code line by line in an IDE debugger. It is not practical to try to reach all code in a large system, but I start by debugging the start-up of the application and then move on to other areas of code that look interesting/important to the application.
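If single-stepping through start-up is too slow, a rough alternative is to log which functions get called and in what order. The sketch below assumes a Python codebase and uses the standard `sys.settrace` hook; `trace_calls`, `start_app`, and the other function names are hypothetical stand-ins for a real application's entry point.

```python
import sys


def trace_calls(func, *args, **kwargs):
    """Run func and return (result, [(depth, function_name), ...]) in call order."""
    calls = []
    depth = 0

    def tracer(frame, event, arg):
        nonlocal depth
        if event == "call":
            calls.append((depth, frame.f_code.co_name))
            depth += 1
        elif event == "return":
            depth -= 1
        # Returning the tracer keeps receiving events for this frame.
        return tracer

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        result = func(*args, **kwargs)
    finally:
        # Restore whatever tracer (debugger, coverage tool) was active before.
        sys.settrace(previous)
    return result, calls


# Hypothetical start-up code standing in for a real application.
def load_config():
    return {"name": "demo"}


def init_plugins(config):
    return [config["name"] + "-plugin"]


def start_app():
    config = load_config()
    return init_plugins(config)


if __name__ == "__main__":
    _, calls = trace_calls(start_app)
    for level, name in calls:
        print("  " * level + name)
```

The indented listing gives a quick outline of the start-up path, which helps in deciding where to set real breakpoints in the debugger.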
I would suggest finding a feature that you really want to add.
I've only ever really looked through a mountain of someone else's code once, and it was made much easier because the feature I wanted to add was something I really needed. I think that's the reason most open source projects start anyway: to fill a need.
Rob
1. Get the code building and running on your dev box.
2. Figure out how the system is used and familiarize yourself with the libraries used. (For example, if you see code with calls to some OpenGL lib, familiarize yourself with that library.) Read the system's documentation.
3. I would then find a point of code that will run based on some input. So let's say the system generates output based on some input. I would find the method that initiates the process. Now try to either step through it (could be a pain) or just read the code and try to get a high level overview of what is going on.
I've found that step 3 usually becomes very involved and it is what gives me direction in how to read the code and learn how the system works.
Adding a feature can be a nice idea, but in a large system it could be a pain to even figure out what that feature is. Not only that, any feature you write to learn the system will most likely end up looking like crap because you would have no idea what is where and how the system operates. The feature you want to add though can certainly lead you to the right point in #3 so don't discount that.
-s
Thank you for your answers. Lots of useful information.
Some tools I can think of are:
- grep
- ctags
- cscope
- javadoc (Java)
- Doxygen (C, C++)
- lxr (software toolset for indexing and presenting source code repositories)
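On a machine where none of these are installed, a rough stand-in for a recursive grep can be sketched in a few lines of Python. This is only an illustration, not a replacement for the tools above; the function name `grep_tree` and the default extension list are assumptions made for the example.

```python
import os
import re


def grep_tree(root, pattern, extensions=(".py", ".c", ".h", ".java")):
    """Return (path, line number, line text) for every line matching `pattern`."""
    regex = re.compile(pattern)
    suffixes = tuple(extensions)
    hits = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Skip hidden directories such as .git; sort for a stable order.
        dirnames[:] = sorted(d for d in dirnames if not d.startswith("."))
        for name in sorted(filenames):
            if not name.endswith(suffixes):
                continue
            path = os.path.join(dirpath, name)
            with open(path, encoding="utf-8", errors="replace") as handle:
                for lineno, line in enumerate(handle, start=1):
                    if regex.search(line):
                        hits.append((path, lineno, line.rstrip()))
    return hits


if __name__ == "__main__":
    # Example: list every Python function definition under the current directory.
    for path, lineno, text in grep_tree(".", r"^\s*def\s+\w+", extensions=(".py",)):
        print(f"{path}:{lineno}: {text}")
```

The `path:line: text` output mirrors what `grep -rn` prints, so it drops into the same workflow.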
Any more suggestions on useful tools?