When having a new C++ project passed along to you, what is the standard way of stepping through it and becoming acquainted with the entire codebase? Do you just start at the top file and start reading through all x-hundred files? Do you use a tool to generate information for you? If so, which tool?
views:
514answers:
19- Start working on it, perhaps by adding a small feature.
- Step through application startup in the debugger.
I use change requests/bug reports to guide my learning of some new project. It never makes a lot of sense to me to try and consume the entirety of something all at once. A change order or bug report gives me guidance to focus on this one tendril of the system, tracing it's activity through the code.
After a reasonable amount of these, I can get a good understanding of the fundamentals of the project.
You could try running it through doxygen to at last give a browsable set of documentation - but basically the only way is a debugger, some trace/std::cerr messages and a lot of coffee.
The suggestion to write test cases is the basis of Working-Effectively-Legacy-code and the point of the cppunit test library. If you can take this approach depends on your team and your setup - if you are the new junior you can't really rewrite the app to support testing.
There is one tool I know about that may help you, it's currently in beta called CppDepend that will help you understand the relation between the classes and the projects in the solution.
Other than that you can try to understand the code by reading it:
- Start with the header (.h/.hpp) files, reading them would help understand the "interfaces" between the classes
- If the solution has several project try to understand the responsibility of each project.
- Find someone who is familiar with the project that could give you and overview, 5 min with the right person can save you an hour with the debugger
Firstly how large is large?
I don't think you can answer this without knowing the other half of the scenario. What is the requirement for changing the code?
Are you just supporting/fixing it when it goes wrong? Developing new functionality? Porting the code to a new platform? Upgrading the code for a new C++ compiler?
Depending on what your requirement is I would start in different ways.
Browse around in the file hierarchy with Total Commander, try getting an overview of the structure. Try identify where the main header files are located. Also find the file where the main() function is located.
Here's how I approach the problem
- Start by fixing easy bugs. Do extreme dilligance on these bugs and use the debugger heavily to find the problem
- Code review every change that goes into the system. On an unbelievably large system, pick a smaller subset and review all of these changes
- And most importantly: Ask a lot of questions!
Ask a person who is already familiar with the codebase to outline the basic concepts that were used during development.
He doesn't need to explain every detail, but should give you a rough idea of how the software works and how the individual modules are connected with each other. Additionally, what I've found useful in the past was to first setup a working development environment before starting to think about the code.
Here's my general process:
- Start by understanding what the application does, and how its used. (I see way too many developers completely skip this critical step.)
- Search for any developer documentation related to the project. (However, realize this will nearly always be wrong and out of date - it just will have helpful clues.)
- Try to figure out the logic in the organization. How is the main architecture defined? What large scale patterns are used? (ie: MVC, MVP, IoC, etc)
- Try to figure out the main classes related to the "large" objects in the project. This helps for the point above.
- Slowly start refactoring and cleaning up as you try to maintain the project.
Usually, that will get me at least somewhat up to speed. However, usually I end up given a project like this because something has to be fixed or enhanced, and timing isn't always realistic, in which case I often just have to jump in and pray.
Understanding how the code is used is usually very helpful.
If this is a library, look at client code and unit tests. If there aren't any unit tests, write some.
If this is an application, understand how it works - in detail. Again read & write unit tests.
Essentially, it's all about the interfaces. Understand the the interfaces and you'll go a long way towards understanding how the code works. By interface, I mean, the API if it's a library, the UI if it's a graphical application, the content of the inbound & outbound messages if it's a server.
Read the documentation. If possible, speak with the former maintainer. Then, check out the code bases from the first commit and the first release from the VCS and spend some time looking at them. Don't go for full understanding yet, just skim and understand which are the major components and what they do. Then read the change logs and the release notes for each of the major releases. Then start breaking everything and see what breaks what. Do some bug fixes. Review the test suite and understand which component each test is focused on. Add some tests. Step through the code in a debugger. Repeat.
Things to do:
- Look at what the sales brochure tells you it does, set the scope of your expectations
- Install it, what options do you have in the installer, read the quick start/install guide
- Find out what it does, does it even execute, do you have multiple executables
- Is there a developer setup guide/wiki, pointers to VCS
- Get the code and make your build environment work, document SDKs, build tools you need if it isn't already
- Look at the build process, project dependancies, is there a build machine/CI service
- Look at generated doc output (if there is any!)
- Find an interesting piece of the solution and see how it works, what are the entry points/ how does it work/look for main classes and interfaces
- Replicate bugs, stop at interesting features in the program to get an overview and work down to tracing code.
- Start to fix things, but ensure you are fixing things by having appropriate unit tests to show that it is broken now and when it will be fixed.
I have been incorporating source codes from some mid-sized projects. The most important lesson I learn from this process is before going into the source codes, you must be sure what part of the source codes interest you most. You should then go into that piece by grepping logging/warning messages or looking at class/function names. In understanding the source codes, you should run it in a debugger or insert your own warning messages. In all, you should focus on things you are interested in. The last thing you want is to read all the source codes.
Try generating a documentation using Doxygen or something similar if it wasn't done already.
Walk through the API and see if there is something that is unclear to you and look at the code, if you still don't get it ask a developer who already worked on it before.
Always examine whatever you have to work on first.
Take a look at whatever UML documents you've got, if you don't have any:
- Smack the developer/s who worked on it. It's a shame they didn't do something as basic as UML class diagrams.
- Try to generate them from the code. They will not be accurate but the they will give you a head start.
If there is something specific that you don't understand or think is wrong, ask the team who developed it. They will probably know better.
As already said, grab doxygen and build HTML documentation for source code.
If code is well-designed, you'll easily see a nice class hierarchy, clear call graphs and many other things that otherwise would take ages to uncover. When certain parts behavior appears unclear, look at the unit tests or write your own.
However, if the structure appears to be flat, or messy, or both together, you may find yourself in some sort of trouble.
I'm not sure there is a standard way. There are some for-pay tools that will do C++ class diagrams/call graphs and provide some kind of code-level view. doxygen is a good free one. My low-tech approach is to find the top-level file and start to sort through what it provides and how...taking notes if needed.
In C++, the most common problem is that a lot of energy and time is wasted on low level tasks, such as "memory management".
Things that are no - brainers in managed languages are a pain to do in C++.