views:

122

answers:

4

If you've got a codebase which is a bit messy in respect to coding standards - a mix of different conventions from different people - is it reasonable to give one person the task of going through every file and bringing it up to meet standards?

As well as being tremendously dull, you're going to get a mass of changes in SVN (or whatever) which can make comparing versions harder. Is it sensible to set someone on the whole codebase, or is it considered stupid to touch a file only to make it meet standards? Should files be left alone until some 'real' change is needed, and then updated?


Tagged as C++ since I think different languages have different automated tools for this.

A: 

I was thinking it's a task you might give a work-experience kid or put out onto RentaCoder

This depends mainly on the codebase's size.

I've seen three trainees given the task to go through a 2MLoC codebase (several thousand source files) in order to insert one new line into the standard disclaimer at the top of all the source files (with the line's content depending on the file's name and path). It took them several days. One of the three used most of that time to write a script that would do it and later only fixed the files where the script had failed to insert the line correctly, the other two just ploughed through the files. (The one who wrote the script later got a job at that company.)

The job of manually adapting all those files in that codebase to certain coding standards would probably have to be measured in man-years.
OTOH, if it's just a few dozen files, it's certainly doable.

Your codebase is very likely somewhere in between, so your best bet might be to set a "work-experience kid" to find out whether there's a tool that can do this to your satisfaction and, if so, make it work.

Should files be left alone until some 'real' change is needed, and then updated?

I'd strongly advice against this. If you do this, you will have "real" changes intermingled with whatever reformatting took place, making it nigh impossible to see the "real" changes in the diff.

sbi
For < 100kLoC projects, it is best to do a "fast code review", without changing the code, on the project by a team member (who must have at least one year of C++, in order to correctly apply the coding standard). During the review, make a summary of changes that are desired and estimate the amount of edit, then decide whether to go ahead with it. *(The reservation I have against trainees is that sometimes they do not have 100% understanding: for example, replacing every `char` with `wchar_t` or every `int` with `unsigned`, with whimsical results.)*
rwong
A: 

It also depends on what kind of changes you are planning to make in order to bring it up to your coding standard. Everyone's definition of coding standard is different.

More specifically:

  • Can your proposed changes be made to the project with 100% guarantee that the entire project will work identically the same as before? For example, changes that only affect comments, line breaks and whitespaces should be fine.
  • If you do not have 100% guarantee, then there is a risk that should not be taken unless it can be balanced with a benefit. For example, is there a need to gain a deeper understanding of the current code base in order to continue its development, or fix its bugs? Is the jumble of coding conventions preventing these initiatives? If so, evaluate the costs and benefits and decide whether a makeover is justified.
  • If you need to understand the current code base, here is a technique: tracing.
    • Make a copy of the code base. Note that tracing involves adding code, so it should not be performed on the production copy.
    • In the new copy, insert many fprintf (trace) statements into any functions considered critical. It may be possible to automate this.
    • Run the project with various inputs and collect those tracing results. This will help everyone understand the current project's design.
  • Another technique for understanding the current code base is to document the dependencies in the project.
    • Some kinds of dependencies (interface dependency, C++ include dependency, C++ typedef / identifier dependency) can be extracted by automated tools.
    • Run-time dependency can only be extracted through tracing, or by profiling tools.
rwong
+1  A: 

Should files be left alone until some 'real' change is needed, and then updated?

This is what I would do.

Even if it's primarily text layout changes, doing it by a manual process on a large scale risks breaking code that was working.

Treat it as a refactor and do it locally whenever code has to be touched for some other reason. Add tests if they're missing to improve your chances of not breaking the code.

If your code is already well covered by tests, you might get away with something global, but I still wouldn't advocate it.

I also think this is pretty much language-agnostic.

Don Roby
A: 

You can address the formatting aspect of coding style fairly easily. There are a number of tools that can auto-format your code. I recommend hooking one of these up to your version control tool's "check in" feature. This way, people can use whatever format they want while editing their code, but when it gets checked in, it's reformatted to the official style.

In general, I think it's best if you can do the big change all at once. In the past, we've done the following: 1. have a time dedicated to the reformatting when most people aren't working (e.g. at night or on the weekend 2. have a person check out as many files as possible at that time, reformat them, and check them in again

With a reformatting-only revision, you don't have to figure out what has changed in addition to the formatting.

kc2001
Which tools might these be? I'm not aware of any for C++ though I know Java has them.
John
I edited my post to add a link to tools. For some of them, e.g. Emacs, you might need to run a script with Emacs running in batch mode.
kc2001