A few years ago I successfully converted a legacy, 23-year-old, 300,000 LOC code base to camelCase. It took only two days, but there were a few lingering effects that took a couple of months to sort out. It is also a very good way to annoy your fellow coders.
I believe that a simple, dumb, sed-like approach has advantages. As far as I know, IDE-based refactoring tools and the like cannot:
- change code that is excluded from compilation by #ifdefs
- change code in comments
And the legacy code had to be maintained on several different compiler/OS platforms (= lots of #ifdefs).
The main disadvantage of a dumb, sed-like approach is that matching text inside string literals (keywords, for example) can be changed inadvertently. Also, I've only done this for C; C++ might be another kettle of fish.
There are about five stages:
1) Generate a list of tokens that you wish to change, and edit it manually.
2) For each token in that list, determine the new token.
3) Apply these changes to your code base.
4) Compile.
5) Double-check via a manual diff, and do a final clean-up.
For step 1, to generate a list of tokens that you wish to change, the command:
cat *.[ch] | sed 's/\([_A-Za-z0-9][_A-Za-z0-9]*\)/\nzzz \1\n/g' | grep -w zzz | sed 's/^zzz //' | grep '_[a-z]' | sort -u > list1
will produce in list1:
st_atime
time_t
...
In this sample you really don't want to change either token (st_atime is a member of the system's struct stat, and time_t is a standard type), so manually delete them from the list. But you'll probably miss some, so for the sake of this example, suppose you keep both.
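If your grep supports -o and -h (GNU and BSD grep both do), step 1 can be sketched without the zzz marker trick. The sketch below also assumes a hypothetical file called stoplist, holding tokens you never want renamed (such as st_atime and time_t), so you don't have to delete them by hand on every run; make_list1 and demo.c are likewise illustrative names only.

```sh
#!/bin/sh
# Sketch of step 1 using grep -o instead of the "zzz" marker trick.
# Assumes grep supports -o and -h. "make_list1" and "stoplist" are
# hypothetical names, not part of the original procedure.

make_list1() {
    # Print every identifier-like word from the given files, one per line,
    # keep only those with an underscore followed by a lowercase letter,
    # and remove duplicates.
    grep -ohE '[_A-Za-z0-9]+' "$@" | grep '_[a-z]' | sort -u
}

# Demo in a throwaway directory so nothing real gets overwritten.
dir=$(mktemp -d)
cd "$dir" || exit 1
cat > demo.c <<'EOF'
#include <time.h>
static time_t last_seen;
int read_config(const char *file_name);
EOF
printf '%s\n' time_t st_atime > stoplist

# Drop stop-listed tokens: -v invert, -x whole line, -F fixed strings.
make_list1 *.[ch] | grep -vxF -f stoplist > list1
cat list1
# file_name
# last_seen
# read_config
```

Like the original pipeline, this still picks up words inside comments and string literals, which is intentional here; the manual edit of list1 remains necessary.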
Step 2 is to generate a script that makes the changes. For example, the command:
cat list1 | sed 's/\(.*\)/glob_sub "\\<\1\\>" xxxx_\1/;s/\(xxxx_.*\)_a/\1A/g;s/\(xxxx_.*\)_b/\1B/g;s/\(xxxx_.*\)_c/\1C/g;s/\(xxxx_.*\)_t/\1T/g' | sed 's/zzz //' > list2
will change _a, _b, _c, and _t to A, B, C, and T, to produce:
glob_sub "\<st_atime\>" xxxx_stAtime
glob_sub "\<time_t\>" xxxx_timeT
You just have to extend it to cover _d, _e, _f, ..., _x, _y, and _z.
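With GNU sed, one alternative to spelling out one substitution per letter is the \u replacement escape (uppercase the next character), which is a GNU extension and not available in BSD sed. The sketch loops with a label and the t command so that tokens with several underscores are fully converted; make_list2 is a hypothetical helper name.

```sh
#!/bin/sh
# Sketch of step 2 covering every letter a..z at once.
# Requires GNU sed: "\u" in the replacement uppercases the next character.
# "make_list2" is a hypothetical helper name.

make_list2() {
    # 1) Turn each token into a glob_sub line with an xxxx_ prefixed target.
    # 2) Repeatedly replace "_x" (x lowercase) after the prefix with "X",
    #    looping via the "again" label until no substitution is made.
    sed -e 's/\(.*\)/glob_sub "\\<\1\\>" xxxx_\1/' \
        -e ':again' \
        -e 's/\(xxxx_[^ ]*\)_\([a-z]\)/\1\u\2/' \
        -e 't again'
}

# Demo in a throwaway directory.
dir=$(mktemp -d)
cd "$dir" || exit 1
printf '%s\n' st_atime time_t > list1
make_list2 < list1 > list2
cat list2
# glob_sub "\<st_atime\>" xxxx_stAtime
# glob_sub "\<time_t\>" xxxx_timeT
```

Because the regex only looks after the xxxx_ prefix, the search pattern inside the quotes is left untouched.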
I'm presuming you have already written something like 'glob_sub' for your development environment. (If not, give up now.) My version (csh, Cygwin) looks like:
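#!/bin/csh
# glob_sub PATTERN REPLACEMENT: run s/PATTERN/REPLACEMENT/g over every
# .c/.h file (here and one directory down) that contains PATTERN.
foreach file (`grep -l "$1" */*.[ch] *.[ch]`)
    # Keep the original as FILE.bak, then write the edited copy in its place.
    /bin/mv -f $file $file.bak
    /bin/sed "s/$1/$2/g" $file.bak > $file
end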
(Some of my seds don't support the --in-place option, so I have to use mv.)
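For anyone without csh, here is a hypothetical POSIX sh port of the same glob_sub idea. It is a sketch, not the author's version: it assumes GNU grep and GNU sed (for the \< and \> word anchors used in list2), and, like the csh script, a "/" in either argument would need escaping.

```sh
#!/bin/sh
# glob_sub PATTERN REPLACEMENT (hypothetical sh port of the csh script):
# for each .c/.h file here and one directory down that contains PATTERN,
# keep the original as FILE.bak and write the substituted text to FILE.

glob_sub() {
    for file in *.[ch] */*.[ch]; do
        [ -f "$file" ] || continue            # an unmatched glob stays literal in sh
        grep -q -- "$1" "$file" || continue   # skip files with no match
        mv -f "$file" "$file.bak"
        sed "s/$1/$2/g" "$file.bak" > "$file"
    done
}

# Demo in a throwaway directory.
dir=$(mktemp -d)
cd "$dir" || exit 1
printf 'int my_count;\nint my_count_max;\n' > demo.c
glob_sub '\<my_count\>' xxxx_myCount
cat demo.c
# int xxxx_myCount;
# int my_count_max;
```

The word anchors are what keep my_count_max intact: \> requires a word boundary, and the following "_" is a word character.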
The third step is to apply the script in list2 to your code base. For example, in csh, use source list2.
The fourth step is to compile. The compiler will (hopefully!) object to xxxx_timeT. Indeed, it would likely object to plain timeT too, but the extra xxxx_ prefix adds insurance. So for time_t you've made a mistake. Undo it with, e.g.:
glob_sub "\<xxxx_timeT\>" time_t
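Rather than typing each undo command by hand, you could derive it from list2 by swapping the two arguments of the matching glob_sub line. This is a sketch under the assumption that list2 has exactly the format produced in step 2; undo_for and undo_list are hypothetical names.

```sh
#!/bin/sh
# Build the undo command for a wrongly renamed token from list2.
# "undo_for" is a hypothetical helper; $1 is the original token (e.g. time_t).

undo_for() {
    # Find the line whose search pattern is "\<TOKEN\>" (fixed-string match),
    # then swap pattern and replacement.
    grep -F "\"\\<$1\\>\"" list2 |
        sed 's/^glob_sub "\\<\(.*\)\\>" \(.*\)$/glob_sub "\\<\2\\>" \1/'
}

# Demo in a throwaway directory, with list2 as produced in step 2.
dir=$(mktemp -d)
cd "$dir" || exit 1
cat > list2 <<'EOF'
glob_sub "\<st_atime\>" xxxx_stAtime
glob_sub "\<time_t\>" xxxx_timeT
EOF
undo_for time_t > undo_list
cat undo_list
# glob_sub "\<xxxx_timeT\>" time_t
```

The resulting undo_list can then be applied the same way as list2 (source undo_list in csh).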
The fifth and final step is to inspect your changes manually with your favorite diff utility, and then clean up by removing all the unwanted xxxx_ prefixes. Grepping for "xxxx_ (a double quote followed by the prefix) will also help find tokens inside string literals. (Indeed, adding a _xxx suffix is probably a good idea.)
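The clean-up could be sketched as follows, assuming the xxxx_ prefix added by the step 2 script. It first reports prefixed tokens that start a string literal (likely accidental renames worth a manual look), then strips the prefix while keeping .bak copies, as glob_sub does. cleanup_prefix is a hypothetical name, and this only catches tokens at the start of a string, which is why a suffix helps too.

```sh
#!/bin/sh
# Sketch of the step 5 clean-up. "cleanup_prefix" is a hypothetical helper.

cleanup_prefix() {
    # Report suspects: a double quote immediately followed by the prefix.
    grep -n '"xxxx_' "$@"
    # Strip the prefix from every file that still contains it.
    for file in "$@"; do
        grep -q 'xxxx_' "$file" || continue
        mv -f "$file" "$file.bak"
        sed 's/xxxx_//g' "$file.bak" > "$file"
    done
}

# Demo in a throwaway directory.
dir=$(mktemp -d)
cd "$dir" || exit 1
printf 'int xxxx_myCount;\nputs("xxxx_stAtime");\n' > demo.c
cleanup_prefix demo.c     # reports line 2 as a suspect
cat demo.c
# int myCount;
# puts("stAtime");
```

The reported lines still need fixing by hand afterwards (here, the string probably should have stayed "st_atime").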