views:

874

answers:

11

I have been working on some 10 year old C code at my job this week, and after implementing a few changes, I went to the boss and asked if he needed anything else done. That's when he dropped the bomb. My next task was to go through the 7000 or so lines and understand more of the code, and to modularize the code somewhat. I asked him how he would like the source code modularized, and he said to start putting the old C code into C++ classes.

Being a good worker, I nodded my head yes, and went back to my desk, where I sit now, wondering how in the world to take this code, and "modularize" it. It's already in 20 source files, each with its own purpose and function. In addition, there are three "main" structs. each of these structures has 30 plus fields, many of them being other, smaller structs. It's a complete mess to try to understand, but almost every single function in the program is passed a pointer to one of the structs and uses the struct heavily.

Is there any clean way for me to shoehorn this into classes? I am resolved to do it if it can be done, I just have no idea how to begin.

+3  A: 

With “just” 7000 lines of C code, it will probably be easier to rewrite the code from scratch, without even trying to understand the current code.

And there is no automated way to do or even assist the modularization and refactoring that you envisage.

7000 LOC may sound like much but a lot of this will be boilerplate.

Konrad Rudolph
+10  A: 

First, tell your boss you're not continuing until you have:

http://www.amazon.com/Refactoring-Improving-Design-Existing-Code/dp/0201485672

and to a lesser extent:

http://www.amazon.com/Working-Effectively-Legacy-Michael-Feathers/dp/0131177052

Secondly, there is no way of modularising code by shoe-horning it into C++ class. This is a huge task and you need to communicate the complexity of refactoring highly proceedural code to your boss.

It boils down to making a small change (extract method, move method to class etc...) and then testing - there is no short cuts with this.

I do feel your pain though...

David Relihan
+1 for the Feathers book. It's full of sanity-preserving advice.
Steven Sudit
+2  A: 

Try and see if you can simplify the code before changing it to c++. Basically though I think he just wants you to convert functions into class methods and convert structs into class data members (if they don't contain function pointers, if they do then convert these to actual methods). Can you get in touch with the original coder(s) of this program? They could help you get some understanding done but mainly I would be searching for that piece of code that is the "engine" of the whole thing and base the new software from there. Also, my boss told me that sometimes it is better to simply rewrite the whole thing, but the existing program is a very good reference to mimic the run time behavior of. Of course specialized algorithms are hard to recode. One thing I can assure you of is that if this code is not the best it could be then you are going to have alot of problems later on. I would go up to your boss and promote the fact that you need to redo from scratch parts of the program. I have just been there and I am really happy my supervisor gave me the ability to rewrite. Now the 2.0 version is light years ahead of the original version.

yan bellavance
+1 I agree with modularising the code before converting to C++
David Relihan
Better still, modularise it then leave it in C, unless there's a good reason to change language.
Mike Seymour
+21  A: 

Really, 7000 lines of code is not very much. For such a small amount of code a complete rewrite may be in order. But how is this code going to be called? Presumably the callers expect a C API? Or is this not a library?

Anyway, rewrite or not, before you start, make sure you have a suite of tests which you can run easily, with no human intervention, on the existing code. Then with every change you make, run the tests on the new code.

anon
its not just about the number of lines of code but how long and complicated they are. It depends on what he is doing. 7000 lines of code of 1 source can be equivalent to 70 000 of another.
yan bellavance
+1 for testing.
Bill
@Neil: The Feathers book that David recommended talks at length about "cover-and-code", which involves nailing down the current behavior with unit tests, then confirming that the changes to the code still pass the tests. Excellent advice!
Steven Sudit
Agreed, 7000 lines of C doesn't sound that bad; it should be possible for one person to wrap their head around it all and then get a clear idea of how to refactor it.
Ether
+4  A: 

Surely it can be done - the question is at what cost? It is a huge task, even for 7K LOC. Your boss must understand that it's gonna take a lot of time, while you can't work on shiny new features etc. If he doesn't fully understand this, and/or is not willing to support you, there is no point starting.

As @David already suggested, the Refactoring book is a must.

From your description it sounds like a large part of the code is already "class methods", where the function gets a pointer to a struct instance and works on that instance. So it could be fairly easily converted into C++ code. Granted, this won't make the code much easier to understand or better modularized, but if this is your boss' prime desire, it can be done.

Note also, that this part of the refactoring is a fairly simple, mechanical process, so it could be done fairly safely without unit tests (with hyperaware editing of course). But for anything more you need unit tests to make sure your changes don't break anything.

Péter Török
+2  A: 

It's very unlikely that anything will be gained by this exercise. Good C code is already more modular than C++ typically can be - the use of pointers to structs allows compilation units to be independent in the same was as pImpl does in C++ - in C you don't have to expose the data inside a struct to expose its interface. So if you turn each C function

// Foo.h
typedef struct Foo_s Foo;
int foo_wizz (const Foo* foo, ... );

into a C++ class with

// Foo.hxx
class Foo {
    // struct Foo members copied from Foo.c
    int wizz (... ) const;
};

you will have reduced the modularity of the system compared with the C code - every client of Foo now needs rebuilding if any private implementation functions or member variables are added to the Foo type.

There are many things classes in C++ do give you, but modularity is not one of them.

Ask your boss what the business goals are being achieved by this exercise.

Note on terminology:

A module in a system is a component with a well defined interface which can be replaced with another module with the same interface without effecting the rest of the system. A system composed of such modules is modular.

For both languages, the interface to a module is by convention a header file. Consider string.h and string as defining the interfaces to simple string processing modules in C and C++. If there is a bug in the implementation of string.h, a new libc.so is installed. This new module has the same interface, and anything dynamically linked to it immediately gets the benefit of the new implementation. Conversely, if there is a bug in string handling in std::string, then every project which uses it needs to be rebuilt. C++ introduces a very large amount of coupling into systems, which the language does nothing to mitigate - in fact, the better uses of C++ which fully exploit its features are often a lot more tightly coupled than the equivalent C code.

If you try and make C++ modular, you typically end up with something like COM, where every object has to have both an interface (a pure virtual base class) and an implementation, and you substitute an indirection for efficient template generated code.

If you don't care about whether your system is composed of replaceable modules, then you don't need to perform actions to to make it modular, and can use some of the features of C++ such as classes and templates which, suitable applied, can improve cohesion within a module. If your project is to produce a single, statically linked application then you don't have a modular system, and you can afford to not care at all about modularity. If you want to create something like anti-grain geometry which is beautiful example of using templates to couple together different algorithms and data structures, then you need to do that in C++ - pretty well nothing else widespread is as powerful.

So be very careful what your manager means by 'modularise'.

If every file already has "its own purpose and function" and "every single function in the program is passed a pointer to one of the structs" then the only difference made in changing it into classes would be to replace the pointer to the struct with the implicit this pointer. That would have no effect on how modularised the system is, in fact (if the struct is only defined in the C file rather than in the header) it will reduce modularity.

Pete Kirkham
"Good C code is already more modular than C++ typically can be" - That dosn't make sense to me. Surely if you group methods together with the same responsibility into a class you are improving modularity
David Relihan
You have confused modularity with coupling.
anon
@Neal who are you addressing?
Pete Kirkham
@Pete Kirkham You.
anon
@David no, it has no effect to the coupling between compilation units. Generally a C dll can append members to its structs, add extra 'private' helper functions or change its implementation completely without having to recompile client code. It is a well behaved module. You can't do that anything like as easily in C++ the default is to expose all the members in the module interface, giving a big gain in efficiency and use of automatic values for RAII, but less modularity.
Pete Kirkham
@Pete: You've made Neil's point. You said "without having to recompile client code" which addresses the issue of coupling.
Ben Voigt
@Ben Modularity is the absence of coupling. That is the widely adopted definition (along with high cohesion), at least in software engineering. If I was saying the more coupled system (C++ with TMP) is more modular, then I would have confused them.
Pete Kirkham
+18  A: 

This shoehorning into C++ seems to be arbitrary, ask your boss why he needs that done, figure out if you can meet the same goal less painfully, see if you can prototype a subset in the new less painful way, then go and demo to your boss and recommend that you follow the less painful way.

omermuhammed
+1 for bringing up the question: what is the _real_ goal of the exercise?
Jeremy Friesner
I would expect the goal of the exercise is to attack this problem: "Its a complete mess to try to understand".
Ether
+4  A: 

I guess that the thinking here is that increasing modularity will isolate pieces of code, such that future changes are facilitated. We have confidence in changing one piece because we know it cannot affect other pieces.

I see two nightmare scenarios:

  1. You have nicely structured C code, it will easily transform to C++ classes. In which case it probably already is pretty darn modular, and you've probably done nothing useful.
  2. It's a rats-nest of interconnected stuff. In which case it's going to be really tough to disentangle it. Increasing modularity would be good, but it's going to be a long hard slog.

However, maybe there's a happy medium. Could there be pieces of logic that important and conceptually isolated but which are currently brittle because of a lack of data-hiding etc. (Yes good C doesn't suffer from this, but we don't have that, otherwise we would leave well alone).

Pulling out a class to own that logic and its data, encpaulating that piece could be useful. Whether it's better to do it wih C or C++ is open to question. (The cynic in me says "I'm a C programmer, great C++ a chance to learn something new!")

So: I'd treat this as an elephant to be eaten. First decide if it should be eaten at all, bad elephent is just no fun, well structured C should be left alone. Second find a suitable first bite. And I'd echo Neil's comments: if you don't have a good automated test suite, you are doomed.

djna
+5  A: 

I think a better approach could be totally rewrite the code, but you should ask your boss for what purpose he wants you "to start putting the old C code into c++ classes". You should ask for more details

pcent
+25  A: 

First, you are fortunate to have a boss who recognizes that code refactoring can be a long-term cost-saving strategy.

I've done this many times, that is, converting old C code to C++. The benefits may surprise you. The final code may be half the original size when you're done, and much simpler to read. Plus, you will likely uncover tricky C bugs along the way. Here are the steps I would take in your case. Small steps are important because you can't jump from A to Z when refactoring a large body of code. You have to go through small, intermediate steps which may never be deployed, but which can be validated and tagged in whatever RCS you are using.

  1. Create a regression/test suite. You will run the test suite each time you complete a batch of changes to the code. You should have this already, and it will be useful for more than just this refactoring task. Take the time to make it comprehensive. The exercise of creating the test suite will get you familiar with the code.
  2. Branch the project in your revision control system of choice. Armed with a test suite and playground branch, you will be empowered to make large modifications to the code. You won't be afraid to break some eggs.
  3. Make those struct fields private. This step requires very few code changes, but can have a big payoff. Proceed one field at a time. Try to make each field private (yes, or protected), then isolate the code which access that field. The simplest, most non-intrusive conversion would be to make that code a friend function. Consider also making that code a method. Converting the code to be a method is simple, but you will have to convert all of the call sites as well. One is not necessarily better than the other.
  4. Narrow the parameters to each function. It's unlikely that any function requires access to all 30 fields of the struct passed as its argument. Instead of passing the entire struct, pass only the components needed. If a function does in fact seem to require access to many different fields of the struct, then this may be a good candidate to be converted to an instance method.
  5. Const-ify as many variables, parameters, and methods as possible. A lot of old C code fails to use const liberally. Sweeping through from the bottom up (bottom of the call graph, that is), you will add stronger guarantees to the code, and you will be able to identify the mutators from the non-mutators.
  6. Replace pointers with references where sensible. The purpose of this step has nothing to do with being more C++-like just for the sake of being more C++-like. The purpose is to identify parameters that are never NULL and which can never be re-assigned. Think of a reference as a compile-time assertion which says, this is an alias to a valid object and represents the same object throughout the current scope.
  7. Replace char* with std::string. This step should be obvious. You might dramatically reduce the lines of code. Plus, it's fun to replace 10 lines of code with a single line. Sometimes you can eliminate entire functions whose purpose was to perform C string operations that are standard in C++.
  8. Convert C arrays to std::vector or std::array. Again, this step should be obvious. This conversion is much simpler than the conversion from char to std::string because the interfaces of std::vector and std::array are designed to match the C array syntax. One of the benefits is that you can eliminate that extra length variable passed to every function alongside the array.
  9. Convert malloc/free to new/delete. The main purpose of this step is to prepare for future refactoring. Merely changing C code from malloc to new doesn't directly gain you much. This conversion allows you to add constructors and destructors to those structs, and to use built-in C++ automatic memory tools.
  10. Replace localize new/delete operations with the std::auto_ptr family. The purpose of this step is to make your code exception-safe.
  11. Throw exceptions wherever return codes are handled by bubbling them up. If the C code handles errors by checking for special error codes then returning the error code to its caller, and so on, bubbling the error code up the call chain, then that C code is probably a candidate for using exceptions instead. This conversion is actually trivial. Simply throw the return code (C++ allows you to throw any type you want) at the lowest level. Insert a try{} catch(){} statement at the place in the code which handles the error. If no suitable place exists to handle the error, consider wrapping the body of main() in a try{} catch(){} statement and logging it.

Now step back and look how much you've improved the code, without converting anything to classes. (Yes, yes, technically, your structs are classes already.) But you haven't scratched the surface of OO, yet managed to greatly simplify and solidify the original C code.

Should you convert the code to use classes, with polymorphism and an inheritence graph? I say no. The C code probably does not have an overall design which lends itself to an OO model. Notice that the goal of each step above has nothing to do with injecting OO principles into your C code. The goal was to improve the existing code by enforcing as many compile-time constraints as possible, and by eliminating or simplifying the code.

One final step.

Consider adding benchmarks so you can show them to your boss when you're done. Not just performance benchmarks. Compare lines of code, memory usage, number of functions, etc.

John
@John. love your first point: this is a must have before anyone takes on a porting assignment.
Syd
Great advice, all except #6. There's absolutely no compile-time check for validity with a reference. Const pointers (as opposed to pointer-to-const) are often better, since all the lifetime and aliasing issues are more apparent.
Ben Voigt
+1 for a thoughtful, helpful answer, backed by experience.
djna
Stunning answer
Dominic Bou-Samra
+1... I'm stunned, too... O.o... And apparently, it's John's first post on SO...
paercebal
A: 

I read this article which is titled "Make bad code good" from http://www.javaworld.com/javaworld/jw-03-2001/jw-0323-badcode.html?page=7 . Its directed at Java users, but all of its ideas our pretty applicable to your case I think. Though the title makes it sound likes it is only for bad code, I think the article is for maintenance engineers in general.

To summarize Dr. Farrell's ideas, he says:

  1. Start with the easy things.
  2. Fix the comments
  3. Fix the formatting
  4. Follow project conventions
  5. Write automated tests
  6. Break up big files/functions
  7. Rewrite code you don't understand

I think after following everyone else's advice this might be a good article to read when you have some free time.

Good luck!

gnucom