views:

668

answers:

14

Hi guys,

I've been working at my university this summer in an image/video lab. Just recently, my professor gave me a program written by a grad student who just left the program to "fix up", because it was "giving some errors."

The project was written in C++ (seems to be a recurring bad sign in student code). I opened the project in VS08, and ran the project, and turns out, the "errors" was a bad_alloc. Sure enough, the memory management, or more precisely, the lack of it, was the problem.

The programmer seemed to like mingling mallocs, news and new[]s throughout the entire code, with absolutely no free, delete or delete[]. To make it worse, all the objects seem to do atleast 4-5 unrelated things. And to top it off, here's a comment left by the programmer:

//do not delete objects, it seems to cause bugs in the segmenter

From what I can see, there's a nice unhealthy mix of reference of pointers and references, and all values are changed by passing by reference to the monolithic class functions that may as well be static. At compile time, there were around 23 warnings---stuff like possible loss of data when converting from double to char, around 17 unused variables etc. It's times like this that I wish C++ never existed in universities, and that all lab work was done in like python or matlab...

So now, the professor wants me to "fiddle" with the program so it can run on datasets around 10 times larger than what it was used to. I admit, I'm a bit afraid of telling her the code is garbage.

StackOverflow, you guys have never failed before with giving good advice, so now I plead, any advice on dealing with situations like this would be MUCH appreciated.

Thanks in advance :(

EDIT The code is around 5000 LoC

EDIT2 The professor decided to go with the easiest approach. Which was getting more RAM. Yay for being about to throw money at the problem...

+3  A: 

In my opinion the best thing you can do is tell the professor that the code is a mess, and will not work as expected unless some parts are rewritten. Suggest a small code refactoring, rewriting only the parts that need it (not the entire program).

Not telling this, can give you more problems than you already have :)

Matias Valdenegro
+15  A: 

Refactor the code, one change at a time, until it makes sense to you. The most important thing is not the exact coding strategy, but that you understand what it's doing and why.

You're probably going to end up touching every line of code, so don't try to do things in any particular order - fix the first thing that jumps out at you as wrong first, then go on to the next one, and so on. Exception: if the previous person didn't use a consistent code formatting strategy, run the whole thing through an autoindenter as your first action.

Write a test suite as you go.

Zack
Thanks Zack for giving me confidence to do this in one of my own projects. +1 in particular for "The most important thing is not the exact coding strategy, but that you understand what it's doing and why"
fearoffours
As a code indenter: Artistic Style is free and quite easy to use: http://astyle.sourceforge.net/
Matthieu M.
+22  A: 

First, bad code is bad code. Python, Java, Matlab or anything else garbage collected doesn't == good code. You could just as easily be spending your time debugging bad Python code.

With that said, be absolutely upfront with the professor. Tell her the code is bad and show her some examples. The worst thing you can do is to try to hide the problem. If you do that, the problem is not only sure to land in your lap, but it's also sure to be blamed on you.

Figure out what the best solution is and propose it to her. It might be fixing that code, enlisting the help of the computer science department, or rewriting it, either in C++ or another language. Chances are good that she doesn't have any other options and will gladly accept whatever solution you propose.

Wade Williams
And I agree with Zack - add unit tests to the code as you change it so you know your changes produce the desired result and will continue to do so.
Wade Williams
Thanks guys for the advice, I'm emailing the professor right now (and trying to be gentle about it). I have to say though, it's much easier to write horribly buggy code in C++ as a beginner than any other language... I'll update this question soon
Xzhsh
@Xzhsh: C++ is a very expressive and powerful language, enabling you to do, rapidly and accurately, things that really, really should never be done.
David Thornley
I disagree. Professors typically don't care if code is bad or good only whether it works or not. Step one, just fix up the memory leaks. Step two, redesign the code from scratch. It is only 5000 lines you should be able to do that in a weekend.
ceretullis
@David, Haha, agreed. And I'm fairly sure that the professor is just going to want stuff to work since she's really not even a CS professor, and I'm just a undergrad doing this for a summer job, but I guess it never hurts to try. I'm waiting on a response right now
Xzhsh
Well updated question. Issue is resolved, although I'm not sure in the best way haha. Thanks for the advice guys
Xzhsh
+1  A: 

As a stop gap, you could try using some sort of garbage-collected memory management implmementation like dlmalloc to see if that allows you to temporarily get over the out of memory hurdle.

In the longer run, you will have to sort out the memory management mess - there is simply no way around that. From your description, you'll probably also have to address the object design issues, but you can probably get away with doing some memory management cleanup first.

Here's the way I'd approach this:

  • Find all places in the code that call malloc(). Have a very good look at them and try to replace them with calls to new so you're only having to deal with a single type of memory management, which will make it easier to go to the next step.
  • Find all the places that new is called and see if you can replace the raw pointers that the results of new are assigned to with boost::shared_ptr<>. This should at least give you some sort of poor man's garbage collection; of course you should also change all the function prototypes of the functions receiving these pointers to take shared_ptr<> instead of raw pointers, otherwise you've just dressed up the mess and have not really improved anything. I'd actually argue that you've changed things for worse...

Once you've addressed the immediate memory management issues, you can start refactoring the objects and improve the design. I wouldn't start by fixing the design first, you're better off getting the software working properly, gaining more understanding of its workings and then sort out the design, otherwise you're likely to replace one mess with another one.

Timo Geusch
+1  A: 

Sounds like a complete mess. A refactoring is the least you'll need to do. The mixture of new's and malloc's is a recipe for disaster!

I think you'll have to probably rewrite almost the whole thing. Luckily 5000 SLOC isn't huge and shouldn't take too long.

Sounds like a good exercise!

Starkey
+6  A: 

I would suggest following some of the steps that Michael Feathers outlines in his 'Working Effectively with Legacy Code' book. Namely: Get the code under test as soon as possible.

Having unit tests that exercise the functionality will give you something you can freely refactor without fear.

I know, however, that it's much easier to say this, than to actually DO it. Read this chapter on Seams (parts of code you can override/hook to allow you to get code under test easier): http://www.informit.com/articles/article.aspx?p=359417&amp;seqNum=3 See also this on CppUnit and CppUnitLite, a framework for unit testing in C++: http://c2.com/cgi/wiki?CppUnitLite

Run the code through a memory profiler. See this SO link: http://stackoverflow.com/questions/818673/memory-profiler-for-c This will help you track down the places you need to start putting delete/free statements. I would also start trying to make the memory allocation at least consistent in the API it uses.

Kilanash
+10  A: 

Be honest. Tell your professor immediately that the code is crap and why so. If what you listed in your question is true, the code is crap indeed.

How much time do you have at hand? Do you understand the implemented algorithm(s)?
5kLoC isn't all that much, especially when it's all low-level fiddling. When you know the underlying algorithms and have enough time at hand, re-writing it to 2k lines of easily understandable code might be better than trying to fix it.

sbi
I agree with the rewriting approach. When code is that bad, it may be easier to keep the original as is and rewrite a functionally equivalent piece of code. Makes for easier testing too since you can check the consistency of the results between the two versions.
Matthieu M.
+1  A: 

Explain to the professor the issue that you see. And never present a problem without a solution. Keep that in mind as you write up your report along with your suggestions. You can offer multiple possibilities and let them choose the one that they wish to take. And at that point you will have done your job -- time for them to do theirs!

NinjaCat
+6  A: 

5000 LoC isn't too bad. Count yourself lucky.

Since it sounds like memory management is one of the biggest issues, I would start there.

  1. Replace every use of malloc with new.
  2. Fix compiler warnings.
  3. Replace arrays with vectors (or more appropriate data structures).
  4. Replace raw pointers with stack variables/references when possible, and smart pointers when not. Don't worry about major architecture changes or a 100% conversion -- focus on the low hanging fruit and clean up some of the major memory leaks.
  5. Begin rearchitecting the application.
    1. Pick a discrete behavior/task in the application.
    2. Figure out roughly how it does work.
    3. Figure out how you want it to work (concentrating on the interface).
    4. Develop a transition plan.
    5. Execute your plan.
    6. Repeat.
Steve S
+1  A: 

Many good suggestions on how to deal with the code itself have been given. It sounds to me like one of the underlying problems may be that this was (understandably) written for a non-computer class by a non-programmer. You might suggest to your prof that in the future she get someone in the programmer stream classes (in cooperation with a cs prof) to do the actual implementation in these cases.

Now, of course this has an upside and a down side for everyone involved. The end user (your prof) has to be able to create a spec or at least articulate her needs. She may need to be convinced that there is value in taking the time away from what she would rather be doing in order to do this. The implementation team has to be able to work to that spec and communicate with the end user(s). The implementation team will ensure that the code is reasonably well written if they are in fact a team rather than an individual.

This is all excellent preparation for the 'real world' as well as an opportunity for interdisciplinary cooperation, with those who understand the problem domain working with those who understand the tools.

It offers the cs student(s) a 'real' rather than a contrived exercise, and should generate more enthusiasm on their part. (Of course, depending on departmental rivalry and small-p politics, it could also be a boondoggle in the making or a non-starter, but hey, I'm an optimist...)

mickeyf
This doesn't answer the question... knowing how it should have been done right, doesn't help that much :-)
mxp
Absolutely correct. It does not answer the question, and I apologize. However, I think being able to learn from the mistakes of ourselves or others and to see beyond the immediate problem does help. Simply answering it (as many have done better than I could) might have solved it once, 'knowing how to do it right' (if that's what my humble suggestion is ;>) might prevent it from recurring multiple times in the future. It's still not too late to make it a cross department team effort that could benefit a lot of people, rather than a painful solo effort that helps only the original end user.
mickeyf
A: 

Keys to maintenance:

  • SCM. On Windows I would recommend something like Mercurial. Make central blessed repository where from others can get the software and where to fixes get committed from developer private repositories.

  • Issue/bug tracking. If you do not have one yet, get it. That should be an easy sell to the prof. Can't give any recommendation for Windows. Friends are using Mantis.

  • Release management. Often part of the issue tracking system. But the process of making a release itself is more of the organizational and political, not technical, issue. Essentially it is about giving public name to a certain internal versions of the software. In real life it gets highly political when it is being decided what actually gets into the release.

5K LoC isn't much. Having SCM helps to track the changes and go back in case of regressions. Having issue tracking help the team to collaborate on problems. Make minor releases when you fix problems (e.g. your malloc vs. new vs. memory leaks). Make major releases for the bigger fixes and features (e.g. support for larger data sets).

Important is to not to rush and start with small changes. Modern SCM allow for lots of small commits (ditto issue tracking with issue dependency trees) and one should use that. That allows to track how behavior of the software changes and what change precisely introduced the regression. Quite often it is the most innocent looking change.

Dummy00001
+1  A: 

There's nothing to be gained by hiding the problem from the prof. Yes, complaining that you were given a load of junk to work on could sound like whining, but hiding the fact and trying to fix it on the side will make it look like you are at best a slow worker and at worst incompetent. I think the first thing to do is to try to politely tell the prof that there are a lot of problems with this program that will need to be cleaned up and give examples ... does the prof know anything about programming? I don't think that was spelled out in the question ... but one way or another explain the problem and say what you think it will take to fix it.

That said, the next big question that come to my mind is, Are the poor memory management and the bad type conversions the only big issues, or are those just examples you picked from a hundred serious problems? That is, is the basic structure and logic of the program basically sound, or is the whole thing a mess? (I'd guess it's all a mess, but obviously I haven't seen the program.)

The key decision to be made here is, Do you (a) Go through the program and fix a bunch of detail errors one at a time? (b) Do a serious restructuring and cleanup? Or (c) Throw it away and rewrite from scratch. I'm often tempted to choice (c) in cases like this because it seems like rewriting would be less work than cleaning up the mess, but it is likely that there are many many detailed logic decisions embedded in the existing code. Unless you have detailed, up-to-date requirements documentation -- which is such an unlikely possibility that I will dismiss it instantly amidst gales of laughter at the very thought -- the only way to know all the rules is to read the existing code and figure out what it's up to. At which point (a) or (b) become much more efficient than (c).

Wish I could say "just do X", unfortunately with the amount of information that you can reasonably put into a post, I think we're all limited to saying "Here are some plausible options to consider ..."

Jay
A: 

This problem could easily be fixed by adding more RAM. 1-2GB should do the trick.

moreram
A: 

Replace every:

Type* data = (Type*)malloc(123*sizeof(Type));

with

std::vector<Type> data(123);

All memory management removed! Easy as pie. Then just pass around the vectors by value.

Inverse
Ah, if only this sort of problem were that simple ;-)
Zack