I'm working on a project that is supposed to be used from the command line with the following syntax:

program-name input-file

The program is supposed to process the input, compute some stuff and spit out results on stdout.

My language of choice is C++ for several reasons I'm not willing to debate. The computation phase will be highly symbolic (think compiler) and will use pretty complex dynamically allocated data structures. In particular, it's not amenable to RAII style programming.

I'm wondering if it is acceptable to forget about freeing memory, given that I expect the entire computation to consume less than the available memory, and that the OS is free to reclaim all the memory in one step after the program finishes (assume the program terminates in seconds). What are your feelings about this?

As a backup plan, if my project ever needs to run as a server or interactively, I figure I can always retrofit a garbage collector into the source code. Does anyone have experience using garbage collectors with C++? Do they work well?

+8  A: 

Not deallocating memory should not be a problem, but it is bad practice.

Pierre
If it didn't create any problems, why would it be considered bad practice?
Konrad Rudolph
Can you elaborate?
The OS will release the memory once the program exits. But as a programmer, managing memory should become a habit. When I read someone's code, I get a bad feeling when I see that memory was allocated (OK, he checked malloc != NULL) but never managed. How could I build a library from such code?
Pierre
@Konrad-Rudolph, to your point of "if it didn't create any problems then why is it bad": I understand that. However, what if later you need good memory usage, or a library as Pierre suggested? It's much harder to bolt that on later. But I agree with the tautological "if it does nothing bad, it is not bad" view.
BobbyShaftoe
+15  A: 

It shouldn't cause any problems.

However, it's not exactly normal. Static analysis tools will complain about it. Most importantly, it builds bad habits.

Joel Coehoorn
The argument that it builds bad habits is the strongest I have seen against not de-allocating.
If this is for, say, a competition, and no one's supposed to use it afterward, sure...
Calyth
@stefan.ciobaca: Retrofitting GC will just be a nightmare. Either you do it first or not at all.
Martin York
Appeasing static analysis tools can be one of the costs of using a language without a GC. Walking through all memory at the end of a process's life just to free it can cause pointless page faulting and slowdown.
Luke Quinane
+2  A: 

What are your feelings about this?

Some O/Ses might not reclaim the memory, but I guess you're not intending to run on those O/Ses.

As a backup plan, if my project ever needs to run as a server or interactively, I figure I can always retrofit a garbage collector into the source code.

Instead, I figure you can spawn a child process to do the dirty work, grab the output from the child process, let the child process die as soon as possible after that and then expect the O/S to do the garbage collection.
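
A minimal POSIX sketch of that approach (compute() is a hypothetical function that writes its serialized results to stdout and never frees anything):

    // Run the sloppy computation in a child process. The parent collects
    // the serialized output; the OS reclaims the child's memory at _exit.
    #include <sys/wait.h>
    #include <unistd.h>
    #include <string>

    void compute();   // hypothetical: allocates freely, prints results

    std::string run_in_child() {
        int fds[2];
        if (pipe(fds) != 0) return "";

        pid_t pid = fork();
        if (pid == 0) {                    // child
            close(fds[0]);
            dup2(fds[1], STDOUT_FILENO);   // stdout goes into the pipe
            compute();
            _exit(0);                      // no cleanup; the OS frees it all
        }

        close(fds[1]);                     // parent
        std::string result;
        char buf[4096];
        ssize_t n;
        while ((n = read(fds[0], buf, sizeof buf)) > 0)
            result.append(buf, n);
        close(fds[0]);
        waitpid(pid, 0, 0);
        return result;
    }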

ChrisW
Yep, the program is supposed to be run on a reasonably modern computer with a reasonably modern OS. Interesting approach to GC, but you lose data other than garbage. Have you ever done it in practice?
Sorry, I don't understand your 2nd and 3rd sentences: what data other than garbage is lost, in what situation?
ChrisW
In your example with the fork, you may want to move part of the data structures (the useful part) from the child process to the parent.
I see. No I'd just return serialized results (not in-memory data) from the child to the parent: I'd do that either in a data file, or by the parent's having redirected stdout. The use case is that the child is sloppy with its own memory: which all needs to be recovered, and which shouldn't ...
ChrisW
... be allowed to run the risk of corrupting the memory in the parent process.
ChrisW
+2  A: 

I have not personally used this, but since you are starting from scratch, you may wish to consider the Boehm-Demers-Weiser conservative garbage collector.
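
By way of illustration, a minimal sketch of the collector's C API (assumes libgc is installed; link with -lgc):

    // Allocate through the Boehm collector and never call free();
    // unreachable blocks are reclaimed automatically.
    #include <gc.h>
    #include <cstdio>

    int main() {
        GC_INIT();
        for (int i = 0; i < 1000000; ++i) {
            int *p = static_cast<int *>(GC_MALLOC(sizeof(int)));
            *p = i;   // the block becomes garbage on the next iteration
        }
        std::printf("done\n");
        return 0;
    }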

rschuler
It looks like a good starting point for GC in C (and to some extent C++).
+8  A: 

Joel Coehoorn is right:

It shouldn't cause any problems.

However, it's not exactly normal. Static analysis tools will complain about it. Most importantly, it builds bad habits.

I'd also like to add that thinking about deallocation as you write the code is probably a lot easier than trying to retrofit it afterwards. So I would probably make it deallocate memory; you don't know how your program might be used in future.

If you want a really simple way to free memory, look at the "pools" concept that Apache uses.
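
A minimal sketch of the pool idea (illustrative, not the actual Apache APR API): each allocation is a pointer bump, and the whole pool is released in one step by the destructor.

    #include <cstddef>
    #include <vector>

    class Pool {
        static const std::size_t kBlock = 65536;
        std::vector<char *> blocks_;
        std::size_t used_;
    public:
        Pool() : used_(kBlock) {}            // forces a block on first alloc
        void *alloc(std::size_t n) {
            n = (n + 7) & ~std::size_t(7);   // keep 8-byte alignment
            if (used_ + n > kBlock) {
                blocks_.push_back(new char[n > kBlock ? n : kBlock]);
                used_ = 0;
            }
            void *p = blocks_.back() + used_;
            used_ += n;
            return p;
        }
        ~Pool() {                            // one-step reclamation
            for (std::size_t i = 0; i < blocks_.size(); ++i)
                delete[] blocks_[i];
        }
    };

Note that destructors of pooled objects never run, which is fine for plain data but not for objects owning other resources.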

user9876
I agree that it is easier to think about it up front. My question stated, I think, that even this is difficult, and any non-trivial modification (which, from experience, occurs a lot during the coding/design phase) might introduce unknown memory leaks.
Thanks for the link. I'll take a look.
Interesting link. If I understand correctly, it's what I'm doing at the moment, except that my pool is actually typed.
A: 

Apart from the fact that the OS (kernel and/or the C/C++ runtime) can choose not to free the memory when execution ends, your application should always free allocated memory properly, as a matter of good practice. Why? Suppose you decide to extend the application or reuse the code; you'll quickly get into trouble if the code you wrote previously hogs memory unnecessarily after finishing its job. It's a recipe for memory leaks.

Eduard - Gabriel Munteanu
What OS that you are using doesn't reclaim all memory used by a user-mode program?
The entry point to the application is what is exposed through the command line. Anything else would require knowing the internals of the code.
+12  A: 

My feeling would be something like "WTF!!!"

Look at it this way:

  • You chose a programming language that does not include a garbage collector; we are not allowed to ask why.

  • You are basically stating that you are too lazy to care about freeing the memory.

Well, WTF again. Laziness isn't a good reason for anything, least of all playing around with memory without freeing it.

Just free the memory. Not doing so is bad practice; the scenario may change, and then there can be a million reasons you need that memory freed, while the only reason for not doing it is laziness. Don't pick up bad habits; get used to doing things right, and you'll tend to do them right in the future!

Jorge Córdoba
1. Correct. 2. No; I'm stating that it's (very?) difficult to free the memory at the right time and do it right. "Just free the memory": see above.
Of course it's difficult, and that's exactly why you're trying to avoid it. You don't want to put effort into it, which is the definition of laziness (don't take this personally; it's just that it IS a very bad way of seeing things).
Jorge Córdoba
+3  A: 

will use pretty complex dynamically allocated data structures. In particular, it's not amenable to RAII style programming.

I'm almost sure that's an excuse for lazy programming. Why can't you use RAII? Is it because you don't want to keep track of your allocations, so there's no pointer to them that you keep? If so, how do you expect to use the allocated memory? There's always a pointer to it that contains some data.

Is it because you don't know when it should be released? Leave the memory in RAII objects, each one referenced by something, and they'll all trickle down and free each other when the containing object gets freed. This is particularly important if you want to run it as a server one day: each iteration of the server effectively runs a 'master' object that holds all the others, so you can just delete it and all the memory disappears. It also saves you from retrofitting a GC.
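
A minimal sketch of that 'master object' idea using std::unique_ptr (the names are illustrative): destroying the root releases the entire structure.

    #include <memory>
    #include <string>
    #include <vector>

    struct Node {
        std::string symbol;
        std::vector<std::unique_ptr<Node> > children;  // owning edges

        Node *add_child(const std::string &s) {
            children.push_back(std::unique_ptr<Node>(new Node));
            children.back()->symbol = s;
            return children.back().get();              // non-owning handle
        }
    };

    int main() {
        std::unique_ptr<Node> master(new Node);
        Node *f = master->add_child("f");
        f->add_child("x");
        f->add_child("y");
        master.reset();   // one delete; the whole tree frees itself
        return 0;
    }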

Is it because all your memory is allocated and kept in-use all the time, and only freed at the end? If so see above.

If you really, really cannot think of a design that doesn't leak memory, at least have the decency to use a private heap. Destroy that heap before you quit and you'll have a better design already, if a little 'hacky'.

There are instances where memory leaks are OK: static variables, globally initialised data, things like that. These aren't generally large, though.

gbjbaanb
Yes, I admit I'm lazy when programming, and I'm proud of it. Without going into the details, what I'm doing resembles, data-structure-wise, a compiler.
And I am actually keeping track of all the objects in a "private heap", as you call it (I call it an object pool).
by "private heap" I meant a private heap - eg HeapCreate in Windows. Laziness just means you end up doing more work than if you'd done it properly.
gbjbaanb
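
For reference, a minimal Win32 sketch of that suggestion: one growable private heap, destroyed in a single call on exit.

    #include <windows.h>

    int main() {
        HANDLE heap = HeapCreate(0, 0, 0);   // growable private heap
        for (int i = 0; i < 1000; ++i)
            HeapAlloc(heap, 0, 64);          // never freed individually
        HeapDestroy(heap);                   // one call releases everything
        return 0;
    }
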
+3  A: 

Well, I think that it's not acceptable. You've already alluded to potential future problems yourself. Don't think they're necessarily easy to solve.

Things like “… given that I expect the entire computation to consume less …” are famous last words. Similarly, retrofitting code with some feature is one of those things everyone talks about and nobody ever does.

Not deallocating memory might sound good in the short run but can potentially create a huge load of problems in the long run. Personally, I just don't think that's worth it.

There are two strategies. Either you build the GC design in from the very beginning; it's more work, but it will pay off. For a lot of small objects it might pay to use a pool allocator and just keep track of the memory pool. That way, you can keep track of the memory consumption and simply avoid a lot of problems that similar code without an allocation pool would create.

Or you use smart pointers throughout the program from the beginning. I actually prefer this method even though it clutters the code. One solution is to rely heavily on templates, which takes out a lot of redundancy when referring to types.
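
As a rough illustration of the point about templates and typedefs reducing redundancy (the names below are made up):

    #include <memory>
    #include <vector>

    struct Node;
    typedef std::shared_ptr<Node> NodeP;   // spell the full type out once

    struct Node {
        std::vector<NodeP> children;       // everywhere else stays short
    };

    NodeP make_node() { return NodeP(new Node); }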

Take a look at projects such as WebKit. Their computation phase resembles yours since they build parse trees for HTML. They use smart pointers throughout their program.

Finally: “It’s a question of style … Sloppy work tends to be habit-forming.” – Silk in Castle of Wizardry by David Eddings.

Konrad Rudolph
I do not understand what the big trouble is about going from no GC to GC. Can you give further details?
How do templates remove redundancy when referring to types? Examples?
I will take a look at WebKit, out of curiosity. Its rendering phase may actually resemble what I do in some vague way. However, I see big differences. They may need to render the same thing multiple times, and cache things between runs for performance reasons.
+3  A: 

Reference-counting smart pointers like shared_ptr in Boost and TR1 could also help you manage your memory in a simple manner.

The drawback is that you have to wrap every pointer to these objects.
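
A minimal sketch using the standard std::shared_ptr (Boost and TR1 behave the same way), with a weak_ptr back-reference to avoid the circular-reference problem raised in the comments below:

    #include <memory>
    #include <vector>

    struct Node {
        std::vector<std::shared_ptr<Node> > children;  // owning, counted
        std::weak_ptr<Node> parent;                    // non-owning backlink
    };

    int main() {
        std::shared_ptr<Node> root(new Node);
        std::shared_ptr<Node> child(new Node);
        child->parent = root;        // weak: does not keep root alive
        root->children.push_back(child);
        root.reset();                // root's count hits zero and it is
        return 0;                    // freed; child follows at scope exit
    }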

total
Not pleasant to have templates all over the code.
The difference between "MyClass * pointer = new MyClass" and "shared_ptr<MyClass> pointer(new MyClass)" doesn't rate as not pleasant to me.
Mark Ransom
Reference counting is nice, but for complex structures you need something to break circular references.
Constantin
@Constantin: That's what weak_ptr is for. And concerning the increased typing effort, what about typedefs? If it's your own class, add something like "typedef shared_ptr<MyClass> spMyClass" to each header file. If you cannot change the header file, define it in a strategic place.
mxp
+1  A: 

The answer really depends on how large your program will be and what performance characteristics it needs to exhibit. If you never deallocate memory, your process's memory footprint will be much larger than it would otherwise be. Depending on the system, this could cause a lot of paging and slow down performance for you or for other applications on the system.

Beyond that, what everyone above says is correct. It probably won't cause harm in the short term, but it's a bad practice that you should avoid. You'll never be able to reuse the code. Trying to retrofit a GC afterwards will be a nightmare: just think about visiting each place you allocate memory and trying to retrofit it without breaking anything.

One more reason to avoid doing this: reputation. If you fail to deallocate, everyone who maintains the code will curse your name and your rep in the company will take a hit. "Can you believe how dumb he was? Look at this code."

Steve Rowe
It's not supposed to be large and I'm sure not de-allocating will actually improve performance.
As for the reputation, I'm planning to make an open source release :D
A: 

In general, I agree it's a bad practice.

For a one-shot program it can be OK, but it kind of looks like you don't know what you are doing.

There is one solution to your problem, though: use a custom allocator that preallocates larger blocks from malloc, and then, after the computation phase, instead of freeing all the little blocks from your custom allocator, just release the larger preallocated blocks of memory. Then you don't need to keep track of all the objects you need to deallocate, or when. One guy who also wrote a compiler explained this approach to me many years ago, so if it worked for him, it will probably work for you as well.

J S
I'm using an object pool at the moment. Who is the guy, and what did he explain to you?
The guy was a schoolmate at the university (in 1997), and he was proud of his compiler, so he explained it. He didn't say much more than that. It was a proprietary Pascal-like language, though, so it's probably dead now.
J S
+2  A: 

I've done this before, only to find that, much later, I needed the program to be able to process several inputs without separate commands, or that the guts of the program were so useful that they needed to be turned into a library routine that could be called many times from within another program that was not expected to terminate. It was much harder to go back later and re-engineer the program than it would have been to make it leak-less from the start.

So, while it's technically safe as you've described the requirements, I advise against the practice since it's likely that your requirements may someday change.

Larry Gritz
I understand the reasoning. If and when my program turns into a library, the interface to this library would pretty much consist of a program invocation. Would this relieve the pain?
A: 

Try to use automatic variables in methods so that they will be freed automatically from the stack.
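
A tiny illustration of that suggestion: objects with automatic storage duration clean up after themselves when they go out of scope.

    #include <vector>

    void process() {
        std::vector<int> results;   // automatic variable on the stack;
        results.push_back(42);      // its heap buffer is managed for you
    }                               // the destructor frees the buffer here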

The only useful reason not to free heap memory is to save the tiny amount of computational power spent in free(). You might lose that advantage if page faults become an issue due to large virtual memory needs on small physical memory resources. Some factors to consider are:

  • whether you are allocating a few huge chunks of memory or many small chunks;

  • whether the memory needs to be locked into physical memory;

  • whether you are absolutely positive the code and the memory it needs will fit into 2 GB on a Win32 system, including memory holes and padding.

jeffD
I thought the reason to not free heap memory at exit is exactly to avoid paging-in the blocks being freed.
Constantin
Can you give further details of your arguments please? For example, can you explain how having many small chunks affects stuff? Same for the locking, etc?
Constantin, depending on the heap management, you probably wouldn't need to page in heap memory to free it. You would just update some allocation bookkeeping structure to mark that memory as free.
jeffD
If the memory needs to be locked into physical memory, then you should always try to free it as soon as it's not needed anymore.
jeffD
A: 

That's generally a bad idea. You might encounter cases where the program tries to consume more memory than is available. Plus, you risk being unable to start several copies of the program.

You can still do this if you don't care about the issues mentioned.

sharptooth
+2  A: 

If the run time of your program is very short, it should not be a problem. However, being too lazy to free what you allocate and losing track of what you allocate are two entirely different things. If you have simply lost track, it's time to ask yourself whether you actually know what your code is doing to the computer.

If you are just in a hurry or lazy, and the life of your program is small in relation to what it actually allocates (i.e. allocating 10 MB per second is not small if running for 30 seconds)... then you should be OK.

The only 'noble' argument about freeing allocated memory arises when a program exits: should one free everything to keep valgrind from complaining about leaks, or just let the OS do it? That depends entirely on the OS, and on whether your code might become a library rather than a short-running executable.

Leaks during run time are generally bad, unless you know your program will run for a short amount of time and won't force other programs (far more important than yours, as far as the OS is concerned) into heavy paging.

Tim Post
A: 

If it is non-trivial for you to determine where to deallocate the memory, I would be concerned that other aspects of the data structure manipulation may not be fully understood either.

Kim Reece
I understand the manipulations quite well.
+5  A: 

Sometimes not deallocating memory is the right thing to do.

I used to write compilers. After building the parse tree and traversing it to write the intermediate code, we would simply exit. Deallocating the tree would have

  • added a bit of slowness to the compiler, which we of course wanted to be as fast as possible;
  • taken up code space;
  • taken time to code and test the deallocators;
  • violated the "no code executes better than 'no code'" dictum.

HTH! FWIW, this was "back in the day" when memory was non-virtual and minimal, the boxes were much slower, and the first two were non-trivial considerations.

Mark Harrison