views:

123

answers:

5

I wrote the following heap debugger in order to demonstrate memory leaks, double deletes and wrong forms of deletes (i.e. trying to delete an array with delete p instead of delete[] p) to beginning programmers.

I would love to get some feedback on that from strong C++ programmers because I have never done this before and I'm sure I've done some stupid mistakes. Thanks!

#include <cstdlib>
#include <iostream>
#include <new>

namespace
{
    const int ALIGNMENT = 16;
    const char* const ERR = "*** ERROR: ";
    int counter = 0;

    struct heap_debugger
    {
        heap_debugger()
        {
            std::cerr << "*** heap debugger started\n";
        }

        ~heap_debugger()
        {
            std::cerr << "*** heap debugger shutting down\n";
            if (counter > 0)
            {
                std::cerr << ERR << "failed to release memory " << counter << " times\n";
            }
            else if (counter < 0)
            {
                std::cerr << ERR << (-counter) << " double deletes detected\n";
            }
        }
    } instance;

    void* allocate(size_t size, const char* kind_of_memory, size_t token) throw (std::bad_alloc)
    {
        void* raw = malloc(size + ALIGNMENT);
        if (raw == 0) throw std::bad_alloc();

        *static_cast<size_t*>(raw) = token;
        void* payload = static_cast<char*>(raw) + ALIGNMENT;

        ++counter;
        std::cerr << "*** allocated " << kind_of_memory << " at " << payload << " (" << size << " bytes)\n";
        return payload;
    }

    void release(void* payload, const char* kind_of_memory, size_t correct_token, size_t wrong_token) throw ()
    {
        if (payload == 0) return;

        std::cerr << "*** releasing " << kind_of_memory << " at " << payload << '\n';
        --counter;

        void* raw = static_cast<char*>(payload) - ALIGNMENT;
        size_t* token = static_cast<size_t*>(raw);

        if (*token == correct_token)
        {
            *token = 0xDEADBEEF;
            free(raw);
        }
        else if (*token == wrong_token)
        {
            *token = 0x177E6A7;
            std::cerr << ERR << "wrong form of delete\n";
        }
        else
        {
            std::cerr << ERR << "double delete\n";
        }
    }
}

void* operator new(size_t size) throw (std::bad_alloc)
{
    return allocate(size, "non-array memory", 0x5AFE6A8D);
}

void* operator new[](size_t size) throw (std::bad_alloc)
{
    return allocate(size, "    array memory", 0x5AFE6A8E);
}

void operator delete(void* payload) throw ()
{
    release(payload, "non-array memory", 0x5AFE6A8D, 0x5AFE6A8E);
}

void operator delete[](void* payload) throw ()
{
    release(payload, "    array memory", 0x5AFE6A8E, 0x5AFE6A8D);
}
+1  A: 
void* raw = static_cast<char*>(payload) - ALIGNMENT;

If payload has been already deleted, wouldn't that make this undefined behavior?

UncleBens
To be honest, I don't know. I hope that when a pointer had been handed back by `new` at some point in the past, using it again won't cause too many demons to fly out of my nose. But I'm not sure.
FredOverflow
I mean, I don't see how you can hope to check whether you are given a valid pointer if you completely rely on the pointer being valid to begin with (or else nasal demons *will* fly). I think you might want to reconsider bitc's answer...
UncleBens
+1  A: 

I'm not following your use of hardcoded constants/constant strings very well - put them in an enum? And I'm really not getting the idea of the token quite well. Needs more comments

Paul Nathan
Ah, you mean the tokens and the "array/non-array" stuff. Yeah, that makes sense. Thanks.
FredOverflow
+1  A: 

That's a really great start. Here's my 2 cents as you've asked for feedback:

  1. The code writes trace information to cerr, which is really for errors. Use cout for informational logs.
  2. The ALIGNMENT amount is arbitrary. If code tried to allocate 4090 bytes, you'd allocate 4106, which spills into the next 4k block, which is the size of a memory page. A calculated alignment value would be better... or rename ALIGNMENT as HEADER_SIZE or something similar.
  3. Given the header you're creating, you could store the size and a flag for the 'kind of memory' at allocation time and compare it at release time.
  4. Token should probably be called 'sentinel' or 'canary value'.
  5. Why is Token a size_t? Why not just a void * ?
  6. Your check for null in release should probably throw an exception instead - wouldn't it be an error if the code deleted a null pointer?
  7. Your 'correct_token' and 'wrong_token' values are far too similar. I had to re-read the code to be sure.
  8. Given point (3), you could double the amount you additionally allocate and have before and after sentinels/guard blocks. This would detect memory underrun and overrun.
JBRWilkinson
It's safe to delete a null-pointer.
bitc
2. How do I calculate a value that works for every type? | 5. I don't see why a fixed integral value should be a `void*`, please explain. | 6. No, delete 0 must be a noop. Throwing an exception would violate the delete contract. | 7. Yes, thanks for that observation. | 8. Although not my original aim, this is probably still a good idea to implement. Thanks!
FredOverflow
1. Or perhaps `clog`? I think the point is to be able to redirect the log output.
UncleBens
@FredOverflow: (2) something based on sizeof(type), or just do some math to work out a sensible alignment value, along the lines of size += (ALIGMENT - (size % ALIGNMENT)) + (HEADER_SIZE * 2)? (5) I can't think of any reason other than tradition :-|
JBRWilkinson
How about placing the meta-data at the end of the block (aligned to a 4 or 8 byte boundary)?
UncleBens
+1  A: 

Explain why you chose "ALIGNMENT" as an identifier. Explain why you picked 16. Argue how your algorithm catches the most common mistakes, like overflowing the end of a heap-allocated block or forgetting to release memory.

Hans Passant
Do you mean "explain to you" or "add comments to the code"? I chose "ALIGNMENT" because memory allocations for arbitrary objects have to be aligned by some number that works for every type. I picked 16 because I don't know of a portable way to determine the real value, so better suggestions are welcome :) Buffer overflow is not covered by my heap debugger at all. Forgetting to release memory is covered by checking a counter which is incremented with each executed `new` and decremented with each executed `delete`, see the code in `~heap_debugger`.
FredOverflow
Nah, don't explain it to me, explain it to the user of your code.
Hans Passant
+2  A: 

Instead of doing intrusive note-keeping you could keep a list of all allocations made. Then you can free the memory without destroying your own data, and keep track of how many times a particular address is "deleted", and also find places where the program tries to delete a non-matching address (i.e. not in the list).

bitc
How exactly do I make sure that the note-keeping list does not keep track of its own memory? Write an allocator? Write my own list implementation based on malloc and free?
FredOverflow
@bitc My first implementation of a non-intrusive heap debugger can be found [here](http://stackoverflow.com/questions/2835416/critique-my-non-intrusive-heap-debugger).
FredOverflow