tags:

views:

96

answers:

5

I am building a suffix trie (unfortunately, no time to properly implement a suffix tree) for a 10 character set. The strings I wish to parse are going to be rather long (up to 1M characters). The tree is constructed without any problems, however, I run into some when I try to free the memory after being done with it.

In particularly, if I set up my constructor and destructor to be as such (where CNode.child is a pointer to an array of 10 pointers to other CNodes, and count is a simple unsigned int):

CNode::CNode(){
    count = 0;
    child = new CNode* [10];
    memset(child, 0, sizeof(CNode*) * 10);
}

CNode::~CNode(){
    for (int i=0; i<10; i++)
        delete child[i];
}

I get a stack overflow when trying to delete the root node. I might be wrong, but I am fairly certain that this is due to too many destructor calls (each destructor calls up to 10 other destructors). I know this is suboptimal both space, and time-wise, however, this is supposed to be a quick-and-dirty solution to a the repeated substring problem.

tl;dr: how would one go about freeing the memory occupied by a very deep tree?

Thank you for your time.

A: 

Do you initialize the memory for the nodes themselves? From what I can see, your code only allocates memory for the pointers, not the actual nodes.

As far as your question goes, try to iterate over the tree in an iterative manner, not recursively. Recursion is bad, it's nice only when it's on the paper, not in the code, unfortunately.

Semen Semenych
Recursion isn't bad... =[
strager
I do, although in a different place. It did not seem relevant here. I might try this approach, however I was hoping there was a more elegant solution :)
Kathoz
@Kathoz : Really this is the simplest approach available. Actually, a recursive implementation is always limited by the stack size, so you may easily run into problems with a recursive approach.
Semen Semenych
Saying that recursion is bad is like saying that high level languages are bad (or some other generic statement like that). It can be bad, but it depends on the context and I wouldn't advice anyone to stay away from it altogether. Maybe you meant to say that it's bad in C++, which would probably be more valid.. (even if I'm not sure if I'd agree)
Jakob
@Jakob : Agreed, my statement was a bit too harsh, but still, we're limited by the stack size, aren't we? A recursive implementation is a possible source of bugs (well, what's not), and in the tree/graph case, I'd certainly recommend to stay away from recursion, paying with a bit of extra memory necessary to store the flag variables necessary for the iterative implementation.
Semen Semenych
A: 

Have you considered just increasing your stack size?

In visual studio you do it with /FNUMBER where NUMBER is stack size in bytes. You might also need to specify /STACK:reserve[,commit].

George
The OP says the data set may be 1 million deep... That's at least 4MB of stack on modern systems. Add to that parameters, temporaries, etc. and you'll have a very large stack.
strager
@strager: You say that as if 4 megabytes is a lot. On an embedded system it certainly can be, but if you're talking a modern windows or linux desktop or server system, 4 megabytes is less than peanuts. an 8 or 16 MB stack would be no problem.
George
We use a 8MB stack PER thread (18) where I work... server programmation really is different from embedded :D
Matthieu M.
+1  A: 

One option is to allocate from a large buffer then deallocate that buffer all at once.

For example (untested):

class CNodeBuffer {
    private:
        std::vector<CNode *> nodes;

    public:
        ~CNodeBuffer() {
            empty();
        }

        CNode *get(...) {
            CNode *node = new CNode(...);
            nodes.push_back(node);
            return node;
        }

        void empty() {
            for(std::vector<CNode *>::iterator *i = nodes.begin(); i != nodes.end(); ++i) {
                delete *i;
            }

            nodes = std::vector<CNode *>();
        }
};

If pointers to a std::vector's elements are stable, you can make things a bit simplier and just use a std::vector<CNode>. This requires testing.

strager
Pointers to vector are not stable. One can use vector's indexes instead. I generally implement what you exemplify above and use indexes instead of pointers (probably one can implement custom iterators around them?). Though it generally gets bit messier when one need to also keep the list of free nodes to support put() (or delete()) operation.
Dummy00001
@Dummy00001, A "put" or individual delete would basically be a noop, or the structure could be reused (e.g. recalling the ctor). I think I've seen this pattern before (more generalized), but I forgot what it's called. A custom iterator sounds like a good idea, though feels kinda heavy for what's needed.
strager
A: 

You're going to do quite a few deletes. That will take a lot of time, because you will access memory in a very haphazard way. However, at that point you don't need the tree structure anymore. Hence, I would make two passes. In the first pass, create a std::vector<CNode*>, and reserve() enough space for all nodes in your tree. Now recurse over the tree and copy all CNode*'s to your vector. In the second step, sort them (!). Then, in the third step, delete all of them. The second step is technically optional but likely makes the third step a lot faster. If not, try sorting in reverse order.

MSalters
A: 

I think in this case a breadth-first cleanup might help, by putting all the back-tracking information into a deque rather than on the OS provided stack. It still won't pleasantly solve the problem of making it happen in the destructor though.

Pseudocode:

void CNode::cleanup()
{
    std::deque<CNode*> nodes;
    nodes.push_back(this);
    while(!nodes.empty())
    {
        // Get and remove front node from deque.
        // From that node, put all non-null children at end of deque.
        // Delete front node.
    }
}
Mark B