views:

50

answers:

3

The manpage for the POSIX binary tree functions includes the following statements:

tdelete() returns a pointer to the parent of the item deleted, or NULL if the item was not found.

tdelete() frees the memory required for the node in the tree. The user is responsible for freeing the memory for the corresponding data.

This means that there is no way to get access to a node's data for a given key from a tdelete() call. One would be required to call tfind() (rather than tsearch() so as not to add the given key), perform destruction of the node's data, and then call tdelete() with the same key to remove the node from the binary tree.

Have I interpreted this correctly? Is there some way around what I perceive to be limitations with this approach?

  1. If the key is heap-allocated, it can't be freed (or made useless to the comparison function in use) before the node is deleted. This requires calling tfind() to obtain a pointer to the data, tdelete() to remove the node, and then destroying the data retrieved from the tfind() call.
  2. Two lookups are required to delete a node and destroy it's enclosed data.
+1  A: 

I've never used this set of functions before, but you appear to be correct. However, I would try using the pointer returned from tfind() as the rootp argument to tdelete(). This should make the second search essentially a no-op. AFAICT, the data structure allows any node to be treated as a root so you would find the node and then delete it from the subtree rooted at the node that you found.

D.Shawley
Oh, good catch on treating any node as a root.
llasram
I've considered this. Unfortunately nodes do not store a parent pointer: Deleting an arbitrary node would leave a dangling pointer further up the tree.
Matt Joiner
A: 

Like D.Shawley, I have never used these function before, so I wrote a little test program that leaks memory.

#include <search.h>
#include <stdio.h>
#include <stdlib.h>

int fx(const void *a, const void *b) {
  if (*(int*)a < *(int*)b) return -1;
  return (*(int*)a > *(int*)b);
}

int main(void) {
  int i, *elem;
  void *root = NULL, *val;

  for (i = 0; i < 10; i++) {
    elem = malloc(sizeof *elem); /* LEAK, leak, leak */
    *elem = i/2;
    val = tsearch(elem, &root, fx); /* assume all OK */
    printf("i: %d; elem: %p (%d); val: %p (%x)\n",
           i, (void*)elem, *elem, val, *(int*)val);
  }

  for (i = -1; i < 6; i++) {
    val = tfind(&i, &root, fx);
    printf("i: %d; ", i);
    if (val) {
      printf("@%p (%x)\n", val, *(int*)val);
    } else {
      printf("NOT FOUND\n");
    }
  }

  return 0;
}

Running it outputs

i: 0; elem: 0xcb8010 (0); val: 0xcb8030 (cb8010)
i: 1; elem: 0xcb8060 (0); val: 0xcb8030 (cb8010)
i: 2; elem: 0xcb8080 (1); val: 0xcb80a0 (cb8080)
i: 3; elem: 0xcb80d0 (1); val: 0xcb80a0 (cb8080)
i: 4; elem: 0xcb80f0 (2); val: 0xcb8110 (cb80f0)
i: 5; elem: 0xcb8140 (2); val: 0xcb8110 (cb80f0)
i: 6; elem: 0xcb8160 (3); val: 0xcb8180 (cb8160)
i: 7; elem: 0xcb81b0 (3); val: 0xcb8180 (cb8160)
i: 8; elem: 0xcb81d0 (4); val: 0xcb81f0 (cb81d0)
i: 9; elem: 0xcb8220 (4); val: 0xcb81f0 (cb81d0)
i: -1; NOT FOUND
i: 0; @0xcb8030 (cb8010)
i: 1; @0xcb80a0 (cb8080)
i: 2; @0xcb8110 (cb80f0)
i: 3; @0xcb8180 (cb8160)
i: 4; @0xcb81f0 (cb81d0)
i: 5; NOT FOUND

Apparently the values "inside the binary tree" are copies of pointers to the data. The copies are managed by the <search.h> functions and the data should be managed by the program.

The memory 0xcb8030 should be freed by a call to tdelete(), the corresponding 0xcb8010 (elem with value 0) should be freed by the program (as well as the previously lost 0xcb8060).

pmg
Matt Joiner
Change my program by adding the necessary `tdelete()` and `free()` calls so it doesn't leak. `tdelete()` and `free()` calls will all be to different things: there is no overlap.
pmg
search.h is not C89, bsearch/qsort/... are declared in stdlib.h; you dont know the well known "standard"-compare definition <return *(int*)a-*(int*)b;> ?
`tsearch`, `tfind`, `tdelete`, etc are functions not defined by the C Standard, no matter what header they're declared in. My documentation suggested `<search.h>` so I used it. I misread the documentation and coded the compare function to return -1, 0, or 1 rather than <0, 0, or >0; but you're right about the "standard"-compare.
pmg
+1  A: 

Matt, I think that your interpretation is correct. For individual deletion of elements this seems unavoidable. (But anyhow it is only a factor of 2 for an operation of \Omega \log N cost ;)

Long time that I didn't use these function, but looking through old code it looks that you can avoid part of the overhead if you destroy the tree all at once (if this is your case) by using two calls of twalk:

  1. count the number N of elements in your tree with an appropriate call to twalk
  2. allocate an array of size N of pointers to tree items
  3. fill this array by a second twalk through the tree
  4. run through this array and for each tree node first delete its data and then the node itself

If your really need such a thing you can find a thread safe C++ encapsulation of these calls in our library parXXL in the directory par/sys.

Jens Gustedt
I'm glad someone hit on the fact that 2logN is still O(log N). I think destroying the tree can be done quicker: Store the data for the root node, call delete on the root node, then destroy the data. Repeat with the new root node until there is no root node.
Matt Joiner
I see your answer is inspired by `stree< T >::itemp* par::sys::stree< T >::collect`, I wonder now if my suggestion works without the allocation of storage for the nodes.
Matt Joiner
Take a look at http://code.google.com/p/anacrolix/source/browse/public/stackoverflow/tdelete_data_access.c#33, you might find this better than use of `collect` in parXXL. Build and valgrind instrs are at the top in a comment.
Matt Joiner
@Matt. Great, yes works. The problem with your approach is thecomplexity of the `tdelete` operation. Nothing in POSIX states howthis is done, nothing imposes that this is done with just sometree-rotations. So your approach has `N × T(tdelete)` operations.If you do the correct `twalk`, you can be sure to do a leafs-firstdeletion, which might be a bit more tolerant against bogusimplementations. I don't say that we achieve that in parXXL, tough ;-)
Jens Gustedt
@Jens Gustedt: Good to know, this makes sense. Does the RB tree eglibc employs mean that the node retrieved by `tfind` will be the root node when `tdelete` is called?
Matt Joiner
@Matt: no idea, this is not only a question of the algorithm they use, AFAIR they claim to have implemented one from Knuth. But it is also really a question of how the particular implementation is done, and there only the source code can tell. I never looked into that.
Jens Gustedt