tags:

views:

33

answers:

3

Can some one describe an algorithm that finds all keys smaller than x in an array implementation of a min-heap. I want the running time to be at least O(k) where k is the number of keys reported.

I've been scratching my head for a while now with this.

+2  A: 

To get a running time of 'at least' something is not so difficult, I assume you mean 'at most'.

Unfortunately, a min-heap is not very good at finding anything else than the smallest value.

You can do a breadth-first depth-first scan of the tree-view of your heap, and terminate every branch where you have reached X. This will be O(k), but complicated.

To find all Y where Y <= X you might as well scan the entire array, this will be O(n) but with much less overhead.

The choice depends on the ratio n/k

Henk Holterman
+1  A: 

There is a simple recursive algorithm for a tree min-heap:

void smaller_than(Node *node, int x)
{
    if (node->value >= x)
    {
        /* Skip this node and its descendants, as they are all >= x . */
        return;
    }

    printf("%d\n", node->value);

    if (node->left != NULL)
        smaller_than(node->left, x);
    if (node->right != NULL)
        smaller_than(node->right, x);
}

If a subtree's root has a value that is greater than or equal to x, then, by definition of a min-heap, all of its descendants will also have values greater than or equal to x. The algorithm need not explore deeper than the items its traversing, hence it is O(k).

Of course, it's a trivial matter translating this to an array algorithm:

#define ROOT       0
#define LEFT(pos)  ((pos)*2 + 1)
#define RIGHT(pos) ((pos)*2 + 2)

void smaller_than(int x, int pos, int heap[], int count)
{
    /* Make sure item exists */
    if (pos >= count)
        return;

    if (heap[pos] >= x)
    {
        /* Skip this node and its descendants, as they are all >= x . */
        return;
    }

    printf("%d\n", heap[pos]);

    smaller_than(x, LEFT(pos), heap, count);
    smaller_than(x, RIGHT(pos), heap, count);
}
Joey Adams
is the pos variable necessary? and is count the length of the array?
fmunshi
`pos` is used to refer to the current "node" and is also needed in the calculations of the left and right "nodes". And yes, `count` is the length of the array. Of course, the array must be in heap form for this algorithm to work.
Joey Adams
+1  A: 

The implementation as array is irrelevant, you can still do a top-down tree-search. Just instead of using the "classic" pointers, you need to calculate the respective child-node indexes.

With that out of the way do a recursive search from the top and stop recursing on each branch where the current node is greater than x. That will in general prune away a lot of values you don't need to check.

With k return values O(k) is obviously your best case. If your top node is <= x, you start getting results right away. If its greater, you are done - the result is empty.

From there you get results all the way down the subtree until you hit branches with values > x. You need to do at most 2*k Checks to prune those branches, so it looks like O(k) in total to me.

TToni