views:

101

answers:

1

Hi,

I have used KD-tree(libkdtree++) to store a multi-dimensional data set, and the requirements here is this data set can support top-k/range queries on different dimensions. For example, a KDTree<3, Point> tree: to find the top 100 points whose have highest Point[1](y axis) values.

From the implementation of libkdtree++, what's similar is the "find_within_range" functions, however it is counted based on "Manhattan distance", which equals max(x_dist, max(y_dist, z_dist)) here. How can I just use range query on one dimension?

A: 

Looking at the code, it looks like you can't do that in a straightforward way, ridiculously enough. If I were you I'd be tempted to either hack the library or write my own kd-tree. I'd ask on their mailing list to be sure, but it looks like you might have to do something like this:

kdtreetype::_Region_ r(point_with_min_y);
r.set_low_bound(min_x, 0);
r.set_high_bound(max_x, 0);
r.set_low_bound(min_z, 2);
r.set_high_bound(max_z, 2);
r.set_high_bound((min_y + max_y) / 2, 1);

double search_min = min_y, search_max = max_y;

// binary search to get 100 points
int c;
while (c = tree.count_within_range(r) != 100) {
    if (c > 100) search_max = (search_min + search_max) / 2;
    else         search_min = (search_min + search_max) / 2;
    r.set_high_bound((search_min + search_max) / 2);
}

tree.visit_within_range(r, process_min_y_point);

This is a horribly inefficient binary search for the Y at which count(points with y <= Y) == 100. I'm not familiar with the library, but that's the best I've got on a cursory inspection.

Walter Mundt
Thanks. The _Region_ class can solve my problem!The top-k query is not necessary that accurate, which means when tree.count_within_range(r) <= 500 then this binary search may stop.
eddyxu