views:

89

answers:

2

Hi,

I am looking for an efficient way to represent and retrieve the geographical relationship eg. districts->states->USA. This should accommodate any level of hierarchy eg. district->region->states->big region(East/west/south/north) -> USA.

My requirements are

  1. I mostly operate at the lowest level - so getting all of them fast should be the first priority. Constant time is preferred.
  2. Then, I want to perform aggregates eg.combine districts data at state level (so obtain all the children for a node) easily - this is the second criteria.
  3. Order at a level does not matter -eg. For NC, I don't mind if I first get Raleigh or Fayetville.

As you have almost have guessed - A Tree datastructure lends itself to the problem logically. But I could not find a way to get all the leaves efficiently. I can check if a node is leaf in O(log n) time but I have check each of the nodes for that.

I have looked B, B+ trees but what I didn't understand is they maintain their order using some ordering like ascending or descending.

My gut feeling is there should efficient solutions for this because - windows or any file system does this. Files->Folders->Big Folders->C -> My Computer. Also this kind of computations must be done in data mining lets say for clustering (I remember reading something of this sorts)

Any leads in this direction would be appreciated.

Thanks

+1  A: 

You're talking about retrieving n unique items matching a given criterion (in this case everything at a particular level in the hierarchy under a given node). You can't get n unique items from a data structure in constant time unless you've precomputed all possible criteria. At the very least you'll have to iterate through those n items.

There are many data structures and combinations of data structures that you can use to make different types of uses more efficient. You're right that B and B+ trees work well in this situation, which is why I'm going to suggest you use a relational database for this application, because they are the best and most robust B-tree implementations you'll be able to find. Matching leaf nodes and computing aggregates are pretty much what they're for. Unless you have some particular reason not use a RDBMS subsystem, this is probably your best bet.

Welbog
I am retrieving these values from a RDBMS, what I am looking for is a structure in code to perform compuatations efficiently.
satyajit
The structure to perform those computations efficiently is called "SQL". That's what it's for. Is there something specific you can't manage using SQL that you need?
Welbog
@Welbog - SQL is very useful and general purpose but it isn't even remotely efficient compared to a decent in-memory data structure. Consider rendering a dynamically determined subset of this dataset at 50 FPS for example.
mikera
A: 

Create a tree of nodes where each node contains:

  • A pointer to the parent node (or null for the root node)
  • A collection (e.g. HashMap or ArrayList in Java) of child nodes
  • Any data payload associated with the node (e.g. geographic coordinates so you can do distance searching)

If you like you can augment this with a a HashMap based index of String -> Node for O(1) access to nodes. But for this problem I wouldn't worry about the tree search cost, since you are not likely to have more than 5-10 levels max.

mikera