ansaurus

Question

A data structure based on the R-Tree: creating new child nodes when a node is full, but what if I have a lot of objects at the exact same position?

Answer 1

+1 A:

From common sense I'd not assume having two objects in the same position ever, but if this is a part of the idea, then I would introduce one more axis, say 'spin', an integer number, and impose a restriction that all your objects are fermions (cannot have the same location and spin at the same time).

bobah 2010-05-24 10:23:31

The chance is unlikely if not done programmatically. However, imagine the game spawning multiple players at a spawn point, this would be the exact same location. I'm not quite sure what you mean with spin, what does the integer reflect?

Tom 2010-05-24 10:45:04

Even if two players are spawn at a same point it is probably possible to give them a slightly different coordinates (add random delta).Spin is just another parameter that would make objects different and that would be used only in your data structure to solve the problem of all coordinates being same.

bobah 2010-05-24 19:37:01

Many games, when spawning a second player where another player already is, will kill [(*telefrag*)](http://en.wikipedia.org/wiki/Frag_%28video_gaming%29#Telefrag) that first player, to punish her for [lolligagging](http://www.merriam-webster.com/dictionary/lollygag).

dash-tom-bang 2010-05-24 21:21:04

Implementation of game logic has nothing to do with my problem. You can't expect a game designer who's using my class to code the way I want them to. If they want to add more than X players at the same position, they should be able to do so and not see the server end in an infinite loop.

Tom 2010-05-25 07:27:59

Answer 2

+1 A:

It looks like a bit of a mismatch between your data and your structure, since you have a structure that assumes N objects within an arbitrarily large area when you're supplying it with >N objects on an infinitely small point. It might be worth using a different structure for this data?

Hack fix: apply a tiny random displacement to your newly created objects. This should allow the space to be subdivided by the existing algorithm.

Better fix: ensure that your algorithm for splitting a leaf node always generates at least 2 new leaf nodes to begin with. When reassigning objects to the new leaf nodes, or when performing normal insertions, iterate over all the candidates, and if more than one is equally suitable then you can tie-break based on how full they are. This should result in your co-located players ending up split evenly across the otherwise identical nodes.

Kylotan 2010-05-24 11:17:14

Why is it a mismatch between data and structure? The structure does not assume a set amount of objects, it can be anything, that's why new nodes are created. It simply makes sure that there are not more than X leafs inside 1 node. The random is no option of course. My splitting algorithm always created the maximum amount of nodes already, split over the current node. But I guess I have to change this behaviour?

Tom 2010-05-24 11:23:16

It's a mismatch because you are using a 2D partitioning scheme for objects that don't differ in those 2 dimensions. It's like sorting a collection of books alphabetically by author when they're all written by the same person. (I appreciate however that you presumably have other objects which fit this scheme better.) As for the splitting, the only thing that's wrong is that you have no strategy for dividing the moved objects up among equally-attractive leaf nodes. A tie-breaker based on leaf-node population will solve that part.

Kylotan 2010-05-24 11:35:52

Actually, all the books have different authors. And most have a different location too. There will be some with the same location, but different authors, though. I think my library should be able to store books with the same position. There are never equally-attractive leaf nodes. If a node is split, all childs will reflect a different part of the parent node. The problem is when there are too many objects with the exact same location, as they will be put in their child (there is only 1 child that fits).. and that child will do the same, etc.

Tom 2010-05-24 11:44:30

Your tie-breaker would force me to change my splitting method of dividing the node over multiple child nodes, each reflecting a different part of the parent. This would make the structure a lot more complex.

Tom 2010-05-24 11:44:55

I think you misunderstood my analogy - if you are sorting on 2D and they share the same 2D position then they have the same 'author', ie. they have identical sorting criteria. Thus you have no meaningful way of subdividing them further, which is the problem you face. If all the objects in a node have the same position and dimensions then any two subsets of that node will keep those same properties, meaning nodes based on those subsets would too, and thus there are no distinctly "different parts of the parent" in the degenerate situation that you are trying to fix.

Kylotan 2010-05-24 12:31:37

I understand. I "fixed" it now by looking if all leafs have the same position, before splitting if the node is full, since there is no way to split them and then just ignore the maximum... not sure if I am totally happy about it though. I guess your method of creating identical nodes would be better, but that would require a lot of complexity I think while now I can simply split a parent node over different child nodes.

Tom 2010-05-24 12:48:34

Answer 3

+2 A:

I would like to ask a question...

is it that important than the maxChildsPerNode constraint be respected ?

I would rather think of this maximum as a hint to the structure for when to split, and simply ignore it when there is no meaningful way to perform the split.

Of course you'd better rethink the name then, otherwise it'd be odd for the next maintainer.

In pseudo code I would use something like this:

def AddToNode(node, item):
  node.items.append(item)
  if len(node.items) > node.splitHint:
    leftNode = Node(node.splitHint)
    rightNode = Node(node.splitHint)
    node.split(leftNode, rightNode)
    if len(leftNode.items) == 0 or len(rightNode.items) == 0:
      node.splitHint *= 1.5 # famous golden ratio ;)
    else:
      node.items = [leftNode, rightNode]

The important step is to modify the hint when it's detected than we can't abide by it in order not to perform this check at each insertion (this way we obtain a constant amortized cost).

Matthieu M. 2010-05-24 12:12:20

Good idea. I guess I can not respect the maxChildsPerNode if all objects have the same position. I am having problems to find out whether there is no meaningful way to perform to plit or whether all childs have the same position though.. mainly because of strange float calculations. Eg. floatTopX = 10, floatBotX = 10 and (floatBotX - floatTopY == 0) returns false. This is because I guess it is not 10, but very, very, very, near to 10 (these are the node boundaries). Going to work with this for a bit and report back later.

Tom 2010-05-24 12:28:11

I fixed it by checking if all leafs have the same position before doing a isFull check. Unfortunately this adds some overhead. Not sure if there is a better way...

Tom 2010-05-24 12:47:04

For comparing floating point values the idiomatic way is to use an epsilon. For equality it means that `x == y` is transposed into `x - y < epsilon`. It is then up to you to specify the epsilon depending on your constraints.

Matthieu M. 2010-05-24 12:55:20

Depending on the time it takes to do the `isFull` check, it may cost less to check the position only if the checks returns full. Also note that I modified the `max` constraint on the fly so that `isFull` would not return true too often in order to obtain an amortized complexity.

Matthieu M. 2010-05-24 12:57:15

What do you recommend? The epsilon check (to be honest I don't like constant values -- maybe this epsilon should be different on different machines or something?) or checking if all leafs have the same position if a node is full before creating more nodes? See my updated question for my way of checking if all leafs have the same position (I think this should be pretty fast).

Tom 2010-05-24 13:02:42

The epsilon really depends on the values you have, floating point values can vary widely under some operations (for example sums) and you'd have to determine it yourself... for partitioning however the very fact that they are different provides you with a simple discrimination criteria so perhaps that you don't have to use it there (for searching for items though, you'd better search for items near a point rather than at the exact coordinate).

Matthieu M. 2010-05-24 13:21:26

For the leaves check, you could maintain the list sorted and then you'd would simply have to compare the first and last elements. A naive sorting method (sorting by X and if equal sorting by Y) would work perfectly here. It means insertion would be a bit longer, since it takes O(log N) to look-up the place in which to insert, but since N is bounded here by maxChildsPerNode, it should not be deterrent.

Matthieu M. 2010-05-24 13:24:07

The isFull method will not be called often though, while the insertion one is called all the time. So, adding extra calculations to insertion to prevent overhead of something that doesn't happen often doesn't seem to be reducing overhead.

Tom 2010-05-24 13:48:17

Sorry, I had assumed you would be checking each time you insert, as is done in a `vector` or other dynamic array.

Matthieu M. 2010-05-24 15:57:49

I'm still not convinced though about this "fix". What if some of the objects are within that epsilon range, but not actually equal to each other? Maybe I should use the Xb - Xt < epsilon check instead? If so, how do I determine the size of my epsilon? Should this be a constant or user defined?

Tom 2010-05-24 16:10:52

It is up to you to decide under which conditions two objects can be considered at the same position. It notably depends on the loss of precision that may occur which itself depends on the operations that are going on... if you want exact precision, then you need an exact representation (integers). However, it does not really matter for clustering, it only matters when looking the objects up, so you may only need the epsilon when performing a search in your R-Tree.

Matthieu M. 2010-05-25 06:43:51

Eh.. how can you say I may only need the epsilon when doing a search even though you suggested me to use one to solve my inserting problem? Why would it be up to me to decide under which conditions two numbers are not the same? This class is supposed to be exchangeable, there should be no project-based-opinions inside. I never said I wanted exact precision, please keep the main problem in mind: inserting multiple objects with the same position results in an infinite loop of trying to split up nodes which are not splitable.

Tom 2010-05-25 07:26:25

Answer 4

+1 A:

If you have a set of objects on the exact same spot, any query for a region that contains one should return all - so there's no reason to split them, as it doesn't gain you anything. Simply either count the number of distinct locations when deciding to split, or have each element on the leaf node be an object that encapsulates (coordinates, [list of objects at those coordinates]).

Nick Johnson 2010-06-01 09:31:34

ansaurus

tags:

views:

answers:

A data structure based on the R-Tree: creating new child nodes when a node is full, but what if I have a lot of objects at the exact same position?

related questions