views:

193

answers:

5

In my application I need to check a collection of 2D coordinates (x,y) to see if a given coordinate is in the collection, it needs to be as fast as possible and it will only be accessed from one thread. ( It's for collision checking )

Can someone give me a push in the right direction?

+5  A: 

The absolute fastest I can think of would be to maintain a 2D matrix of those points:

//just once
int[][] occurrences = new int[X_MAX][Y_MAX];
for (Point p : points ) {
    occurrences[p.x][p.y]++;
}

//sometime later
if ( occurrences[x][y] != 0 ) {
    //contains Point(x, y)
}

If you don't care how many there are, just a boolean matrix would work. Clearly this would only be fast if the matrix was created just once, and maybe updated as Points are added to the collection.

In short, the basic Collections aren't perfect for this (though a HashSet would come close).

Edit

This could be easily adapted to be a Set<Point> if you don't find a library that does this for you already. Something like this:

public class PointSet implements Set<Point> {
    private final boolean[][] data; 
    public PointSet(int xSize, int ySize) {
        data = new boolean[xSize][ySize];
    }

    @Override
    public boolean add(Point e) {
         boolean hadIt = data[e.x][e.y];
         data[e.x][e.y] = true;
         return hadIt;
    }

    @Override
    public boolean contains(Object o) {
        Point p = (Point) o;
        return data[p.x][p.y];
    }

    //...other methods of Set<Point>...
}
Mark Peters
Agreed: if you don't want to maintain the whole `boolean` matrix, a `HashSet` is probably your best bet.
VeeArr
Added an implementation of Set based on this principle; note that you'd be best to note where/if it breaks the Set contract. For example, this doesn't do bounds checking so if you add a Point out-of-range it'll fail.
Mark Peters
A: 

You can try some sort of sorted set, like treeset, since you can do binary searches on it.

Vinh
binary search is O(log N) as opposed to the O(1) solutions given in other answers.
Kevin Bourrillion
well i guess what you lose in speed you can gain in space usage and flexibility.
Vinh
+2  A: 

I would go using some Trove collections data structures.

If your points are stored as a couple of int or a couple of float you can pack them in a long: 32 bits for x-coord and 32 bits for y-coord. Then you can use a TLongHashSet that is an HashSet optimized for working with primitive data (it will be faster and consume less memory compared to normal java collections).

If you have int coordinates it would be something like

static private long computeKey(int h1, int h2)
{           
    return ((long)h1) << 32 | h2;
}

to compute the key and then use it

TLongHashSet set = new TLongHashSet()
set.add(long v);
set.addAll(long[] v);
set.containsAll(..);

if you have float values you can do the same thing, but you have to pack float bits inside the long.

Jack
Good suggestion, though one thing to note is you'd probably want to change the hashing strategy used by the `TLongHashSet`. The default uses `return ((int)(value ^ (value >>> 32))) * 31;` which is good for randomly distributed data but awful for data like this. Data as simple as (0,1) and (1,0) will result in a hash collision, for example. It's not good for longs where the first 32 bits has a correlation to the last 32 bits.
Mark Peters
In fact, I ran your `computeKey` against the default hash function for data including every Point with X and Y between 0 and 1000, and it only generated 1024 unique hashes! That's a hash collision probability of 99.90%!!
Mark Peters
Yes, probably you are right. I used it for a problem similar to this one but that had different distribution of values so it worked like a charm (I've been able to make code that ran 25% faster and saving up to 300-400 Mb of ram out of 2.0 Gb)
Jack
A: 

How often do you have to update the collection in comparison to searching it? You should chose an appropriate data structure based on that.

Point2D implements comparable, right? Then your best bet is probably a TreeSet, they are incredibly fast and I believe they rely on B+ trees, which you may know are used in actual databases and filesystems.

If you think you're going to be doing a fair amount of updates to the structure, take a look at the SkipList. It guarentees O(log(operations)) **NOTE this is for ALL operations you do, there is no guarentee about the runtime of a single opperation)

+1  A: 

HashSet. Its O(1) average. If you want true O(1) you can make a wrapper for your object which has a reference to a collection. That way you cant just compare it with the collection you have.

takoi