ansaurus

Question

How would you solve this GPS/location problem and scale it? Would you use a Database? R-tree?

Answer 1

+3 A:

Create a MyISAM table with a column of datatype Point
Create a SPATIAL index on this column
Convert the GPS coords into UTM (grid) coords and store them in your table

Issue this query:

SELECT  user_id, GLength(LineString(user_point, @mypoint))
FROM    users
WHERE   MBRWithin(user_point, LineString(Point(X(@mypoint) - 20, Y(@mypoint) - 20), Point(X(@mypoint) + 20, Y(@mypoint) + 20)))
        AND GLength(LineString(user_point, @mypoint)) <= 20

Note that this query will most probably run against very volatile data, so you will also need to filter on the time of each position update.
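The logic of the query above (a cheap bounding-box test, then the exact distance, plus a freshness check) can be sketched in Python as follows. This is only an illustration, not the MySQL implementation: the function name `nearby_users`, the tuple layout and the `max_age` parameter are assumptions, and coordinates are taken to be planar (for example UTM meters).

```python
import math

def nearby_users(users, me, radius, now, max_age):
    """Return (user_id, distance) for fresh users within `radius` of `me`.

    users: iterable of (user_id, x, y, updated_at), planar coordinates.
    All names and the record layout are hypothetical, for illustration only.
    """
    mx, my = me
    result = []
    for user_id, x, y, updated_at in users:
        # Time check: skip positions that are too old to trust.
        if now - updated_at > max_age:
            continue
        # Cheap bounding-box test (what MBRWithin does via the index).
        if abs(x - mx) > radius or abs(y - my) > radius:
            continue
        # Exact Euclidean distance (what GLength(LineString(...)) computes).
        d = math.hypot(x - mx, y - my)
        if d <= radius:
            result.append((user_id, d))
    return result
```

The box test alone lets through points in the corners of the square, which is why the exact distance check is still needed afterwards.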

Since MySQL cannot combine a SPATIAL index with another index (here, one on time), it will be better to use some kind of surface tiling technique:

Split the Earth's surface into a number of tiles, say, 1" x 1" (one arc second is about 30 meters along the meridian and 30 * COS(lat) meters along the parallel).
Store the tile in a CHAR(14) column: 7 digits for the lat + 7 digits for the lon (14 digits in all). Disable key compression on this column.
Create a composite index on (time, tile)
On the client, calculate all possible tiles your mates may be in. For a 20 meter distance, this will be at most 9 tiles, unless you are far to the north or south, where the tiles get narrow. However, you may change the tiling algorithm to handle these cases.
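The client-side step can be sketched like this. The answer does not say how the 7 + 7 digits are derived, so the encoding here (arc seconds counted from the south pole and from the antimeridian, zero-padded) is an assumption, as are the function names. The sketch always returns the 3 x 3 block around the current tile and does not handle the high-latitude case where tiles become narrower than 20 meters.

```python
import math

LAT_TILES = 180 * 3600   # number of 1-arc-second rows, south to north
LON_TILES = 360 * 3600   # number of 1-arc-second columns, west to east

def tile_index(lat, lon):
    """Return (row, col) of the 1" x 1" tile containing the point (degrees)."""
    # Clamp the row so that lat == 90 falls into the last row.
    row = min(int(math.floor((lat + 90.0) * 3600)), LAT_TILES - 1)
    # Wrap the column so that lon == 180 maps to the same column as -180.
    col = int(math.floor((lon + 180.0) * 3600)) % LON_TILES
    return row, col

def tile_key(row, col):
    """Encode a tile as a 14-character string: 7 digits lat + 7 digits lon."""
    return f"{row:07d}{col:07d}"

def candidate_tiles(lat, lon):
    """Keys of the tile containing the point and its neighbours (at most 9)."""
    row, col = tile_index(lat, lon)
    keys = []
    for dr in (-1, 0, 1):
        r = row + dr
        if not 0 <= r < LAT_TILES:
            continue  # no rows beyond the poles
        for dc in (-1, 0, 1):
            # Columns wrap around at the antimeridian.
            keys.append(tile_key(r, (col + dc) % LON_TILES))
    return keys
```

The resulting keys are what would be passed to the query as tile1, tile2 and so on.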

Issue this query:

SELECT  *
FROM    (
        SELECT  tile1 AS tile
        UNION ALL
        SELECT  tile2
        UNION ALL
        …
        ) tiles
JOIN    users u
ON      u.tile  = tiles.tile
        AND u.time >= NOW() - INTERVAL 10 SECOND  -- freshness window; the 10 seconds is an assumption
        AND GLength(LineString(user_point, @mypoint)) <= 20

where tile1 etc. are the precalculated tiles.

SQL Server implements this algorithm for its spatial indexes (rather than the R-tree that MySQL uses).

Quassnoi 2010-02-15 12:17:29

Wow, this is great

TIMEX 2010-02-15 12:19:34

Quassnoi...did you write that formula yourself? Or is it pretty standard? (the select where... formula)

TIMEX 2010-02-15 12:21:00

@alex: I wrote the formula myself using the standard `MySQL` geometry functions: http://dev.mysql.com/doc/refman/5.0/en/geometry-property-functions.html Note that these functions assume a Euclidean plane, hence the requirement to store the coordinates in `UTM`.

Quassnoi 2010-02-15 12:23:40

Answer 2

A:

Chapter 11 "Database Design Know it All" has some thoughts on how to design such a database.

Anonym 2010-02-15 12:18:48

Thanks, I'll check that out if I ever get a user/pass :)

TIMEX 2010-02-15 12:23:28

Answer 3

+2 A:

Well, the naive approach would be to do an O(n) pass over all points, compute each one's distance from the current point, and keep those closer than 20 meters. This is perfectly OK for small datasets (say <= 500 points), but on larger sets it's going to be quite slow. In SQL, this would be along the lines of:

SELECT point_id, DIST_FORMULA(x, y) AS distance
FROM   points
WHERE  DIST_FORMULA(x, y) < 20

To address the inefficiency of the above method, you would have to use some sort of preprocessing step, most likely space partitioning. That can often dramatically improve performance in nearest neighbour type of searches like this. However, in your case, if all the points are updated every 10 seconds, you would have to do an Ω(n) pass to update the position of each point in the space partitioning tree. If you have more than a few queries between each update, it will be useful, otherwise it'll simply be an overhead.
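To make the trade-off concrete, here is a minimal sketch of both approaches, assuming planar coordinates and using a uniform grid as the simplest form of space partitioning (an R-tree or k-d tree would play the same role). The function names and the `cell` parameter are illustrative assumptions. Note that `build_grid` is itself a full pass over all points, which is the per-update cost discussed above.

```python
import math
from collections import defaultdict

def naive_within(points, qx, qy, radius):
    """O(n) scan: ids of all points within `radius` of (qx, qy)."""
    return {pid for pid, (x, y) in points.items()
            if math.hypot(x - qx, y - qy) <= radius}

def build_grid(points, cell):
    """Preprocessing step: bucket every point by its grid cell (one full pass)."""
    grid = defaultdict(list)
    for pid, (x, y) in points.items():
        grid[(math.floor(x / cell), math.floor(y / cell))].append(pid)
    return grid

def grid_within(points, grid, cell, qx, qy, radius):
    """Query that only inspects cells that can overlap the search circle."""
    reach = math.ceil(radius / cell)  # how many cells the radius can span
    cx, cy = math.floor(qx / cell), math.floor(qy / cell)
    found = set()
    for i in range(cx - reach, cx + reach + 1):
        for j in range(cy - reach, cy + reach + 1):
            for pid in grid.get((i, j), ()):
                x, y = points[pid]
                if math.hypot(x - qx, y - qy) <= radius:
                    found.add(pid)
    return found
```

If only one or two queries run between rebuilds, the naive scan does less total work; the grid pays off once many queries share one build.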

Max Shawabkeh 2010-02-15 12:22:23
