Performance question ...

I have a database of houses that have geolocation data (longitude & latitude).

What I want to do is find the best way to store the location data in my MySQL (v5.0.24a) database, using the InnoDB engine, so that I can run a lot of queries that return all the home records between latitudes x1 and x2 and longitudes y1 and y2.

Right now, my database schema is:

---------------------
Homes   
---------------------
geolat - Float (10,6)
geolng - Float (10,6)
---------------------

And my query is:

SELECT ... 
WHERE geolat BETWEEN x1 AND x2
AND geolng BETWEEN y1 AND y2

  • Is what I described above (using Float (10,6) and storing longitude and latitude in separate columns) the best way to store latitude and longitude data in MySQL? If not, what is? MySQL offers FLOAT, DECIMAL and even spatial data types.
  • Is this the best way to perform the SQL from a performance standpoint? If not, what is?
  • Does using a different MySQL database-engine make sense?

UPDATE: Still Unanswered

I have 3 different answers below. One person says to use Float, one says to use INT, and one says to use Spatial.

So I used MySQL's "EXPLAIN" statement to measure the SQL execution speed. It appears that there is absolutely no difference in SQL execution (result-set fetching) between INT and FLOAT as the longitude and latitude data type.

It also appears that using the "BETWEEN" operator is SIGNIFICANTLY faster than using the ">" and "<" operators: nearly 3x faster.
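For reference: MySQL defines "expr BETWEEN min AND max" as equivalent to "expr >= min AND expr <= max", so the like-for-like comparison is against the inclusive ">=" / "<=" form. Strict ">" / "<" is also a different query, because it drops rows lying exactly on the box edges. The sketch below shows this with Python's built-in sqlite3 module standing in for MySQL (an assumption made only so it runs anywhere; SQLite defines BETWEEN the same way). The rows and box values are made up.

```python
import sqlite3

# In-memory stand-in for the Homes table (SQLite, not MySQL).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Homes (id INTEGER PRIMARY KEY, geolat REAL, geolng REAL)")
conn.executemany(
    "INSERT INTO Homes (geolat, geolng) VALUES (?, ?)",
    [(40.0, -74.0), (40.5, -73.5), (41.0, -73.0), (42.0, -72.0)],
)

# Hypothetical box: latitude x1..x2, longitude y1..y2.
box = {"x1": 40.0, "x2": 41.0, "y1": -74.0, "y2": -73.0}

between = conn.execute(
    "SELECT id FROM Homes WHERE geolat BETWEEN :x1 AND :x2 "
    "AND geolng BETWEEN :y1 AND :y2 ORDER BY id", box).fetchall()

# Inclusive comparisons: same meaning as BETWEEN.
inclusive = conn.execute(
    "SELECT id FROM Homes WHERE geolat >= :x1 AND geolat <= :x2 "
    "AND geolng >= :y1 AND geolng <= :y2 ORDER BY id", box).fetchall()

# Strict comparisons: rows on the edges of the box are excluded.
strict = conn.execute(
    "SELECT id FROM Homes WHERE geolat > :x1 AND geolat < :x2 "
    "AND geolng > :y1 AND geolng < :y2 ORDER BY id", box).fetchall()

print(between)    # [(1,), (2,), (3,)]
print(inclusive)  # [(1,), (2,), (3,)]
print(strict)     # [(2,)]
```

Because the strict form returns fewer rows, a timing comparison between it and BETWEEN is not measuring the same query.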

With that being said, I am still uncertain what the performance impact of using Spatial would be, since it's unclear to me whether it's supported by the version of MySQL I'm running (v5.0.24), and how to enable it if it is.

Any help would be greatly appreciated.

+2  A: 

I would store it as integers (INT, 4 bytes) representing 1/1,000,000ths of a degree. That would give you a resolution of a few inches.
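A minimal sketch of that scheme, assuming coordinates arrive as decimal degrees: multiply by 1,000,000 and round when storing, divide by 1,000,000 when reading back. Because INT is signed, negative latitudes and longitudes need no special handling, and the full range of ±180,000,000 fits comfortably in a 4-byte INT. The helper names are hypothetical and the second coordinate is a made-up example.

```python
# Signed 4-byte INT range, as in MySQL's INT type.
INT_MIN, INT_MAX = -2**31, 2**31 - 1
SCALE = 1_000_000  # microdegrees per degree

def to_microdegrees(deg: float) -> int:
    """Store: decimal degrees -> integer microdegrees (hypothetical helper)."""
    if not -180.0 <= deg <= 180.0:
        raise ValueError(f"coordinate out of range: {deg}")
    micro = round(deg * SCALE)  # round to nearest rather than truncate
    assert INT_MIN <= micro <= INT_MAX  # +/-180,000,000 always fits
    return micro

def from_microdegrees(micro: int) -> float:
    """Read back: integer microdegrees -> decimal degrees (hypothetical helper)."""
    return micro / SCALE

print(to_microdegrees(93.2343213))   # 93234321
print(to_microdegrees(-122.419416))  # -122419416 (negative values work too)
print(from_microdegrees(93234321))   # 93.234321
```

The bounding-box query then compares INT columns against bounds converted the same way.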

I don't think there is any intrinsic spatial datatype in MySQL.

ZZ Coder
@ZZ Coder, info on Spatial datatype --> http://dev.mysql.com/doc/refman/5.0/en/spatial-extensions.html
Also, you state using an INT. How can you store data like 93.2343213 as an INT?
@Timtom: 93234321 microdegrees (around 3.3 cm of error)
Javier
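The size of that rounding error, and of one microdegree in general, can be worked out directly. This sketch assumes a spherical Earth with a mean radius of 6,371 km, so one degree of latitude (or of longitude at the equator) is about 111.2 km. A degree of longitude shrinks away from the equator, so the figures are upper bounds for longitude.

```python
import math

# Spherical-Earth approximation: metres per degree along a great circle.
EARTH_RADIUS_M = 6_371_000
M_PER_DEG = 2 * math.pi * EARTH_RADIUS_M / 360  # about 111,195 m

# One microdegree, i.e. the resolution of the INT scheme.
resolution_m = M_PER_DEG / 1_000_000
print(f"1 microdegree ~ {resolution_m * 100:.1f} cm")  # 11.1 cm

# Rounding error when 93.2343213 is stored as 93234321 microdegrees.
value = 93.2343213
stored = round(value * 1_000_000)
error_deg = abs(value - stored / 1_000_000)  # about 3e-7 degrees
error_m = error_deg * M_PER_DEG
print(stored, f"{error_m * 100:.1f} cm")  # 93234321 3.3 cm
```

About 11 cm per microdegree is roughly 4.4 inches, in line with the "few inches" resolution mentioned in the answer.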
@Javier, what about negative long/lat? This assumes that all long/lat are positive values - which is only true for the North Eastern hemisphere --> http://en.wikipedia.org/wiki/ISO_6709
@Timtom: Since this is a performance question, you probably don't want to use the GIS/Spatial extension unless you want a spatial index. A point represented in WKB takes 21 bytes. It's not supported by all engines either.
ZZ Coder
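For the size comparison: a 2-D point in Well-Known Binary consists of one byte-order byte, a 4-byte geometry type code and two 8-byte doubles. The sketch packs one with Python's struct module to confirm the byte count. By comparison, two INT columns or two FLOAT columns take 8 bytes in total. The coordinates are made up, and X is taken to be longitude and Y latitude.

```python
import struct

def wkb_point(x: float, y: float) -> bytes:
    """Encode a 2-D point as little-endian Well-Known Binary."""
    byte_order = 1  # 1 = little-endian (NDR)
    geom_type = 1   # 1 = Point
    # "<" disables padding: 1 + 4 + 8 + 8 bytes.
    return struct.pack("<BIdd", byte_order, geom_type, x, y)

blob = wkb_point(-122.419416, 37.774929)
print(len(blob))  # 21
```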
@Timtom: INT is signed, so you can store negative values. Simply multiply your degree values by 1,000,000.
ZZ Coder
@ZZ Coder, don't you mean divide by 1,000,000 (or multiply by 0.000001)?
Integer operations are always faster than floating-point ones, integer comparisons in particular.
Rodrigo
@Rodrigo, any reference material supporting this claim?
All, please note my updated original post. There is no difference in INT vs FLOAT performance.
A: 

It really depends on how you are using the data. But, as a gross over-simplification of the facts, float is faster than decimal but less accurate, because floating-point values are approximations. More info here:

http://msdn.microsoft.com/en-us/library/aa223970(SQL.80).aspx

Also, the standard for GPS coordinates is specified in ISO 6709:

http://en.wikipedia.org/wiki/ISO_6709

Armitage
@Armitage, let's assume I have a column for latitude and a column for longitude, and that I'm storing the data as, for example, 93.12342342.
@Armitage, maybe I should ask: what's the best way to store the long/lat data in MySQL? Assume I want to run lots of queries that return all the records between latitudes x1 and x2 and longitudes y1 and y2.
Sounds like the performance difference is negligible. From what I am reading, Spatial will perform better and probably save you time coding your project, but I've never used it.
Armitage
+1  A: 

float(10,6) is just fine.

Any other convoluted storage schemes will require more translation in and out, and floating-point math is plenty fast.
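One caveat worth checking on FLOAT(10,6): MySQL's FLOAT is a 4-byte single-precision type, so (10,6) only controls how values are rounded and displayed, while the stored value keeps roughly 7 significant digits. The sketch approximates a FLOAT column by round-tripping values through IEEE-754 single precision with Python's struct module. MySQL's own rounding to 6 decimals is not modelled, and the metres-per-degree figure is an equatorial approximation.

```python
import struct

def as_float32(x: float) -> float:
    """Round-trip a Python float (a double) through IEEE-754 single precision."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

a, b = 93.234321, 93.234322  # two longitudes one microdegree apart
print(a == b)                          # False: distinct as doubles
print(as_float32(a) == as_float32(b))  # True: same single-precision value

# Between 64 and 128 degrees, adjacent single-precision values are 2**-17 degrees apart.
spacing_m = 2.0 ** -17 * 111_195       # approx. metres per degree (assumption)
print(round(spacing_m, 2))             # 0.85
```

Sub-metre resolution is probably still plenty for locating houses, which fits this answer. DOUBLE or DECIMAL is the option if all six decimals must survive exactly.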

richardtallent
@richardtallent, are you saying that 1) how I'm storing the data, 2) the data type I've picked, 3) my SQL statement and 4) my InnoDB database engine are all already the most efficient choices (best performance)?
@richardtallent, meaning I can't change anything to make it perform better?
+1 Have a read of http://coordinate.codeplex.com/ where Jaime Olivares explains why floats are the correct type for coordinates...
grenade
+1  A: 

The problem with using any data type other than "spatial" here is that your kind of "rectangular selection" can usually be optimised in only one dimension (this depends on how smart your DBMS is, and MySQL is certainly not generally the smartest).

The system can pick either the longitude index or the latitude index and use it to reduce the set of rows to inspect. But after that, there is a choice between (a) fetching all the rows found and scanning them to test the "other dimension", or (b) doing the same process on the "other dimension" and then matching the two result sets to see which rows appear in both. The latter option may not be implemented in your particular DBMS engine.
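Options (a) and (b) can be illustrated with a toy in-memory model. Here, sorted Python lists stand in for two separate single-column indexes and binary search stands in for an index range scan. This is a sketch of the idea only, not of how MySQL works internally. The data is made up: row 4 matches on latitude only and row 5 on longitude only.

```python
import bisect

# Toy rows: id -> (lat, lng).
rows = {1: (40.0, -74.0), 2: (40.5, -73.5), 3: (41.0, -73.0), 4: (40.7, -80.0), 5: (45.0, -73.2)}

# Two single-column "indexes": sorted (key, id) pairs.
lat_index = sorted((lat, rid) for rid, (lat, _lng) in rows.items())
lng_index = sorted((lng, rid) for rid, (_lat, lng) in rows.items())

def range_scan(index, lo, hi):
    """Return the ids whose key lies in [lo, hi], using binary search on the sorted index."""
    start = bisect.bisect_left(index, (lo, float("-inf")))
    end = bisect.bisect_right(index, (hi, float("inf")))
    return {rid for _, rid in index[start:end]}

def option_a(x1, x2, y1, y2):
    """(a) Use the latitude index, then fetch each candidate row and test its longitude."""
    return {rid for rid in range_scan(lat_index, x1, x2) if y1 <= rows[rid][1] <= y2}

def option_b(x1, x2, y1, y2):
    """(b) Range-scan both indexes and intersect the two id sets."""
    return range_scan(lat_index, x1, x2) & range_scan(lng_index, y1, y2)

print(sorted(option_a(40.0, 41.0, -74.0, -73.0)))  # [1, 2, 3]
print(sorted(option_b(40.0, 41.0, -74.0, -73.0)))  # [1, 2, 3]
```

Both give the same answer. The difference is in the work done: (a) fetches row 4 and then rejects it, while (b) scans a second index and never fetches rows 4 or 5.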

Spatial indexes sort of do the latter "automatically", so I think it's safe to say that a spatial index will give the best performance in any case. But it may also be that it doesn't significantly outperform the other solutions and just isn't worth the bother. This depends on all sorts of things, like the volume and distribution of your actual data.
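To make the "both dimensions at once" idea concrete, here is a toy fixed-grid index. It is a deliberately simpler technique than the R-trees that MySQL's SPATIAL indexes use. Each row is filed under a grid cell, and a box query visits only the cells overlapping the box, so rows that match in just one dimension are mostly never touched. The cell size and data are hypothetical, and the sketch assumes x1 <= x2, y1 <= y2 and a box that does not cross the 180th meridian.

```python
import math
from collections import defaultdict

CELL_DEG = 0.5  # hypothetical cell size in degrees

# Toy rows: id -> (lat, lng).
rows = {1: (40.0, -74.0), 2: (40.5, -73.5), 3: (41.0, -73.0), 4: (40.7, -80.0), 5: (45.0, -73.2)}

def cell(lat, lng):
    """Map a coordinate to its (row, column) grid cell."""
    return (math.floor(lat / CELL_DEG), math.floor(lng / CELL_DEG))

# Build the index: each cell lists the ids of the rows inside it.
grid = defaultdict(list)
for rid, (lat, lng) in rows.items():
    grid[cell(lat, lng)].append(rid)

def box_query(x1, x2, y1, y2):
    """Visit only the cells overlapping the box, then test each candidate exactly."""
    (cx1, cy1), (cx2, cy2) = cell(x1, y1), cell(x2, y2)
    hits = []
    for cx in range(cx1, cx2 + 1):
        for cy in range(cy1, cy2 + 1):
            for rid in grid.get((cx, cy), ()):
                lat, lng = rows[rid]
                if x1 <= lat <= x2 and y1 <= lng <= y2:
                    hits.append(rid)
    return sorted(hits)

print(box_query(40.0, 41.0, -74.0, -73.0))  # [1, 2, 3]
```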

It is certainly true that B-tree indexes on float columns are by necessity slower than those on integer columns, because executing '>' on floats usually takes longer than on integers. But I would be surprised if this effect were actually noticeable.

I tested FLOAT vs INT using the "BETWEEN" SQL operator, measuring the time with both the EXPLAIN and BENCHMARK commands, and the difference in query speed was less than 1%. HOWEVER, using the ">" or "<" operators instead of "BETWEEN" made the query take nearly 3x longer.
So it appears there is no difference between FLOAT and INT as long as you use "BETWEEN".
I haven't been able to test Spatial, since it's unclear to me whether my version (v5.0.24) supports it out of the box or whether I have to install an extension/plug-in.
+1  A: 

I know you're asking about MySQL, but if spatial data is important to your business, you might want to reconsider. PostgreSQL + PostGIS are also free software, and they have a great reputation for managing spatial and geographic data efficiently. Many people use PostgreSQL only because of PostGIS.

I don't know much about the MySQL spatial system though, so perhaps it works well enough for your use-case.

Jeff Davis
I might have to look into that. Any info on the performance of fetching all the records within a lat/long bounding box?