views:

238

answers:

3

THE SETUP
I have a table which contains linestrings. Linestrings are made up of multiple geographic points. Each point is made up of a latitude and longitude. Note: the linestring value is stored as TEXT in the database.

So one row in the table might look like this:
id: an integer
linestring: x1, y2, x2, y2, x3, y3, x4, y4

THE PROBLEM
Google Maps only allows up to 1000 elements to be displayed at a time. In my case, I'm displaying 850 linestrings and will need to add many more in the future.

THE QUESTION
Quite a few of the linestrings connect with one or more other linestrings, meaning that they start and/or end at the same coordinates. What I'd look to do is find the best way to optimize the dataset so linestrings which connect at the ends are merged in the DB table. This will reduce the total element count when I parse the DB table and create the display file for google maps.

EXAMPLE
In this example, imagine, the alpha (A,B,C) values represent geographic points. The unoptimized table might look like this:

before optimization:
id linestring
1 A, B, C
2 C, D
3 B, A
4 F, G, H
5 G, I
6 H, J


After optimization:
1 A, B, C, D
2 F, G, H, J
3 G, I


So what is the best way to optimize the data? Is there a particular algorithm which works best? I have some ideas for solutions which I will formulate and add, but they seem verbose and convulated.

I am not a CS major so excuse the sloppy terminology and let me know if clarification is needed anywhere. Thanks!


FYI.. I am using a MySQL DB. I am not using the spatial extensions. If you have an embarrassingly simple solution that uses the spatial extensions I would love to hear about it anyway.

+1  A: 

I think the easiest way to go here is using the MySQL spatial extensions.

Particularly I have only used Oracle spatial extensions. In Oracle we can use functions like SDO_GEOM.RELATE or SDO_RELATE to find out the spatial relationship between two objects (contains, touches, intersects, etc.)

I am sure there is an equivalent spatial function in MySQL

EDIT:

Here is a link that lists all the available MySQL spatial functions.

Igor Zelaya
Just reviewed the MySQL spatial extensions reference. I had skimmed this previously but missed the 'Functions that test spatial relationships between geometries' section. Exactly what I need, worth delving into spatial extensions.
Lokesh Dhakar
A: 

There will be a unique solution if every endpoint appears at most twice (ending one linestring and beginning another), but is that guaranteed? E.g. what happens if you have:

  1. A, B, C
  2. C, D
  3. C, E, F

Should this produce:

  1. A, B, C, D
  2. C, E, F

or:

  1. A, B, C, E, F
  2. C, D

?

Or do you not care?

j_random_hacker
There is no guarantee that an end point will appear at most twice. In your example, either of the solutions your produced would be okay as the number of linestrings is cut down from three to two.
Lokesh Dhakar
I've posted a separate answer as I feel the question in my answer is useful information.
j_random_hacker
+1  A: 

One thing to realise is that, if there is more than one linestring that can be connected to a given linestring, it doesn't matter which is chosen -- the final number of linestrings in the optimised table will be the same.

So in that case, a simple greedy strategy of repeatedly finding a pair of linestrings that can be joined and joining them until you can no longer find such a pair will give you an optimal table. Essentially the pseudocode is:

while (there exists a pair of linestrings x and y that share an endpoint) {
    delete(x)
    delete(y)
    insert(x . y)
}

This can't be done in a single SQL query because of the possibility that the resulting linestring x . y will be used again. You should be able to write the while loop using a procedural language such as T-SQL, or a scripting language (e.g. Perl, using DBI for database access), and using an SQL SELECT query to find a pair or a list of pairs and then processing each using DELETE and INSERT statements.

I would suggest adding two fields to your table, begin and end, and indexing them to speed up the search.

j_random_hacker