tags:

views:

516

answers:

1

I have two data sources already in Sphinx:

source cities {
    ...
    sql_query = SELECT id, city_name, state_name, state_abbr, latitude,
                longitude, population FROM cities;
    sql_attr_uint  = population
    sql_attr_float = latitude
    sql_attr_float = longitude
    ...
}

source listings {
    ...
    sql_query = SELECT entry_id, title, url_title, category_names, 
                address1, address2, city, state, zip, latitude, longitude,
                listing_summary, listing_url, extended_info FROM listings;
    sql_attr_float = latitude
    sql_attr_float = longitude
    ...
}

Using the PHP Sphinx API I have done searches for matching cities by name and searches for listings within 25 miles of a lat/long without any problem, but now I need to sort of 'join' them... I'd like to be able to:

a) when searching for cities by name, return only cities having listings within 25 miles of them and b) when I'm viewing results for one city (lat/long is known), pull the 3 nearest cities that have listings within 25 miles of them

Is there a way to build a single sphinx search to accomplish these two lookups?

Edit based on comment chain below:

I've updated my cities table to include a field point of type Point and created a spatial index on it:

> describe cities_copy;
+-------------+-----------------------+------+-----+---------+----------------+
| Field       | Type                  | Null | Key | Default | Extra          |
+-------------+-----------------------+------+-----+---------+----------------+
| id          | mediumint(7) unsigned | NO   | PRI | NULL    | auto_increment |
| city_name   | varchar(64)           | NO   | MUL | NULL    |                |
| state_name  | varchar(64)           | NO   |     | NULL    |                |
| state_abbr  | varchar(8)            | NO   |     | NULL    |                |
| county_name | varchar(64)           | NO   |     | NULL    |                |
| county_id   | smallint(3) unsigned  | NO   |     | NULL    |                |
| latitude    | float(13,10)          | NO   | MUL | NULL    |                |
| longitude   | float(13,10)          | NO   |     | NULL    |                |
| population  | int(8) unsigned       | NO   | MUL | NULL    |                |
| point       | point                 | NO   | MUL | NULL    |                |
+-------------+-----------------------+------+-----+---------+----------------+

> show indexes from cities_copy;
+-------------+------------+------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| Table       | Non_unique | Key_name   | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+-------------+------------+------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| cities_copy | 0          | PRIMARY    | 1            | id          | A         | 23990       | NULL     | NULL   |      | BTREE      |         |
| cities_copy | 0          | city/state | 1            | city_name   | A         | NULL        | NULL     | NULL   |      | BTREE      |         |
| cities_copy | 0          | city/state | 2            | state_abbr  | A         | 23990       | NULL     | NULL   |      | BTREE      |         |
| cities_copy | 1          | lat/long   | 1            | latitude    | A         | NULL        | NULL     | NULL   |      | BTREE      |         |
| cities_copy | 1          | lat/long   | 2            | longitude   | A         | NULL        | NULL     | NULL   |      | BTREE      |         |
| cities_copy | 1          | population | 1            | population  | A         | NULL        | NULL     | NULL   |      | BTREE      |         |
| cities_copy | 1          | point      | 1            | point       | A         | NULL        | 32       | NULL   |      | SPATIAL    |         |
+-------------+------------+------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+

But when I attempt to update the data to create the points out of lat/long data I get an error:

> update cities_copy set point = Point(latitude, longitude);
Cannot get geometry object from data you send to the GEOMETRY field

Is my syntax off here or am I running into some other problem?

+2  A: 

You need to do the following:

  • Create an additional GEOMETRY field which would hold Point(Latitude, Longitude), replacing latitude and longitude with metric coordinates for the flat earth.

  • Create a SPATIAL index on this field

  • Fix the first query:

    SELECT  *
    FROM    cities cc
    WHERE   EXISTS
            (
            SELECT  NULL
            FROM    listings cp
            WHERE   MBRContains(LineString(Point(cc.latitude - 25, cc.longitude - 25), Point(cc.latitude + 25, cc.longitude + 25)), cp.Coords)
                    AND GLength(LineString(cc.Coords, cp.Coords)) <= 25
            )
    

To find out the three closest cities, issue this query:

SELECT  cp.*
FROM    cities cc
CROSS JOIN
        cities cp
WHERE   cc.id = @id
ORDER BY
        GLength(LinePoint(cc.Coords, cp.Coords))
LIMIT 3

, however note that it will not be very efficient if you have lots of cities.

To make it efficient, you'll need to create a tesselation table (which will tile the Earth surface near you locations), calculate the proximity order of the tiles and join with them.

Here's a simple script to demonstrate:

CREATE TABLE t_spatial (id INT NOT NULL PRIMARY KEY, coords Point) ENGINE=MyISAM;

INSERT
INTO    t_spatial
VALUES
(1, Point(0, 0)),
(2, Point(0, 1)),
(3, Point(1, 0)),
(4, Point(1, 1));

SELECT  s1.id, s2.id, GLength(LineString(s1.coords, s2.coords))
FROM    t_spatial s1
CROSS JOIN
        t_spatial s2
Quassnoi
this is just using mysql then, correct? I was hoping to keep the searching within sphinx because it is a lot faster (and we have 25000 cities).
Ty W
You can build your index in `Sphinx`, using `MySQL` as a data source. Thus you can keep searching in `Sphinx` and use `MySQL`'s spatial capabilities.
Quassnoi
`BTW`, what is the `type` of your sources?
Quassnoi
the type for both sources is mysql. whenever I've done spacial searches in the past I've done it through sphinx using something like $cl->SetGeoAnchor("latitude", "longitude", $latitude*pi()/180, $longitude*pi()/180); $cl->SetFilterFloatRange('@geodist', 0.0, $distance*1609.344);
Ty W
couldn't get the GLength to work with the Points until I used LineStringFromWKB, but thank you for the quick rundown on MySQL's spatial capabilities. Did I miss something easy and can this be shortened to return the distance between the two points? GLength(LineStringFromWKB(LineString(AsBinary(c.point), AsBinary(f.point)))) AS distance
Ty W
See the script in the post update. You don't need `WKB` and `AsBinary`, `GLength` and `LineString` work on internal representation as well.
Quassnoi
Seems like it ought to work but for some reason it just isn't cooperating for me... I edited the original question to show the syntax and results of what I attempted to do with my data.
Ty W