I am trying to search a SQL Server 2008 table (containing about 7 million records) for cities and countries based on free-text user input. The search string that I get from the user can be anything like:

"Hotels in San Francisco, US" or "New York, NY" or "Paris sddgdfgxx" or "Toronto Canada". Terms are not always separated by commas, are not in any specific order, and there might be useless data mixed in.

This is what I tried:

Method 1: FTS with CONTAINS, ex:

    select * from cityNames where contains(cityname,'word1 and word2') -- with AND
    select * from cityNames where contains(cityname,'word1 or word2') -- with OR

This didn't work very well because a term like 'sddgdfgxx' would return nothing if used with 'AND'. Using OR works for one-word cities like 'Paris' but not for 'San Diego' or 'San Francisco'.

Method 2: this is actually a reverse search. The idea is to check whether the user input string contains any of the cities or countries from my table. This way I'll know for sure that 'Aix en Provence' or 'New York' was searched for.

ex:

    select * from cityCountryNames where 'Ontario, Canada, Toronto' like cityCountryNames

Notes: I wasn't able to get results for two-word cities, and the query was slow.

Any help is appreciated.

+2  A: 

I would strongly recommend using a 3rd-party API like the Google Geocoding API to take such input and parse it into a location with discrete parts (street address, city, state, country, etc.). Then you could use those discrete parts to search your database if necessary.

Map services like Google and Bing have solved this problem way better than you or I ever would, so why not leverage all the work they've done?
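
To make the "discrete parts" idea concrete, here is a minimal Python sketch. It assumes the JSON shape of the Google Geocoding API as I understand it (`results[0].address_components`, each with `long_name` and `types`). The `sample` response is hand-written for illustration, not real service output, and the API key is a placeholder.

```python
from urllib.parse import urlencode

GEOCODE_URL = "https://maps.googleapis.com/maps/api/geocode/json"

# Maps the component types we care about to our own field names.
WANTED = {
    "locality": "city",
    "administrative_area_level_1": "state",
    "country": "country",
}


def build_geocode_url(query, api_key):
    """Build the request URL for a free-text location query."""
    return GEOCODE_URL + "?" + urlencode({"address": query, "key": api_key})


def extract_place(response):
    """Pull city, state and country out of a decoded geocoding response.

    Returns None when the service reports no usable result.
    """
    if response.get("status") != "OK" or not response.get("results"):
        return None
    place = {"city": None, "state": None, "country": None}
    for component in response["results"][0]["address_components"]:
        for component_type in component["types"]:
            field = WANTED.get(component_type)
            if field and place[field] is None:
                place[field] = component["long_name"]
    return place


# Hand-written sample in the assumed response shape (illustration only).
sample = {
    "status": "OK",
    "results": [{
        "address_components": [
            {"long_name": "Toronto", "short_name": "Toronto",
             "types": ["locality", "political"]},
            {"long_name": "Ontario", "short_name": "ON",
             "types": ["administrative_area_level_1", "political"]},
            {"long_name": "Canada", "short_name": "CA",
             "types": ["country", "political"]},
        ]
    }],
}

place = extract_place(sample)
```

The resulting `place` dictionary holds the parts you would then use in an ordinary exact-match query against your own table.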

John Bledsoe
@John - some of the Geocoding APIs have query limits, and some even forbid commercial use. So be sure to read the fine print.
Mikos
@Mikos - that's certainly always a good idea. Google allows use of their geocoding API if you are going to show geocoded points on a map. It looks like Bing may have stricter terms of use.
John Bledsoe
A: 

SQL isn't designed for the kinds of queries you are performing, certainly not at scale. My recommendation would be as follows:

  1. Index all your places (cities + countries) into a Solr index. Solr is a FOSS search server built on Lucene and can easily query an index of 7MM records in milliseconds.

  2. Query Solr with the user-typed string and, voilà, the first match is the best match. So even if the user typed "Paris sddgdfgxx", Paris should be your first hit. If you want to get really sophisticated, use a word n-gram approach (known as shingles in Lucene).
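
To show what word n-grams (shingles) buy you, here is a small plain-Python sketch of the idea. It is not Lucene's shingle filter, only an illustration of the technique. The `KNOWN` set is a hypothetical stand-in for your 7MM-row table. Every run of one to three consecutive words in the input is looked up, longest runs first, so multi-word names such as "san francisco" match as a unit and junk tokens are simply ignored.

```python
import re


def shingles(text, max_words=3):
    """Return every run of 1..max_words consecutive words, longest first."""
    words = re.findall(r"\w+", text.lower())
    runs = []
    for size in range(min(max_words, len(words)), 0, -1):
        for start in range(len(words) - size + 1):
            runs.append(" ".join(words[start:start + size]))
    return runs


def find_places(text, known_places, max_words=3):
    """Return the known places mentioned in text, longer names first."""
    found = []
    for run in shingles(text, max_words):
        if run in known_places and run not in found:
            found.append(run)
    return found


# Hypothetical stand-in for the real place table (lower-cased names).
KNOWN = {"san francisco", "new york", "paris", "toronto",
         "canada", "aix en provence", "us"}

hits = find_places("Hotels in San Francisco, US", KNOWN)
```

Because each shingle is an exact lookup, this is effectively the "reverse search" from the question done with index-friendly equality matches instead of LIKE.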

Since Solr offers a RESTful (HTTP) API, it should integrate easily into whatever platform you are on.
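
As a rough sketch of that HTTP integration, the Python below builds a select URL and reads the top hit from a decoded JSON response. The core name `places`, the field `name`, and the host and port are assumptions for illustration. It also assumes a Solr version that ships the `edismax` query parser. Setting `mm` to 1 asks for at least one term to match, so a junk token does not empty the result set.

```python
from urllib.parse import urlencode


def build_solr_query_url(user_text,
                         base_url="http://localhost:8983/solr/places"):
    """Build a Solr select URL that returns only the best-scoring match."""
    params = {
        "q": user_text,        # raw user input
        "defType": "edismax",  # forgiving parser for free-text input
        "qf": "name",          # hypothetical field holding the place name
        "mm": "1",             # at least one term has to match
        "rows": 1,             # only the top hit is needed
        "wt": "json",
    }
    return base_url + "/select?" + urlencode(params)


def first_hit(solr_response):
    """Return the top document from a decoded Solr JSON response, or None."""
    docs = solr_response.get("response", {}).get("docs", [])
    return docs[0] if docs else None


url = build_solr_query_url("Paris sddgdfgxx")
```

Fetching `url` with any HTTP client and passing the decoded JSON to `first_hit` gives you the best match, or `None` if nothing matched.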

Mikos