views:

201

answers:

8

I'm offering a search option for my users. They can search on city name. The problem is that my city names I have stored are things like "Saint Louis". But I want to find Saint Louis even if the user types in "St. Louis" or "St Louis". Any suggestions on how I could create a lookup table to take that into account somehow?

+1  A: 

You might want to look into a more full-featured Full-Text search engine such as Apache Lucene/Solr or Sphinx - which can support this kind of string mapping natively.

Cody Caughlan
+1  A: 

I see a number of possible ways to deal with this. One is a soundex lookup algorithm that matches the similarity of English strings. Furthermore, this is supported natively in some databases like PostgreSQL.

Another, approach may simply be to offer your users an auto-complete functionality where as they type a number of suggestions appear. This way users will pick the desired lookup city name intuitively.

Elijah
+5  A: 

Create two tables.

One contains everything about a city.

One contains a bunch of names for cities, and a foreign key association those naes with the id of the the first table. So you have a one to many relationship between city and city_names.

Now the only problem is distinguishing the one name, for each city, that is the preferred name. We can do that a couple of ways: 1) the first table could have a fk to the second table, that holds to id of the preferred name. This creates a circular dependency, though. So better, 2) just add a boolean/bit column to the second table, is_preffered.

create table city (id not null primary key, other columns ) ;

create table city_name (
 id not null primary key, 
 city_id int references city(id), 
 name varchar(80),
 is_preferred bool  
) ;

Then to get all names, with the preferred name first:

   select name from city_names where city_id = ? 
   order by is_preffered desc, name;

This has an additional advantage: if you don't cover every city and town, you can use the second table to map towns/villages/counties you don't cover to the major cities you do:

 insert into city_name(city_id, name) values
 ( $id-for-New-York-City, 'New York'),
 ( $id-for-New-York-City, 'Manhattan'),
 ( $id-for-New-York-City, 'Big Apple'),
 ( $id-for-New-York-City, 'Brooklyn');
tpdi
I'm not seeing why I would need a "preferred" name -- I mean, if they type in New York, I'll match that right away. If they type in Manhattan, and it links back to New York, then why would I care about the preferred city id?
Matt Dawdy
You might not; in that case, leave it out. I was figuring on not putting the name into city at all, in which case we'd want a way to show the "canonical" name in, e.g., reports.
tpdi
I went with this answer and it works like a charm so far. My query does an outer join with the alias table, and the where clause has "WHERE (venue.city = 'st louis' OR alias.city_slang = 'st louis')" and I join the tables on cityname = city_canonical. These columns WILL be changing names soon.
Matt Dawdy
Join on id; it's far more efficient. Add an auto_increment synthetic key id.
tpdi
+3  A: 

What I would do is, build a shorthand-to-normal table, that would map any ambiguous word to a single consistent spelling you'll use in your primary table. You can include there common spelling mistakes and typos.

Before looking up the user's request, convert all the words to normal form using this table.

So in your case in the shorthand-to-normal table we'll have

 ______________
| short|normal |
|______|_______|
|St    |Saint  |
|St.   |Saint  |
Elazar Leibovich
I was wondering about this. So, I would split the input on spaces (and "."). Then I'd do a lookup of their first term "st" and get back "saint" -- then use THAT in my actual query of the city table...hmmm...could work. Thanks!
Matt Dawdy
+1  A: 

As a general approach, you can normalize items both when inserting and when searching them.

Normalization rules could be:

Saint => St
St. => St

etc.

The normalized names should then match.

Lucero
I thought of that, but I'm not sure I could identify all the possibilities. St is just one. What about KC, MO for Kansas City, Missouri? ATL for Atlanta? However, I think I am going to have to do this, but in a table structure as suggested by others.
Matt Dawdy
Yes, you'll have to have some "dictionary" for your normalization anyways. I was typing my answer at the same time as Elazar Leibovich , so that they are both quite similar in what they suggest.
Lucero
A: 

IMHO i'd leave the database alone and instead have a downdown list of cities in your application. Easier, cleaner, and doesn't require much extra.

DForck42
The piece I'm writing is in .Net with SQL server, and it is merely a web service to support an iPhone app. The thing is, we are going to let them type in any city in any country in the world. I don't want to have a drop down with all of that. Would be unusable.
Matt Dawdy
Ok. I was just assuming it was jsut in a regular application.
DForck42
A: 

I like the option in the first answer.

Another thought would be to have a column for tags for that city that the users coudl update.

i.e.

New York City is the official name.

Tags for this city would be numerable ( Manhattan, NY, NYC, the city, big apple.. ) e.t.c. but you wouldn't want all that junk in your main Cities table or to create assicated child tables and have to do joins. So just tuck it in a columns and search it based on the Search Term but then return the proper name if it's found.

5x1llz
A: 

You can use the built in SQL FTS properties for thesaurus entries. This allows you to buld a custom word map inside full text search . That way you can keep everything inside FTS rather than mix FTS and other queries.

Not sure which version of SQL you are using as its differant between 2005/8 so there is a good walkthrough for 2005 / 8 here http://arcanecode.com/2008/05/28/creating-custom-thesaurus-entries-in-sql-server-2005-and-2008-full-text-search/

u07ch
That's interesting. I don't have access to the registry in this case, but thanks for the tip. I could see this being useful in other circumstances.
Matt Dawdy
If you are on sql 2008 synonymns would work too; and they dont need the registry hack. Depending on your language the xml file is here$SQL_Server_Install_Path\Microsoft SQL Server\MSSQL.1\MSSQL\FTData .You can setup your cities in the file like this <expansion><sub>New York</sub><sub>Big Apple</sub></expansion>then reload using EXEC sys.sp_fulltext_load_thesaurus_file 1033you should then be able to test select * from mytable where contains(*, 'big apple'). (http://www.simple-talk.com/sql/learn-sql-server/understanding-full-text-indexing-in-sql-server/)
u07ch
pps 1033 = code page relevant for your region (had a space limit)
u07ch