I'm offering a search option for my users. They can search on city name. The problem is that the city names I have stored are things like "Saint Louis", but I want to find Saint Louis even if the user types in "St. Louis" or "St Louis". Any suggestions on how I could create a lookup table to take that into account somehow?
You might want to look into a more full-featured full-text search engine such as Apache Lucene/Solr or Sphinx, which can support this kind of string mapping natively.
I see a number of possible ways to deal with this. One is a Soundex lookup, a phonetic algorithm that matches English words that sound alike. Some databases support it natively; PostgreSQL, for example, provides a soundex function in its fuzzystrmatch module.
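To make the Soundex idea concrete, here is a minimal sketch of the classic American Soundex rules in Python. The function name `soundex` and the sample words are just for illustration, and a database's built-in version may differ in edge cases. Note that it catches sound-alike spellings such as "Lewis" vs "Louis", but it would probably not equate "St" with "Saint" on its own, since the abbreviation drops a coded consonant.

```python
# Map each coded consonant to its Soundex digit; vowels, h, w, y get no code.
CODES = {}
for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                       ("l", "4"), ("mn", "5"), ("r", "6")):
    for ch in letters:
        CODES[ch] = digit


def soundex(word):
    """Return the 4-character Soundex code of a single word."""
    letters = [c for c in word.lower() if c.isalpha()]
    if not letters:
        return ""
    result = letters[0].upper()
    prev = CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h and w do not separate letters with the same code
        code = CODES.get(ch, "")
        if code and code != prev:
            result += code
        prev = code  # a vowel resets prev, so a repeated code is kept
    return (result + "000")[:4]


print(soundex("Louis"), soundex("Lewis"))  # same code for both spellings
print(soundex("Saint"), soundex("St"))     # different codes
```

So Soundex looks more useful for typos than for abbreviations; an alias or normalization table would still be needed for "St".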
Another approach may simply be to offer your users auto-complete, where a number of suggestions appear as they type. This way users will pick the desired city name intuitively.
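The server side of such an auto-complete could be as small as a prefix match. The sketch below assumes the city names fit in memory; `suggest`, `CITIES`, and the limit of 5 are made-up names and values for illustration only.

```python
def suggest(typed, city_names, limit=5):
    """Return up to `limit` city names that start with what the user typed."""
    prefix = typed.strip().lower()
    if not prefix:
        return []
    matches = [name for name in city_names if name.lower().startswith(prefix)]
    return sorted(matches)[:limit]


CITIES = ["Saint Louis", "Saint Paul", "San Diego", "Seattle"]
print(suggest("sa", CITIES))  # ['Saint Louis', 'Saint Paul', 'San Diego']
```

A plain prefix match returns nothing for "st", so in practice the list being searched would probably also need to include the alternative spellings.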
Create two tables.
One contains everything about a city.
One contains a bunch of names for cities, and a foreign key associating those names with the id of the first table. So you have a one-to-many relationship between city and city_name.
Now the only problem is distinguishing the one name, for each city, that is the preferred name. We can do that a couple of ways: 1) the first table could have a fk to the second table that holds the id of the preferred name. This creates a circular dependency, though. So better, 2) just add a boolean/bit column to the second table, is_preferred.
create table city (id int not null primary key /* , other columns */ ) ;
create table city_name (
id int not null primary key,
city_id int references city(id),
name varchar(80),
is_preferred bool
) ;
Then to get all names, with the preferred name first:
select name from city_name where city_id = ?
order by is_preferred desc, name;
This has an additional advantage: if you don't cover every city and town, you can use the second table to map towns/villages/counties you don't cover to the major cities you do:
insert into city_name(city_id, name) values
( $id-for-New-York-City, 'New York'),
( $id-for-New-York-City, 'Manhattan'),
( $id-for-New-York-City, 'Big Apple'),
( $id-for-New-York-City, 'Brooklyn');
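To see the two-table design answer the original question, here is a small sketch using Python's built-in sqlite3 module with an in-memory database. The `state` column, the sample rows, and the helper `preferred_name` are assumptions made for the demo; SQLite has no real bool type, so is_preferred is stored as 0/1.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table city (id integer primary key, state text);
    create table city_name (
        id integer primary key,
        city_id integer not null references city(id),
        name text not null,
        is_preferred integer not null default 0
    );
    insert into city (id, state) values (1, 'MO');
    insert into city_name (city_id, name, is_preferred) values
        (1, 'Saint Louis', 1),
        (1, 'St. Louis', 0),
        (1, 'St Louis', 0);
""")


def preferred_name(conn, typed):
    """Look up whatever the user typed and return the city's preferred name."""
    row = conn.execute(
        """
        select pref.name
        from city_name as typed_name
        join city_name as pref
          on pref.city_id = typed_name.city_id and pref.is_preferred = 1
        where lower(typed_name.name) = lower(?)
        """,
        (typed,),
    ).fetchone()
    return row[0] if row else None


print(preferred_name(conn, "st. louis"))  # Saint Louis
```

The self-join matches the typed text against any stored name, then hops to the row flagged as preferred for the same city.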
What I would do is build a shorthand-to-normal table that maps any ambiguous word to a single consistent spelling you'll use in your primary table. You can also include common spelling mistakes and typos there.
Before looking up the user's request, convert all the words to normal form using this table.
So in your case the shorthand-to-normal table would contain:
| short | normal |
|-------|--------|
| St    | Saint  |
| St.   | Saint  |
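The conversion step could look roughly like this in application code. The dictionary stands in for the shorthand-to-normal table (in practice it would be loaded from the database), and the names `SHORT_TO_NORMAL` and `expand_shorthand` are invented for this sketch.

```python
# Stand-in for the shorthand-to-normal table; keys are lowercased shorthands.
SHORT_TO_NORMAL = {
    "st": "Saint",
    "st.": "Saint",
}


def expand_shorthand(query):
    """Replace each word that has a table entry with its normal form."""
    words = query.split()
    return " ".join(SHORT_TO_NORMAL.get(w.lower(), w) for w in words)


print(expand_shorthand("St. Louis"))  # Saint Louis
print(expand_shorthand("St Louis"))   # Saint Louis
```

The expanded string is then what you search for in the primary table. Since the mapping is applied to every word, this assumes the input is only ever a city name.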
As a general approach, you can normalize items both when inserting and when searching them.
Normalization rules could be:
Saint => St
St. => St
etc.
The normalized names should then match.
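As a rough illustration of normalizing on both sides, the sketch below builds a lookup keyed by the normalized form of each stored name and normalizes the user's input the same way. `RULES`, `normalize`, `find_city`, and the sample city list are assumptions for the example; dropping punctuation is what turns "St." into "St".

```python
import re

# Word-level rules applied after lowercasing and stripping punctuation.
RULES = {"saint": "st"}


def normalize(name):
    """Lowercase, drop punctuation, and apply the word rules."""
    words = re.findall(r"[a-z0-9]+", name.lower())
    return " ".join(RULES.get(w, w) for w in words)


# "Insert" side: index each stored name under its normalized key.
stored = ["Saint Louis", "Saint Paul", "Chicago"]
index = {normalize(n): n for n in stored}


def find_city(typed):
    """"Search" side: normalize the input, then look it up."""
    return index.get(normalize(typed))


print(find_city("St. Louis"))  # Saint Louis
print(find_city("st paul"))    # Saint Paul
```

In a database, the normalized key would typically live in its own indexed column next to the display name.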
IMHO I'd leave the database alone and instead have a drop-down list of cities in your application. Easier, cleaner, and doesn't require much extra work.
I like the option in the first answer.
Another thought would be to have a column of tags for each city that the users could update.
For example:
New York City is the official name.
Tags for this city would be numerous (Manhattan, NY, NYC, the city, big apple, etc.), but you wouldn't want all that junk in your main Cities table, or to create associated child tables and have to do joins. So just tuck it in a column and search it based on the search term, but return the proper name if it's found.
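A quick sketch of the tags idea, assuming the tags are kept as one comma-separated string per city. The rows, the Saint Louis tags, and the function name `find_by_name_or_tag` are sample data and names made up for illustration.

```python
# Each row mimics a Cities record with a free-form, comma-separated tags column.
CITIES = [
    {"name": "New York City", "tags": "Manhattan, NY, NYC, the city, big apple"},
    {"name": "Saint Louis", "tags": "St. Louis, St Louis, STL"},
]


def find_by_name_or_tag(term, cities=CITIES):
    """Match the search term against the name or any tag; return the proper name."""
    wanted = term.strip().lower()
    for city in cities:
        tags = [t.strip().lower() for t in city["tags"].split(",")]
        if wanted == city["name"].lower() or wanted in tags:
            return city["name"]
    return None


print(find_by_name_or_tag("Big Apple"))  # New York City
print(find_by_name_or_tag("St Louis"))   # Saint Louis
```

Matching whole tags rather than substrings keeps a short term like "ny" from hitting unrelated tags that merely contain it.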
You can use the built-in thesaurus entries of SQL Server full-text search (FTS). This allows you to build a custom word map inside full-text search. That way you can keep everything inside FTS rather than mix FTS and other queries.
Not sure which version of SQL Server you are using, as it's different between 2005 and 2008, but there is a good walkthrough for both here: http://arcanecode.com/2008/05/28/creating-custom-thesaurus-entries-in-sql-server-2005-and-2008-full-text-search/