views:

277

answers:

4

I'm involved with a SQL / .NET project that will be searching through a list of names. I'm looking for a way to return some results on similar first names of people. If searching for "Tom" the results would include Thom, Thomas, etc. It is not important whether this be a file or a web service. Example Design:

Table "Names" has Name and NameID
Table "Nicknames" has Nickname, NicknameID and NameID

Example output:

You searched for "John Smith"
You show results Jon Smith, Jonathan Smith, Johnny Smith, ...

Are there any databases out there (public or paid) suited to this type of task to populate a relationship between nicknames and names?

A: 

To select similar sounding name use: (see MSDN)

SELECT SOUNDEX ('Tom')
Dustin Laine
Soundex isn't really meant for first names. And beyond that, (SOUNDEX("Robert") = 'R163') != (SOUNDEX("Bob") = 'B100'), etc.
Doug McClean
Doug's point is critical here. The soundex works for Thom to Tom but not Robert to Bob.
Tom Willwerth
Or Margaret to Peggy. A lookup is necessary.
bmb
Well Robert to Bob is a good catch, but Margaret to Peggy. Come on, look at his question he asked for "similar" how is that similar. And a down vote for it, I don't think that is justified as my answer would work for his question.
Dustin Laine
@durilai I tried to clarify as soon as possible for you and your answer made me think so I'm not the one down voting you. By similar I did not intend "similar sounding" I mean "the same" or "related"
Tom Willwerth
How about Edward/Ted/Theo? How about Henry/Hank? Richard/Dick? The point is that there are a lot of common nicknames that don't work by "sound", and the OP knows that so he asked for a database. If I knew of one, I would suggest one because we looked for the same thing last year.
bmb
@Tom Willwerth, no worries. Just thought it fit your need, before update.
Dustin Laine
+1  A: 

There is a database out there called pdNicknames (found at http://www.peacockdata2.com/products/pdnickname/). It contains everything you need, at a cost of $500.

Christopher Richa
How would you go about getting all the possible patterns? Take the Robert to Bob example, I can't use "like %ob% " because that will match too many.
Tom Willwerth
In that case you would need a separate table, holding an ID for each nicknames to link the real names and nicknames together.
Christopher Richa
@Christopher yes, that is my question, is there a public source of data that I could use to populate the relation between name and nickname.
Tom Willwerth
Well I have found this database: http://www.peacockdata2.com/products/pdnickname/It is not free ($500) and it has an Excel sheet in the sample download that shows you a sample of the database contents.
Christopher Richa
@Christopher This link looks promising, you should make this a new answer
Tom Willwerth
+2  A: 

A google search on "Database of Nicknames" turned up pdNickName (for pay).

In addition, I think you only need a single table for this job, not two, with NameID, Name, and MasterNameID. All the nicknames go into the Name column. One name is considered the "canonical" one. All the nickname records use the MasterNameID column to point back to that record, with the canonical name pointing to itself.

Your two table schema contains no additional information and, depending on how you fill in the nickname table, you might need extra code to handle the canonical cases.

Larry Lustig
Also thanks to Christopher Richa for finding this product in the comment thread below.
Tom Willwerth
+1  A: 

I just found this site.

It looks like you could script it pretty easily.

http://www.behindthename.com/php/extra.php?terms=steve&extra=r&gender=m

I just wish I could auto narrow this to english..

rh0dium