I have an application that reads XML data about a vehicle title and parses it. In my database, I store names differently depending on whether the name belongs to an individual or a company (both can occur in my system). The trouble is that the XML source has name data but does not specify whether it is an individual or a company. I need to know so I can store it appropriately. Is there a database of names, a regular expression, or a library that could check whether a string matches an individual's name? Thanks!

A: 

You are going to be hard-pressed to find one. Individual names, in particular, are limited only by imagination. However, if you need something, may I suggest gathering a list of all the car manufacturers your application cares about and checking the XML name data against that list. If a match is found, the name is a company; if not, you can assume the name is an individual.
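A minimal sketch of that lookup in Python, in case it helps. The entries in `KNOWN_COMPANIES` and the helper names `normalize` and `is_known_company` are placeholders I made up for illustration; your real list would come from whatever source you maintain.

```python
# Hypothetical company list; replace with the names your application cares about.
KNOWN_COMPANIES = {
    "ford motor company",
    "toyota motor corporation",
    "honda motor co",
}


def normalize(name: str) -> str:
    """Lowercase the name, turn punctuation into spaces, and collapse whitespace."""
    cleaned = "".join(ch if ch.isalnum() or ch.isspace() else " " for ch in name)
    return " ".join(cleaned.lower().split())


def is_known_company(name: str) -> bool:
    """Return True only if the normalized name is in the known-company list."""
    return normalize(name) in KNOWN_COMPANIES
```

Anything not on the list falls through as "individual", so a company you have never seen before will be misclassified. That is the trade-off of this approach.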

Matthew Jones
Thanks. Since there were multiple people with the same answer, I just chose yours since you answered first. I think you raise a good point about the difficulty in this. I will probably just use a single field since I don't really need to distinguish.
Austin
+3  A: 

No, there is no way to know. Are you dealing with Frank Zappa's child, Moon Unit, or are you dealing with Moon Unit, your number one source for real moon rock memorabilia? Names can be anything, and company names can be anything (including the names of their owners!). The only way to know for sure is for that information to be supplied with the data.

Brian Schroth
Anytime you can put Frank Zappa into an answer, you get an upvote from me.
Adam
A: 

At a large telco that I used to work for, we had this problem. We tested the following regular expression on more than 2 million names:

([A-Z][a-z][a-z]*)  *([A-Z][a-z]*)\.?  *([A-Z][a-z][a-z][a-z]*)

We got 99.8% accuracy with this. The data was fairly clean. This was for a regular expression engine in C, so the syntax may be a little off from Perl. I don't know if you will need the parentheses.
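For anyone who wants to try the pattern, here is a rough Python sketch. The pattern itself is copied unchanged from above; anchoring it to the whole string with `re.fullmatch` is my assumption, since I have not said how the original C code applied it, and the function name `looks_like_person_name` is made up for the example.

```python
import re

# Same pattern as above: first name, middle name or initial (optional period), last name.
PERSON_NAME = re.compile(
    r"([A-Z][a-z][a-z]*)  *([A-Z][a-z]*)\.?  *([A-Z][a-z][a-z][a-z]*)"
)


def looks_like_person_name(name: str) -> bool:
    """Return True if the entire string matches the three-part name pattern."""
    return PERSON_NAME.fullmatch(name.strip()) is not None
```

As written, the pattern expects three parts, so "John Q. Public" matches but a two-part name like "John Public" does not, and neither does a last name containing an apostrophe or hyphen.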

Philip Schlump
I see no support for apostrophes or hyphens. How did you deal with names, especially last names, like O'Halloran or Jones-Drew?
Matthew Jones
If an apostrophe is found, insert an extra one. Even better, use a stored procedure to do the insert, and it will go in regardless of how it is done. Just keep in mind you'd want to sanitize the inputs before passing them to the sproc...
tommieb75
A: 

Well, an individual's name typically has a first and last name separated by a space, whereas a company name often ends in Ltd (Limited), PLC (public limited company) or LLC (limited liability company, a US business structure)... am I going off the beaten track here? If last_name and first_name are empty, check the company field, and vice versa. It seems you have combined the two into one field, which makes this harder to do.
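Something like this Python sketch would check for those suffixes. The set `COMPANY_SUFFIXES` and the function name `looks_like_company` are just illustrative; I have only included the suffixes mentioned here plus the spelled-out "Limited", and you would extend the set for your own data.

```python
import re

# Illustrative suffix list; extend it with whatever appears in your data.
COMPANY_SUFFIXES = {"ltd", "limited", "plc", "llc"}


def looks_like_company(name: str) -> bool:
    """Return True if the last word of the name is a known company suffix."""
    # Drop periods first so "L.L.C." is treated the same as "LLC".
    words = re.findall(r"[a-z]+", name.lower().replace(".", ""))
    return bool(words) and words[-1] in COMPANY_SUFFIXES
```

This only catches companies that carry a legal suffix; a trading name without one would still look like an individual.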

tommieb75