I have an application that reads XML information about a vehicle title and parses it into my application. In my database, I store each name differently depending on whether it is an individual's name or a company's name (both can occur in my system). The trouble is that the XML source has name data but does not specify whether it belongs to an individual or a company. I need to know so I can store it appropriately in my database. Is there a database of names, a regular expression, or a library that could check the string to see if it matches an individual's name? Thanks!
You are going to be hard-pressed to find one. Individual names, in particular, are often limited only by imagination. However, if you need one, may I suggest gathering a list of all the car manufacturers that your application cares about and checking the XML name data against this list. If a match is found, the name is a company; if not, you can assume the name is an individual.
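The list lookup suggested above might be sketched like this in Python. The entries in KNOWN_COMPANIES and the function names are hypothetical placeholders, not data from the question; the only idea taken from the answer is "match against a known list, otherwise assume an individual".

```python
# Hypothetical list: replace with the manufacturers/companies your
# application actually cares about.
KNOWN_COMPANIES = {
    "ford motor company",
    "toyota motor corporation",
    "general motors",
}


def normalize(name: str) -> str:
    """Lower-case the name and collapse runs of whitespace."""
    return " ".join(name.lower().split())


def is_known_company(name: str) -> bool:
    """Return True only if the normalized name is in the known list."""
    return normalize(name) in KNOWN_COMPANIES


if __name__ == "__main__":
    for candidate in ["  Ford  Motor Company ", "Moon Unit Zappa"]:
        kind = "company" if is_known_company(candidate) else "individual (assumed)"
        print(f"{candidate!r}: {kind}")
```

Anything not on the list falls through to "individual", so this is only as good as the list is complete.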
No, there is no way to know. Are you dealing with Frank Zappa's child, Moon Unit, or are you dealing with Moon Unit, your number one source for real moon rock memorabilia? Names can be anything, and company names can be anything (including the names of their owners!). The only way to know for sure which one you have is if that information is supplied to you with the data.
At a large telco that I used to work for, we had this problem. We tested the following regular expression on more than 2 million names:
([A-Z][a-z][a-z]*) *([A-Z][a-z]*)\.? *([A-Z][a-z][a-z][a-z]*)
We got 99.8% accuracy with this. The data was fairly clean. This was for a regular expression engine in C, so the syntax may be a little off from Perl. I don't know if you will need the parentheses.
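For anyone who wants to try the pattern above, here is a minimal Python sketch. The pattern is copied unchanged; requiring it to match the whole string (fullmatch) and the function name are my assumptions, since the answer does not say how the match was anchored. Note that, as written, the pattern expects three capitalized parts, so a plain two-part name does not match, and a mixed-case company name made of three capitalized words would.

```python
import re

# Pattern copied from the answer above. Matching the whole string with
# fullmatch() is an assumption; the original anchoring is not stated.
NAME_PATTERN = re.compile(
    r"([A-Z][a-z][a-z]*) *([A-Z][a-z]*)\.? *([A-Z][a-z][a-z][a-z]*)"
)


def looks_like_person(name: str) -> bool:
    """Return True if the whole (stripped) string matches the pattern."""
    return NAME_PATTERN.fullmatch(name.strip()) is not None


if __name__ == "__main__":
    for candidate in ["John Q. Public", "Mary Ann Jones", "John Smith", "ACME TOWING LLC"]:
        print(candidate, looks_like_person(candidate))
```

It is worth running this against a sample of your own data before trusting it, because the accuracy quoted above depends on how clean and how consistently cased the names are.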
Well, an individual's name usually has a first and last name separated by a space, whereas a company name will often end in Ltd (Limited), PLC (public limited company) or LLC (limited liability company, a US business form). Am I going off the beaten track here? If last_name and first_name are empty, check the company field, and vice versa. It seems you have combined the two into one field, which makes this harder to do.
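The suffix idea above could be sketched in Python roughly as follows. The suffix set contains only the forms mentioned in this answer, and the function name is hypothetical; a name without one of these suffixes can still be a company, so treat a miss as "unknown", not as "individual".

```python
import re

# Suffixes taken from the answer above; extend the set for your own data.
COMPANY_SUFFIXES = {"ltd", "limited", "plc", "llc"}


def has_company_suffix(name: str) -> bool:
    """Return True if the last alphabetic word is a known company suffix."""
    words = re.findall(r"[A-Za-z]+", name.lower())
    return bool(words) and words[-1] in COMPANY_SUFFIXES


if __name__ == "__main__":
    for candidate in ["Moon Unit Ltd.", "Acme Towing, LLC", "Moon Unit Zappa"]:
        print(candidate, has_company_suffix(candidate))
```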