I'm doing a website migration that involves extracting firstname and lastname from fullname. Because these values were entered by end users, all kinds of permutations exist (although the names are English and generally not too unusual). In most cases I can take the first word as the firstname and the last word as the lastname, but there are exceptions caused by the occasional prefix or suffix. While going through the data and trying to get my head around all the likely exceptions, I realized that this is a common problem that must have been at least partially solved many times before.
Before I reinvent the wheel, does anyone have regular expressions or other code that has worked for them? Performance is not a consideration, as this is a one-time utility.
Typical values to be handled:
Jason Briggs, J.D. Smith, John Y Citizen, J Scott Myers, Bill Jackobson III, Mr. John Mills
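To make the first-word/last-word idea concrete, here is a minimal sketch in Python of what I have in mind. The `PREFIXES` and `SUFFIXES` sets and the `split_full_name` helper are my own illustrative assumptions, not an existing library, and the lists are certainly not exhaustive.

```python
import re

# Assumed, non-exhaustive lists of honorifics and trailing suffixes.
PREFIXES = {"mr", "mrs", "ms", "miss", "dr", "prof"}
SUFFIXES = {"jr", "sr", "ii", "iii", "iv", "phd", "md", "esq"}


def _key(token):
    """Normalize a token for lookup: drop periods and commas, lowercase."""
    return re.sub(r"[.,]", "", token).lower()


def split_full_name(full_name):
    """Return (firstname, lastname) using a first-word/last-word heuristic."""
    tokens = full_name.split()

    # Strip leading prefixes such as "Mr." but never consume the only token.
    while len(tokens) > 1 and _key(tokens[0]) in PREFIXES:
        tokens = tokens[1:]

    # Strip trailing suffixes such as "III" or "Jr." in the same way.
    while len(tokens) > 1 and _key(tokens[-1]) in SUFFIXES:
        tokens = tokens[:-1]

    if not tokens:
        return ("", "")
    if len(tokens) == 1:
        return (tokens[0], "")

    # Anything between the first and last token (middle names, initials)
    # is ignored here; a trailing comma as in "Smith, Jr." is removed.
    return (tokens[0], tokens[-1].rstrip(","))


if __name__ == "__main__":
    for name in ["Jason Briggs", "J.D. Smith", "John Y Citizen",
                 "J Scott Myers", "Bill Jackobson III", "Mr. John Mills"]:
        print(name, "->", split_full_name(name))
```

This handles the prefix and suffix examples, but "J Scott Myers" comes out with "J" as the firstname, which is probably not what the user intended, so cases like that would still need a manual pass.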
Update: while this is a common problem, the typical solution seems to be to handle the majority of cases automatically and clean up the rest manually.
(Given how often this issue must come up, I originally expected to find a utility library for it, but I could not find one with Google.)