I'm looking for references on separating a name: "John A. Doe" in parts, first=John, middle=A., last=Doe. In Mexico we have paternal, maternal, first and second given names, and can be written in different permutations, so the problem is quite complex.
As it depends on data, we are working with matching software that calculates a score for every word so we can take decisions (it is based on a big database). The input data is not clean, it is imported from some government web pages and is human filtered so it could have junk that has to be recognized as well. Any suggestions?
[Edit] Examples:
name: Javier Abdul Córdoba Gándara common permutations (or as it may appear in gvt data referring to same person): Córdoba Gándara Javier Abdul Javier A. Córdoba Gándara Javier Abdul Córdoba G. paternal=Córdoba maternal=Gándara first given:Javier second given:Abdul
name: María de la Luz Sánchez Martínez paternal:Sánchez maternal: Martínez first given: María de la Luz
name: Paloma Viridiana Alin Arias Medina paternal: Arias maternal: Medina first given: Paloma second given: Viridiana Alin
As I said what the meaning of each word depends on the score. One has no way of knowing that
Viridianaand
Alinare given names if not from the score.
We have a very strong database (80 million records or so) so we can get some use of the scoring system. I am designing some algorithm that uses that but looking for other references.