views:

680

answers:

3

Hi all,

I was wondering if anyone could help me with parsing a full name field. I would like to separate it into lastname, firstname, middle initial, suffix.

Here are some inputs for name followed by how I would like for them to be parsed.

                           Parsed Stuff Begins Here-------------------------------------
    name                  | lastname  | firstname        |  middle initial   | suffix |
----------------------------------------------------------------------------------------
PUBLIC, JOHN              | PUBLIC    | JOHN             |  NULL             | NULL
PUBLIC, CHUN CH KIM       | PUBLIC    | CHUN CH KIM      |  NULL             | NULL
PUBLIC, MARY L            | PUBLIC    | MARY             |  L                | NULL
PUBLIC, FRED J JR         | PUBLIC    | FRED             |  J                | JR
PUBLIC, SUE ELLEN J SR    | PUBLIC    | SUE ELLEN        |  J                | SR

I have a list of all the suffix values that one is able to enter, i.e.

JR, SR, I,II,III,IV,V,VI

I've gotten to a point where I split up the lastname and the rest of the name, but I can't quite figure out how to do the rest. I'm using oracle 10g.

This is not a homework question. It's an actual problem I'm working on at work.

Here's what I currently have:

 select id,
        name,
        substr(name,1, instr(name,',')-1) as lname,
        substr(name,(instr(name,',')+1),length(name)) as rest_of_the_name
 from    my_table
 where status='A';

Thanks for any help.

+2  A: 

This is a problem for which there will always be data that breaks it.

What if there are 2 initials? What if the initials are first, as J Edgar Hoover?

You mention values "that one is able to enter." Can you change the way the values are entered to capture them already separated?

UncleO
yeah we are planning to change the way they are entered so that they are all captured in different field, but there are around 5 million names that currently exist which need to be parsed.
zSysop
+1 "there will always be data that breaks it".
Svante
+1 for the best suggestion - to try to capture the values separated in the first place.
Jeffrey Kemp
+1  A: 

You've partially solved it already - you can use your query as a subquery and break the problem down bit by bit, e.g.:

select id, name, lname,
       case
       when substr(x, -2, 1) = ' '
       then substr(x, length(x) - 2)
       else x
       end as first_name, -- e.g. "SUE ELLEN"
       case
       when substr(x, -2, 1) = ' ' 
       then substr(x, -1)
       else null
       end as middle_initial, -- e.g. "J"
       suffix -- e.g. "SR"
from (
select id, name, lname, suffix,
       case when suffix is not null then
       substr(rest_of_the_name, 1, length(rest_of_the_name)-length(suffix)-1)
       else rest_of_the_name end
       as x -- e.g. "SUE ELLEN J"
from (
select id, name, lname, rest_of_the_name,
       case
       when substr(rest_of_the_name,-2)
            in (' I',' V')
       then substr(rest_of_the_name,-1)
       when substr(rest_of_the_name,-3)
            in (' JR',' SR',' II',' IV',' VI')
       then substr(rest_of_the_name,-2)
       when substr(rest_of_the_name,-4)
            in (' III')
       then substr(rest_of_the_name,-3)
       else null
       end as suffix -- e.g. "SR"
from (
select id,
       name, --e.g. "PUBLIC, SUE ELLEN J SR"
       trim(substr(name,1, instr(name,',')-1)) as lname, -- e.g. "PUBLIC"
       trim(substr(name,(instr(name,',')+1),length(name)))
          as rest_of_the_name -- e.g. "SUE ELLEN J SR"
from    my_table
where status='A'
)));
Jeffrey Kemp
btw: read it from the bottom up :)
Jeffrey Kemp
Thanks Jeffery!I've altered the query just a bit and i ended up with a solution that works for most of the names in our db.
zSysop
You may find you'll have to keep changing it quite a bit depending on the consistency of the data - to the point where it will be easier to use a procedural solution.
Jeffrey Kemp
A: 

Here's an unsophisticated answer, based on the retrieving the first and last name, retrieving the MI the same way as the first name, removing the MI from 'rest_of_the_name' as the last name.

SELECT
substr('John Q. Public',1, instr('John Q. Public',' ')-1) as FirstName,
substr('John Q. Public',(instr('John Q. Public',' ')+1),length('John Q. Public')) as rest_of_the_name,
substr(substr('John Q. Public',(instr('John Q. Public',' ')+1),length('John Q. Public')),1, instr(substr('John Q. Public',(instr('John Q. Public',' ')+1),length('John Q. Public')),' ')-1) as MI,
replace(substr('John Q. Public',(instr('John Q. Public',' ')+1),length('John Q. Public')), substr(substr('John Q. Public',(instr('John Q. Public',' ')+1),length('John Q. Public')),1, instr(substr('John Q. Public',(instr('John Q. Public',' ')+1),length('John Q. Public')),' ')-1)) as LastName
FROM DUAL;
MAbraham1