I've got a bunch of first names in a field that carry a middle initial with a '.' at the end.

I need a regex to convert this example:

Kenneth R.

into

Kenneth

I was trying to build my own and found this useful site, by the way:

http://www.gskinner.com/RegExr/

but I'm new to Perl and regular expressions and could only come up with "...$", which is useless when there is no middle initial at the end of the first name.


I just found another name format that needs consideration: 'R. Kelly' needs to be 'Kelly'.

+3  A: 

To remove the last "word" if it ends with a dot:

$name =~ s/\w+\.$//i;

(this assumes there is no whitespace after the dot)

To remove the first word ending with a dot, wherever it appears:

$name =~ s/\w+\.//i;

Look at the /g modifier if you want to remove them all.

And by the way, make yourself a list of test cases to check your solution, then try it with real-world data; you will probably get some surprises.

siukurnin
That works, thanks a lot. See my edit about R. Kelly too; I've got a couple of names in the db in that format as well.
CheeseConQueso
Haha, never mind, thanks. I saw your original answer, refreshed, and now your second one came up.
CheeseConQueso
Can you give me a brief description of how each element works in these? I want to learn more about regexes, and they are very non-intuitive even when you know what they are supposed to be doing.
CheeseConQueso
You might want to remove the space before the initial as well: $name =~ s/\s+\w+\.$//i;
gpojd
If you want a way to explain regexes you come across and don't understand, you can try the YAPE::Regex::Explain module from CPAN. http://search.cpan.org/~pinyan/YAPE-Regex-Explain-3.011/Explain.pm
gpojd
I think this would work better: s/\s*\w\.\s*//i; This regexp looks for zero or more whitespace characters, followed by a single word character (\w: a letter, digit, or underscore), followed by a period (\.; it needs to be escaped because '.' is special), followed by zero or more whitespace characters. It replaces the match with nothing.
strager
strager: note that if an initial happens to fall in the middle of the name, e.g. "John Q. Smith", your solution will join together the two surrounding words, e.g. "JohnSmith".
j_random_hacker
+2  A: 

To take care of the R. Kelly case:

s/\w\. *//g

Here's a quick test:

$ echo 'R. Kelly
Kenneth R.
R. Kemp R.
John Q. Smith' | perl -pe 's/\w\. *//g'
Kelly
Kenneth 
Kemp 
John Smith

I'd suggest that:

  1. The global option (g) is required.
  2. The case insensitive option (i) isn't.
  3. You might consider looking for upper case ([[:upper:]]) initials only.
  4. Multiple character "initials" should be viewed with suspicion. (So \w+ is probably a mistake unless your data has relevant cases.)
  5. Read perldoc perlre for more information.
Jon Ericson
But notice that if an initial happens to fall in the middle of the name, e.g. "John Q. Smith", your solution will join together the two surrounding words, e.g. "JohnSmith".
j_random_hacker
I added a simple fix: the pattern now consumes only the spaces after the initial, so any leftover space ends up at the end of the string. That could be cleaned up with: s/ *$// if desired.
Jon Ericson