What's the best way to separate the string "Parisi, Kenneth" into "Kenneth" and "Parisi"?
I am still learning how to parse strings with regular expressions, and I'm not too familiar with how to set variables equal to the matched parts or how to handle the output when the string matches (or doesn't match).
views: 555
answers: 2
+1
A:
Something like this should do the trick for names without unicode characters:
my ($lname, $fname); ($lname, $fname) = ($1, $2) if $var =~ /([a-z]+),\s+([a-z]+)/i;
To break it down:
([a-z]+)
match a series of characters and assign it to the first group $1
,
match a comma
\s+
match one or more spaces (if spaces are optional, change the + to *)
([a-z]+)
match a series of characters and assign it to the second group $2
i
case insensitive match
You can change the character class [a-z] to include characters you think are valid for names.
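As a rough sketch of how the captures can be put into variables and how a non-matching string can be detected: in list context a successful match returns the captured groups, and a failed match returns an empty list. The sub name parse_name is made up for illustration, and the character class [a-z'-] is only one possible choice of "valid name characters".

```perl
use strict;
use warnings;

# Hypothetical helper: returns (first, last) on a match, empty list otherwise.
sub parse_name {
    my ($var) = @_;
    # In list context the match returns ($1, $2); on failure it returns ().
    # [a-z'-] is an assumed character class; adjust it to your data.
    if (my ($lname, $fname) = $var =~ /([a-z'-]+),\s+([a-z'-]+)/i) {
        return ($fname, $lname);
    }
    return;    # no match
}

for my $input ("Parisi, Kenneth", "no comma here") {
    if (my ($first, $last) = parse_name($input)) {
        print "matched: first=$first last=$last\n";
    }
    else {
        print "no match for '$input'\n";
    }
}
```

Testing the list assignment in the if condition avoids relying on $1 and $2, which keep their old values after a failed match.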
codelogic
2008-12-23 20:31:52
Won't work with names like d'Angeli or Jean-Pierre...
PhiLho
2008-12-23 20:36:55
The [a-z] character class can be extended to include all valid name characters.
codelogic
2008-12-23 20:38:33
Yeah, PhiLho is right in this case, and I actually do have last names in both of the formats he gave as examples.
CheeseConQueso
2008-12-23 20:43:10
Oh, but thanks for the breakdown. That's what I really needed the most help on.
CheeseConQueso
2008-12-23 20:43:41
If you have non-alpha characters in the names, you can add them to the matching pattern: ([a-z'-]+)
Bruce Alderman
2008-12-23 20:59:44
Cool, thanks. I'll keep that in mind if I run into a special-case name and need to switch or add code.
CheeseConQueso
2008-12-23 21:27:49
Both ' and - are valid name characters (e.g. O'Reilly, Drake-Brockman)
cletus
2008-12-23 21:32:20
Some last names even contain spaces.
bart
2008-12-23 21:51:05
And depending on how you validate your inputs, someone might even try some unicode to represent their own name more correctly.
Adam Bellaire
2008-12-23 21:58:11
Try unicode? If you are working with names and don't support more characters than simply [a-z] your code is badly broken.
innaM
2008-12-24 10:31:13
@Manni, yes, and as has been stated at least 3 times, [a-z] is simply an example character class, which can be replaced with whatever the user requires. The OP asked for info on regex grouping, hence my response. In addition, no specific requirements were stated.
codelogic
2008-12-24 17:25:36
+12
A:
my ($lname, $fname) = split(/,\s*/, $fullname, 2);
Note the third argument, which limits the results to two. Not strictly required but a good practice nonetheless imho.
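A small sketch of what the limit changes, and of one way to notice input that has no comma at all. The sub name split_name and the sample inputs are made up for illustration; returning an empty list for comma-less input is just one possible policy.

```perl
use strict;
use warnings;

# Hypothetical helper: split "Last, First" on the first comma only.
sub split_name {
    my ($fullname) = @_;
    # Limit of 2: anything after the first comma stays in $fname.
    my ($lname, $fname) = split /,\s*/, $fullname, 2;
    # With no comma, split returns a single field and $fname is undef.
    return unless defined $fname;
    return ($fname, $lname);
}

my ($first, $last) = split_name("Parisi, Kenneth");
print "$first $last\n";    # prints "Kenneth Parisi"

# With the limit, a second comma is kept instead of being silently dropped.
($first, $last) = split_name("Parisi, Kenneth, Jr.");
print "$first | $last\n";  # prints "Kenneth, Jr. | Parisi"
```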
cletus
2008-12-23 20:33:15
... assuming every name from input is in the last, first format. If the comma is missing, won't $fname be undefined?
Bruce Alderman
2008-12-23 21:23:13
@aardvark: Yes, but garbage in, garbage out. OP doesn't mention this as a requirement.
cletus
2008-12-23 21:30:27
@Robert: at the risk of being pedantic, the first arg to split is a regex. :-)
cletus
2008-12-23 21:45:42
@bart: but probably worth adding the limit arg. It's good to use it as a rule of thumb.
cletus
2008-12-23 21:53:53