Given the following string, I'd like to parse it into a list of first names plus a last name:

Peter-Paul, Mary & Joël Van der Winkel

(and the simpler versions)

I'm trying to work out whether I can do this with a regex. I've got this far:

(?:([^, &]+))[, &]*(?:([^, &]+))

The problem is that I'd like the last name to end up in a separate capture group from the first names.

I suspect I'm beyond what's possible, but just in case...

UPDATE

Extracting the individual captures from a repeated group was new to me, so here's the C# code I used:

using System;
using System.Text.RegularExpressions;

string familyName = "Peter-Paul, Mary & Joël Van der Winkel";
string firstperson = @"^(?<First>[-\w]+)"; // .NET syntax for a named capture
string others = @"(?:\s*[,&]\s*(?<Others>[-\w]+))*"; // zero or more ", name" or "& name"
string lastname = @"\s+(?<Last>.*)"; // everything after the final first name

var reg = new Regex(firstperson + others + lastname);
var groups = reg.Match(familyName).Groups;
Console.WriteLine("LastName=" + groups["Last"].Value);
Console.WriteLine("First person=" + groups["First"].Value);
// A repeated group keeps every one of its matches in the Captures collection.
foreach (Capture firstname in groups["Others"].Captures)
    Console.WriteLine("Other person=" + firstname.Value);

I had to tweak the accepted answer slightly to get it to cover cases such as:

Peter-Paul&Joseph Van der Winkel

Peter-Paul & Joseph Van der Winkel
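
To illustrate why the tweak matters, here is a minimal sketch that compares a pattern requiring whitespace after the separator with one where the whitespace is optional. The class name `SeparatorSpacingDemo` and the helper `LastName` are hypothetical, and both patterns use numbered groups for brevity, so treat it as an illustration rather than the exact code from this thread.

```csharp
using System;
using System.Text.RegularExpressions;

static class SeparatorSpacingDemo
{
    // Shape of the answer's pattern: one whitespace character is required after ',' or '&'.
    public const string Strict = @"^([-\w]+)(?:(?:\s?[,&]\s)([-\w]+)\s?)*(.*)";

    // Relaxed variant (assumption: mirrors the tweak above): whitespace around the separator is optional.
    public const string Relaxed = @"^([-\w]+)(?:(?:\s*[,&]\s*)([-\w]+))*\s+(.*)";

    // Hypothetical helper: returns whatever group 3 (the last name) captured.
    public static string LastName(string pattern, string input)
    {
        return Regex.Match(input, pattern).Groups[3].Value;
    }

    static void Main()
    {
        string input = "Peter-Paul&Joseph Van der Winkel";
        Console.WriteLine(LastName(Strict, input));   // &Joseph Van der Winkel
        Console.WriteLine(LastName(Relaxed, input));  // Van der Winkel
    }
}
```

With the strict pattern the repeated group never matches "&Joseph", so everything after the first name falls into the last-name group.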

+1  A: 

Assuming a first name cannot contain a space (otherwise "Peter Paul Van der Winkel" could not be parsed unambiguously), the following rules apply:

  • (first name), then any number of (, first name) or (& first name)
  • Everything left is the last name.

    ^([-\w]+)(?:(?:\s?[,&]\s)([-\w]+)\s?)*(.*)
    
MaxVT
+1  A: 

It seems this might do the trick, with all the first names in group 1 and the last name in group 2:

((?:[^, &]+\s*[,&]+\s*)*[^, &]+)\s+([^,&]+)
gnarf
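
As a rough sketch of how the pattern above could be used from C#: group 1 comes back as a single string containing every first name, so it still has to be split on the separators. The class name `FamilyNameParser` and the method `Parse` are hypothetical, and the splitting step is an assumption about how one might finish the job, not part of the answer itself.

```csharp
using System;
using System.Text.RegularExpressions;

static class FamilyNameParser
{
    // Pattern from the answer: group 1 = all first names, group 2 = last name.
    private static readonly Regex Pattern =
        new Regex(@"((?:[^, &]+\s*[,&]+\s*)*[^, &]+)\s+([^,&]+)");

    // Hypothetical helper: splits group 1 into individual first names.
    public static (string[] FirstNames, string LastName) Parse(string input)
    {
        Match m = Pattern.Match(input);
        if (!m.Success)
            throw new FormatException("Unrecognised name: " + input);

        // Separators are ',' or '&' with optional surrounding whitespace.
        string[] firstNames = Regex.Split(m.Groups[1].Value, @"\s*[,&]+\s*");
        return (firstNames, m.Groups[2].Value);
    }

    static void Main()
    {
        var (firstNames, lastName) = Parse("Peter-Paul, Mary & Joël Van der Winkel");
        Console.WriteLine("LastName=" + lastName);
        foreach (string name in firstNames)
            Console.WriteLine("First name=" + name);
    }
}
```

Unlike the named-group approach, this does not rely on .NET's per-group capture history, so the same pattern plus a split should carry over to regex engines that only keep the last capture of a repeated group.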