I have the following text that I am trying to parse:

"[email protected]" <[email protected]>, "Jane Doe" <jane.doe@ addyB.org>,
"[email protected]" <[email protected]>

I am using the following code to try and split up the string:

Dim groups As GroupCollection
Dim matches As MatchCollection
Dim regexp1 As New Regex("""(.*)"" <(.*)>")
matches = regexp1.Matches(toNode.InnerText)
For Each match As Match In matches
    groups = match.Groups
    message.CompanyName = groups(1).Value
    message.CompanyEmail = groups(2).Value
Next

But this regular expression is greedy and is grabbing the entire string up to the last quote after "[email protected]". I'm having a hard time putting together an expression that will group this string into the two groups I'm looking for: Name (in the quotes) and E-Mail (in the angle brackets). Does anybody have any advice or suggestions for altering the regexp to get what I need?

+1  A: 

How about """([^""]*)"" <([^>]*)>" for the regex? That is, make it explicit that the matched parts can't include a quote or a closing angle bracket, respectively. You may also want to use a more restrictive character range instead.
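
A minimal C# sketch of this pattern, assuming made-up placeholder addresses; the class and method names are hypothetical. In a C# verbatim string, as in VB, a literal quote is written as two quotes, so the pattern text is the same:

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class AddressRegexSketch
{
    // Negated character classes stop each group at the first closing
    // delimiter, so a match can never run past one name/address pair.
    private static readonly Regex Pair = new Regex(@"""([^""]*)"" <([^>]*)>");

    // Returns one (name, email) pair per address found in the input.
    public static List<KeyValuePair<string, string>> Parse(string input)
    {
        var result = new List<KeyValuePair<string, string>>();
        foreach (Match m in Pair.Matches(input))
        {
            result.Add(new KeyValuePair<string, string>(
                m.Groups[1].Value, m.Groups[2].Value));
        }
        return result;
    }
}
```

Calling AddressRegexSketch.Parse on "Jane Doe" <jane@example.org>, "Doe, John" <john@example.org> yields two pairs, and the comma inside the second display name is kept as part of the name.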

sepp2k
The first part should be `""([^""]*)""`, not `""([^)]*)""`.
Alan Moore
Yes, of course. I confused myself with all that punctuation ;-)
sepp2k
You should be able to edit your answer and fix that.
Alan Moore
A: 

You need to specify that you want the minimal (non-greedy) match. You can also replace the (.*) patterns with more precise ones; for example, you could exclude the comma and the space. It is usually better to avoid .* in a regular expression, because the extra backtracking it causes can hurt performance.

For example, for the e-mail address you can use a pattern like [\w-]+@([\w-]+\.)+[\w-]+ or a more complex one.
You can find some good patterns at http://regexlib.com/
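
As a hedged illustration of a more precise pattern, the C# sketch below (hypothetical names, made-up input) captures the address between the angle brackets. It assumes the local part may contain dots, so it uses [\w.-]+ before the @, a slight widening of the pattern mentioned here; it is still far from a complete address validator.

```csharp
using System;
using System.Text.RegularExpressions;

public static class EmailPatternSketch
{
    // \. matches a literal dot; an unescaped . would match any character.
    private static readonly Regex Bracketed =
        new Regex(@"<([\w.-]+@(?:[\w-]+\.)+[\w-]+)>");

    // Returns the address between < and >, or null if nothing matches.
    public static string Extract(string input)
    {
        Match m = Bracketed.Match(input);
        return m.Success ? m.Groups[1].Value : null;
    }
}
```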

Matthieu
A: 

I'm not sure which regex engine ASP.NET uses, but try the non-greedy (lazy) variant by adding a ? after each * quantifier in the regex.

Example regex

""(.*?)"" <(.*?)>
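
For what it's worth, .NET's Regex class does support lazy quantifiers such as *?. A small C# sketch (hypothetical names) that counts how many matches the greedy and lazy forms find in the same input:

```csharp
using System;
using System.Text.RegularExpressions;

public static class LazyQuantifierSketch
{
    // Greedy: .* expands as far as possible, then backtracks.
    public static int CountGreedy(string input)
    {
        return Regex.Matches(input, @"""(.*)"" <(.*)>").Count;
    }

    // Lazy: .*? expands only as far as needed to complete the match.
    public static int CountLazy(string input)
    {
        return Regex.Matches(input, @"""(.*?)"" <(.*?)>").Count;
    }
}
```

For a single-line input such as "A" <a@example.org>, "B" <b@example.org>, the greedy form finds one match spanning both addresses, while the lazy form finds two. Note that a lazy group can still cross a quote when that is the only way to complete the match, which a negated character class rules out.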
Wgaffa
+2  A: 

Rather than rolling your own regular expression, I would do this:

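// Requires: using System.Net.Mail;
string[] addresses = toNode.InnerText.Split(',');
foreach (string textAddress in addresses)
{
    // MailAddress parses the "Display Name" <user@host> form;
    // Trim() removes the whitespace left over from the split.
    MailAddress address = new MailAddress(textAddress.Trim());
    message.CompanyName = address.DisplayName;
    message.CompanyEmail = address.Address;
}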

While your regular expression may work for the few test cases that you have shown, using the MailAddress class will probably be much more reliable in the long run.

Lee
Didn't even THINK about doing this!!
swolff1978
This method works really well as long as a DisplayName isn't "Doe, Jane"... do you have any suggestions on how to modify this method for those special PITA cases?
swolff1978
I don't have a simple answer to this, unfortunately ... but it appears that this answer (http://stackoverflow.com/questions/1047531/splitting-comma-seperated-values-csv/1047670#1047670) suggests using the TextFieldParser class for comma-delimited stuff.
Lee
I ended up cheating a little bit - but hopefully it will keep things reliable. I added this line before the split: Dim seps() as String = {", <", ">, """} and changed the split line to: Dim addresses() as String = toNode.InnerText.Split(seps, StringSplitOptions.RemoveEmptyEntries)
swolff1978
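
One way to handle display names such as "Doe, Jane" without a regex is to split only on commas that fall outside double quotes. The following C# sketch assumes display names are always wrapped in double quotes and contain no escaped quotes; the class and method names are hypothetical:

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class QuoteAwareSplitSketch
{
    // Splits on commas that are not inside a double-quoted section.
    public static List<string> SplitAddresses(string input)
    {
        var parts = new List<string>();
        var current = new StringBuilder();
        bool inQuotes = false;

        foreach (char c in input)
        {
            if (c == '"')
            {
                inQuotes = !inQuotes;   // entering or leaving a display name
            }

            if (c == ',' && !inQuotes)
            {
                AddIfNotEmpty(parts, current.ToString());
                current.Length = 0;     // start collecting the next address
            }
            else
            {
                current.Append(c);
            }
        }

        AddIfNotEmpty(parts, current.ToString());
        return parts;
    }

    // Trims a piece and keeps it only if something is left.
    private static void AddIfNotEmpty(List<string> parts, string piece)
    {
        string trimmed = piece.Trim();
        if (trimmed.Length > 0)
        {
            parts.Add(trimmed);
        }
    }
}
```

Each returned piece keeps its quotes and angle brackets, so it should be suitable for passing to the MailAddress constructor as in Lee's answer.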