tags:

views:

189

answers:

2

I am hoping that there might be an easy way to do this, I am assuming regular expressions. Whats the best way in java to split the following string into email addresses?

[email protected], "Jane" <[email protected]>, "Smith, Mr" <[email protected]>

The fact that a comma can appear within the double quotes makes it somewhat more difficult. I guess ideally it would also work with single quotes?

[email protected], 'Jane, Ms' <[email protected]>, "Smith, Mr" <[email protected]>

I thought it would be good to check if there is an easier way save having to write a full parser!

+4  A: 

Most will be handled by:

\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b

Though for full RFC-2822 compliance use:

(?:[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*|"(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21\x23-\x5b\x5d-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])*")@(?:(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?|\[(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?|[a-z0-9-]*[a-z0-9]:(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21-\x5a\x53-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])+)\])

Both from regular-expressions.info, with discussion on where it falls short of "perfect".

In Java, just keep repeating to find only the email addresses without the names.

import java.util.regex.Pattern;
import java.util.regex.Matcher;

public class Main {
    public static void main(String[] args) {
        new Main().findEmails("[email protected], \"Jane\" <[email protected]>, \"Smith, Mr\" <[email protected]>");
    }
    public void findEmails(String s) {
        System.out.println("ready: "+s);
        Pattern p = Pattern.compile("\\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\\.[A-Z]{2,4}\\b",
                                    Pattern.CASE_INSENSITIVE);
        Matcher m = p.matcher(s);
        while (m.find())
            System.out.println("Found: "+m.group());
    }
}
Chadwick
+2  A: 

From Chadwick's link, a regex correct for RFC2822:

(?:[a-z0-9!#$%&'*+/=?^_{|}~-]+(?:.[a-z0-9!#$%&'+/=?^{|}~-]+)*|"(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21\x23-\x5b\x5d-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])*")@(?:(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?|\[(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?|[a-z0-9-]*[a-z0-9]:(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21-\x5a\x53-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])+)\])

Tom Hawtin - tackline
Isn't it cute? :)
serg
It says to me "regexs are not a good solution to every problem".
Tom Hawtin - tackline