I am trying to extract email addresses from text using PHP and regex. Are there any classes or built-in functions for this? The text contains multiple email addresses at arbitrary positions.

The text comes from .doc files, which I copy and paste into forms to be processed on submit.

preg_match('/^[^@]+@[a-zA-Z0-9._-]+\.[a-zA-Z]+$/', $email); // from php.net

I also posted a similar question on Super User, asking for software solutions to the problem.

+2  A: 

It's hard to accurately detect e-mail addresses embedded in running text. You will either erroneously match text that isn't an e-mail address, or miss some valid but unusual addresses.

A good starting point is

preg_match_all('/\b[A-Z0-9._%+-]+@(?:[A-Z0-9-]+\.)+[A-Z]{2,6}\b/i', $subject, $result, PREG_PATTERN_ORDER);
foreach ($result[0] as $match) {
    // $match holds one matched address
}

(generated by RegexBuddy from its library)

It matches most "normal" addresses fine, but it won't find ones like [email protected] or "Tim\ O'Reilly"@microsoft.com. And of course it will also match nonsense like [email protected].
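
For the copy-and-paste workflow in the question, the same pattern could be wrapped in a small helper that returns each address once. This is only a sketch: `extractEmails` is a hypothetical name, not a built-in function, and the sample text uses made-up example.com addresses.

```php
<?php
// Hypothetical helper (not a PHP built-in): collect the unique
// addresses found anywhere in a block of pasted text.
function extractEmails(string $text): array
{
    // Same pattern as above; the /i flag makes it case-insensitive.
    $pattern = '/\b[A-Z0-9._%+-]+@(?:[A-Z0-9-]+\.)+[A-Z]{2,6}\b/i';
    preg_match_all($pattern, $text, $matches);

    // $matches[0] holds the full matches. array_unique() compares
    // case-sensitively, so "A@example.com" and "a@example.com"
    // would both be kept.
    return array_values(array_unique($matches[0]));
}

$text = "Contact alice@example.com or bob.smith@mail.example.org. "
      . "Alice again: alice@example.com";
print_r(extractEmails($text));
```

The limitations described above still apply, since the helper only changes how the matches are collected, not what the pattern accepts.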

Tim Pietzcker
Thanks. I'll look into it.
abel
That worked very well. Thank you!
abel