I have an application where I need to parse a string to find all the e-mail addresses in that string. I am not a regular expression guru by any means and am not sure what the difference is between some expressions. I have found two expressions that, apparently, will match all of the e-mail addresses in a string. I cannot get either to work in my C# application. Here are the expressions:

/\b([A-Z0-9._%-]+)@([A-Z0-9.-]+\.[A-Z]{2,4})\b/i

^[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}$

Here is an example string:

Adam
<mailto:[email protected]?subject=Your%20prospect%20Search%20-%20ID:
%2011111>  Edwards - Prospect ID: 11111, Ph: (555) 555-5555 
Al
<mailto:[email protected]?subject=Your%20prospect%20Search%20-%20
ID:%20222222>  Grayson - Prospect ID: 222222, Ph: 
Angie

Here is the code in C#:

var mailReg = new Regex(EmailMatch, RegexOptions.IgnoreCase | RegexOptions.Multiline);
var matches = mailReg.Matches(theString);
+2  A: 

The first of your two examples should work if you remove the \b from both ends. The \b means that it expects a word boundary (a space, end of line, &c.) before and after the email address and this is not present in your case.

(Please do not use your new found powers for evil.)

Paul Ruane
`:` and `?` are considered word boundaries, aren't they?
Amarghosh
Your suggestion worked great!
DDiVita
I thought they weren't but you could be correct @Amarghosh.
Paul Ruane
+1 "With great power comes great responsibility"
Rubens Farias
Ah... did not think about that!
DDiVita
@Paul: have you tried it? Why wouldn't it work?
Tim Pietzcker
No, I have not tried it. The pattern does not have a repeating section for multiple subdomain parts. It would have to be changed to something like /([A-Z0-9._%-]+)@(([A-Z0-9.-]+\.)+[A-Z]{2,4})/
Paul Ruane
The bottom section of the following link appears to have a set of useful, and more comprehensive, expressions: http://www.regular-expressions.info/email.html
Paul Ruane
...and then there is, of course, *the* regex pattern for email addresses: http://code.iamcal.com/php/rfc822/full_regexp.txt
Fredrik Mörk
@Paul: Try it, it works. Since the dot is part of the character class, multiple subdomains are matched just fine. Disadvantage: it also matches `[email protected]` - but your new one does so, too. And the OP didn't ask for e-mail *validation*, which is another beast entirely.
Tim Pietzcker
Ah yes, did not notice the dot in there. Regular expressions and Perl share the unreadable-once-written trait.
Paul Ruane
`\b` does *not* match whitespace or any other character. It's a zero-width assertion, like the start and end anchors, lookaheads and lookbehinds. Given that every email address has to start and end with a word character, the OP definitely *should* use word boundaries.
Alan Moore
Alan (you didn't use to work with me in 2000 did you?), I did not say that \b *matched* a word character, but that its presence will prevent a match if a word is not preceded/followed by a word boundary such as the start of line, end of line or non-word character. I suspected that the colon preceding the address was preventing the match.
Paul Ruane
Alan, I have run some tests and it seems that the \b does not prevent the match. Maybe the OP was not including the at symbol (@) before the pattern string so .NET was interpreting the \b as an escape sequence.
Paul Ruane
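The escape-sequence theory in the last comment can be checked with a small sketch. The input string and the address in it are made up for illustration; the point is only the difference between a regular C# string literal, where the compiler turns \b into a backspace character, and a verbatim (@"...") literal, where \b reaches the regex engine as a word-boundary assertion.

```csharp
using System;
using System.Text.RegularExpressions;

class VerbatimDemo
{
    static void Main()
    {
        // Hypothetical sample input; the address is invented for this example.
        string input = "<mailto:adam@example.com?subject=Hi>";

        // Regular literal: the C# compiler converts \b to a backspace (U+0008)
        // before the regex engine sees the pattern, so the pattern now demands
        // a literal backspace on both sides of the address.
        string regular = "\b[A-Z0-9._%-]+@[A-Z0-9.-]+\\.[A-Z]{2,4}\b";

        // Verbatim literal: backslashes are passed through unchanged,
        // so \b is the zero-width word-boundary assertion.
        string verbatim = @"\b[A-Z0-9._%-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b";

        Console.WriteLine(Regex.IsMatch(input, regular, RegexOptions.IgnoreCase));  // False
        Console.WriteLine(Regex.IsMatch(input, verbatim, RegexOptions.IgnoreCase)); // True
    }
}
```

The second result also shows that the colon before the address and the question mark after it do not block \b: each sits next to a word character, which is exactly where a word boundary occurs.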
A: 

This expression worked: ([a-zA-Z0-9_\-\.]+)@([a-zA-Z0-9_\-\.]+)\.([a-zA-Z]{2,5})

Thanks for looking!

DDiVita
+2  A: 

The first regex is written as a Perl-style literal (delimited by slashes). Drop the slashes and the mode modifier (i), which RegexOptions.IgnoreCase already covers, and it should work:

EmailMatch = @"\b([A-Z0-9._%-]+)@([A-Z0-9.-]+\.[A-Z]{2,6})\b";

Also, .museum is a valid top-level domain, so {2,6} is a bit better.

The second regex only matches entire strings that consist of nothing but an email address.

I would leave the \b intact.

Tim Pietzcker
There are `yahoo.co.uk` email addresses
Amarghosh
What would be a good expression to catch those addresses?
DDiVita
What's wrong with yahoo.co.uk addresses? They'll be matched, too, since the dot is part of the character class after @. Or what am I not getting?
Tim Pietzcker
@Tim The dot within character class escaped my eyes. I noticed only the escaped dot `\.` - But then again as you mentioned in the other thread, it opens another Pandora's box - luckily the question is not about email validation.
Amarghosh
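Putting this answer and its comments together, a minimal sketch of the extraction might look like the following. The input text and both addresses are hypothetical stand-ins modelled on the question's sample, and the class name is arbitrary. It uses the slash-free pattern with {2,6}, relies on RegexOptions.IgnoreCase instead of the /i modifier, and contrasts it with the anchored second pattern.

```csharp
using System;
using System.Text.RegularExpressions;

class ExtractEmails
{
    static void Main()
    {
        // Hypothetical input modelled on the question; both addresses are made up.
        string theString =
            "Adam\n<mailto:adam@example.com?subject=Your%20prospect%20Search>  Edwards\n" +
            "Al\n<mailto:al@mail.example.co.uk?subject=Your%20prospect%20Search>  Grayson";

        // No slashes and no trailing "i": case-insensitivity comes from RegexOptions.
        var mailReg = new Regex(
            @"\b([A-Z0-9._%-]+)@([A-Z0-9.-]+\.[A-Z]{2,6})\b",
            RegexOptions.IgnoreCase);

        // Print every address found anywhere in the text.
        foreach (Match m in mailReg.Matches(theString))
        {
            Console.WriteLine(m.Value);
        }

        // The anchored variant requires the whole input to be a single address,
        // so it finds nothing in free text like this.
        var anchored = new Regex(
            @"^[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}$",
            RegexOptions.IgnoreCase);
        Console.WriteLine(anchored.Matches(theString).Count);
    }
}
```

With this input the loop should print adam@example.com and al@mail.example.co.uk, followed by a count of 0 for the anchored pattern. The multi-part domain is picked up because the dot is inside the character class after the @, as discussed in the comments. This is extraction only, not validation.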