How can I create an array of email addresses contained within a block of text? I've tried

addrs = text.scan(/ .+?@.+? /).map{|e| e[1...-1]}

but (not surprisingly) it doesn't work reliably.

+5  A: 

How about this for a (slightly) better regular expression:

\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b

You can find this here:

http://www.regular-expressions.info/email.html

Just an FYI, the problem with your regex is that it allows only one kind of separator, a space, before and after an email address. It would also match a bare "@" if it is surrounded by spaces.
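To make that concrete, here is a small sketch of how the pattern from the question behaves. The sample strings are made up for illustration and are not from the original question.

```ruby
# Hypothetical sample text: one address is followed by a comma,
# the other sits at the very end of the string.
text = "Contact alice@example.com, or bob@example.org."

# The pattern from the question: a space, anything, "@", anything, a space.
original = text.scan(/ .+?@.+? /).map { |e| e[1...-1] }
# The first address keeps its trailing comma, and the second is missed
# entirely because no space follows it.

# A bare "@" padded with spaces also counts as a match.
bare = "a  @  b".scan(/ .+?@.+? /).map { |e| e[1...-1] }

p original
p bare
```

Here the map call only trims the surrounding spaces that the pattern itself pulled into each match.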

There are some TLDs longer than 4 characters, such as ".museum".
Greg Hewgill
From the article: "The most frequently quoted example are addresses on the .museum top level domain, which is longer than the 4 letters my regex allows for the top level domain. I accept this trade-off because the number of people using .museum email addresses is extremely low." It reduces false positives.
Finishing the quote above: "To include .museum, you could use ^[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,6}$. However, then there's another trade-off. This regex will match [email protected]. It's far more likely that John forgot to type in the .com top level domain."
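A rough sketch of that trade-off, using \b anchors so the patterns work with scan on free text. The addresses are hypothetical, and the /i flag is an assumption added here so mixed-case input matches.

```ruby
# Same pattern twice; only the allowed top-level-domain length differs.
short_tld = /\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b/i
long_tld  = /\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,6}\b/i

# Hypothetical addresses: an ordinary one, a .museum one, and one that
# looks like the writer forgot to add ".com" after "office".
text = "ann@example.com curator@gallery.museum sam@example.office"

strict = text.scan(short_tld)  # skips the two six-letter endings
loose  = text.scan(long_tld)   # accepts .museum, but also .office

p strict
p loose
```

Which limit is better depends on whether a missed .museum address or a probable typo slipping through is the bigger problem for your data.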
Thanks. I get a syntax error with that. I'm not an experienced programmer, and this regex stuff makes my head hurt. Here's the line with the error: text.scan(\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b).map{|e| e[1...-1]}
@cmartin: fair enough, +1. :)
Greg Hewgill
@Peter: text.upcase.scan(/\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b/) ... should work fine. I don't know what you're trying to do with the map part; scan on its own already gives you a list of all the email addresses. Note the upcase: the regex only matches uppercase (see the article).
Thanks cmartin, that's very helpful. (I also see now that the map part was pointless :-)
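For reference, a minimal sketch of the upcase approach from the comment above, next to a variant that uses Ruby's /i (case-insensitive) flag. The /i variant is an alternative suggested here, not something from the thread, and the sample text is made up.

```ruby
# Hypothetical input with mixed-case addresses.
text = "Write to Alice@Example.com or bob@example.org, please."

pattern = /\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b/

# Approach from the comment: upcase the text first, since the pattern
# only lists uppercase letters. The results come back uppercased.
upcased = text.upcase.scan(pattern)

# Alternative (an assumption, not from the thread): add the /i flag
# so the original casing of each address is preserved.
preserved = text.scan(/\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b/i)

p upcased
p preserved
```

Note that the pattern must be wrapped in slashes to be a regex literal; leaving them out is what causes the syntax error mentioned earlier.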