ansaurus

Question

Answer 1

+1 A:

I would not advise combining the two regexes. It's possible, but it will make for code that is harder to understand and maintain down the road.

(Also, leaving the regexes separate will let you handle emails and phone numbers differently down the line, which you're likely to want to do.)

pjmorse 2010-10-10 17:08:18

Answer 2

A:

For one, I would simplify your regex:

(?:\(?\b\d{3}\)?[-.\s]*)?\d{3}[-.\s]*\d{4}\b

will match the same correct numbers as before and have fewer false hits.
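As a rough sketch of how the simplified pattern could be used from Python (the sample string and the names `PHONE_RE`, `sample`, and `phones` are made up for illustration, not taken from the question):

```python
import re

# The simplified phone pattern from above, compiled once for reuse.
# The area code (with optional parentheses) is optional as a whole.
PHONE_RE = re.compile(r"(?:\(?\b\d{3}\)?[-.\s]*)?\d{3}[-.\s]*\d{4}\b")

# Hypothetical sample text, for illustration only.
sample = "Call (555) 123-4567 or 555.987.6543, or just 234-5678."

# The pattern has no capturing groups, so findall returns the full matches.
phones = PHONE_RE.findall(sample)
print(phones)
# ['(555) 123-4567', '555.987.6543', '234-5678']
```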

Second, your e-mail regex will miss a lot of valid e-mail addresses and produce many false positives, too (it would match aaaa@@@@aaaa, for example). While you can never match e-mail addresses with 100% reliability using a regex, the following one is better:

\b[A-Z0-9._%+-]+@(?:[A-Z0-9-]+\.)+[A-Z]{2,6}\b

(Use the case-insensitive option when compiling it.)
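For example, with Python's `re` module the case-insensitive option is the `re.I` (`re.IGNORECASE`) flag. A minimal sketch, where the sample text and the names `EMAIL_RE`, `sample`, and `emails` are assumptions for illustration:

```python
import re

# The e-mail pattern from above; re.I makes [A-Z] match lowercase letters too.
EMAIL_RE = re.compile(r"\b[A-Z0-9._%+-]+@(?:[A-Z0-9-]+\.)+[A-Z]{2,6}\b", re.I)

# Hypothetical sample text, for illustration only.
sample = ("Write to jane.doe@example.com or bob@mail.example.org; "
          "aaaa@@@@aaaa is not an address.")

emails = EMAIL_RE.findall(sample)
print(emails)
# ['jane.doe@example.com', 'bob@mail.example.org']
```

Note that the string with repeated `@` signs is no longer matched.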

To restrict matches to a few TLDs, you can use

\b[A-Z0-9._%+-]+@(?:[A-Z0-9-]+\.)+(?:asia|com|org|net|gov|mil|biz|info|mobi|name|aero|jobs|museum|travel|[A-Z]{2})\b
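If the TLD list changes often, the alternation can also be assembled from a Python list instead of being typed by hand. This is only a sketch: the helper name `build_email_pattern`, the `TLDS` list, and the sample text are hypothetical, and the resulting pattern is meant to be equivalent to the one above.

```python
import re

# Generic TLDs taken from the pattern above; extend this list as needed.
TLDS = ["asia", "com", "org", "net", "gov", "mil", "biz", "info",
        "mobi", "name", "aero", "jobs", "museum", "travel"]


def build_email_pattern(tlds):
    """Compile an e-mail regex that accepts only the given TLDs
    plus any two-letter (country-code style) TLD."""
    alternation = "|".join(re.escape(tld) for tld in tlds)
    return re.compile(
        r"\b[A-Z0-9._%+-]+@(?:[A-Z0-9-]+\.)+(?:" + alternation + r"|[A-Z]{2})\b",
        re.I,
    )


email_re = build_email_pattern(TLDS)

# Hypothetical sample text, for illustration only.
found = email_re.findall("a@example.com, b@example.co.uk, c@example.invalid")
print(found)
# ['a@example.com', 'b@example.co.uk']
```

As discussed in the comments, a stricter TLD list only filters out some malformed strings; it cannot tell you whether an address actually exists.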

Tim Pietzcker 2010-10-10 17:16:07

Thanks for the modified regex. How do you specify the case-insensitive option when compiling?

Aaron 2010-10-10 17:23:49

And do you happen to know of a simple way to restrict the email address to specific TLDs?

Aaron 2010-10-10 17:30:36

`re.compile("regex", re.I)`, and why would you want to limit your regex to TLDs?

Tim Pietzcker 2010-10-10 17:41:40

Cool, I was just thinking it would help verify the emails even more.

Aaron 2010-10-10 17:50:53

Not a good idea. You'll have to send an email to a potential address anyway to verify - no regex and no parser can find out if an address actually exists.

Tim Pietzcker 2010-10-10 20:00:36

OK, that makes sense. Is there a way to put both of these together? Or is it best to do each one separately in Python?

Aaron 2010-10-10 21:49:19

I just went ahead and did each separately, and it seems to be working well. I also found a list of TLDs, and it looked like there were 20 or so. Is there no way to manually add these to the regex? I know you said it wasn't a good idea, but I was just wondering if it's possible. Thanks again for all your help with this.

Aaron 2010-10-11 00:16:18

So I tried running the first regex for phone numbers on an HTML page, and it is giving me interesting results. Can you check out the edit that I posted above? I'm not sure where the first few items are even coming from.

Aaron 2010-10-11 14:58:28

Tim Pietzcker 2010-10-11 16:31:56

That makes sense; I completely ignored the fact that it could be coming from the HTML. Thanks for getting back to me so quickly.

Aaron 2010-10-11 16:57:13


Edit regex in python script
