I've got a list of email addresses belonging to several domains. I'd like a regex that will match addresses belonging to three specific domains (for this example: foo, bar, and baz).

So these would match:

  1. a@foo
  2. a@bar
  3. b@baz

This would not:

  1. a@fnord

Ideally, these would not match either (though it's not critical for this particular problem):

  1. a@foobar
  2. b@foofoo

Abstracting the problem a bit: I want to match a string that contains at least one of a given list of substrings.

+11  A: 

Use the pipe symbol to indicate "or":

/a@(foo|bar|baz)\b/

If you don't want the capture-group, use the non-capturing grouping symbol:

/a@(?:foo|bar|baz)\b/

(Of course I'm assuming "a" is OK for the front of the email address! You should replace that with a suitable regex.)
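As a rough sketch of the same idea in Python, the alternation can be built from a list instead of being typed by hand. The names DOMAINS, PATTERN and has_allowed_domain are made up for this illustration, and the sketch drops the literal "a" so that any local part is accepted:

```python
import re

# Hypothetical list of allowed domains, taken from the question's example.
DOMAINS = ["foo", "bar", "baz"]

# Escape each domain so characters such as "." are matched literally,
# then join the alternatives with "|" inside a non-capturing group.
PATTERN = re.compile(r"@(?:%s)\b" % "|".join(map(re.escape, DOMAINS)))

def has_allowed_domain(address):
    """Return True if the address contains '@' followed by an allowed domain."""
    return PATTERN.search(address) is not None

for address in ["a@foo", "a@bar", "b@baz", "a@fnord", "a@foobar", "b@foofoo"]:
    print(address, has_allowed_domain(address))
```

Note that \b only requires a word boundary after the domain, so something like a@foo.com would still match; whether that is wanted depends on whether "foo" stands for a whole domain name.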

Jason Cohen
this misses the current data posted and has no references or explanations.
sfossen
umm, I don't think I missed anything. My regex matches the first list but not the next two lists, and instructs how to do "or" generally in regular expressions, which answers the last part.
Jason Cohen
Agreed; I think it explained it just fine
Craig Walker
+2  A: 

It should be more generic: the a shouldn't be part of the pattern, although the @ should.

/@(foo|bar|baz)(?:\W|$)/

Here is a good reference on regex.

edit: change ending to allow end of pattern or word break. now assuming foo/bar/baz are full domain names.

sfossen
actually, the '.' ending is optional (hence why mine is followed by a ?)
Alnitak
that is debatable, until he clarifies if foo === 'google.com' or === 'google'.
sfossen
I've taken 'foo' to be a complete domain name, not just a prefix.
Alnitak
could use the more generic (?:\W|$)
sfossen
you could, but then it would match his three specific domains...
Alnitak
+5  A: 

^(a|b)@(foo|bar|baz)$ if your list is this strictly defined. The start and end anchors ensure that only those exact strings match.

Gregory A Beamer
I would think that is way too specific.
sfossen
yes, everything before the '@' should be omitted
Alnitak
+2  A: 

Use:

/@(foo|bar|baz)\.?$/i

Note the differences from other answers:

  • \.? - matching 0 or 1 dots, in case the domains in the e-mail address are "fully qualified"
  • $ - to indicate that the string must end with this sequence,
  • /i - to make the test case insensitive.

Note, this assumes that each e-mail address is on a line on its own.

If the address could appear anywhere within a longer string, then drop the $ and replace it with \s+ (which matches one or more whitespace characters).
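For anyone trying this from Python, here is a minimal sketch of the anchored, case-insensitive pattern, assuming one address per string. DOMAIN_RE and matched_domain are names invented for the example, and foo, bar and baz are still the question's placeholder domains:

```python
import re

# Python spelling of /@(foo|bar|baz)\.?$/i: optional trailing dot,
# anchored at the end of the string, case-insensitive.
DOMAIN_RE = re.compile(r"@(foo|bar|baz)\.?$", re.IGNORECASE)

def matched_domain(address):
    """Return the matched domain in lower case, or None if nothing matches."""
    m = DOMAIN_RE.search(address)
    return m.group(1).lower() if m else None

for address in ["a@FOO", "b@baz.", "a@foosnoz", "a@foobar"]:
    print(address, matched_domain(address))
```

The end anchor is what rejects a@foosnoz and a@foobar here; without it, the pattern would accept any domain that merely starts with one of the three names.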

Alnitak
I would think the forcing it to end is overkill.
sfossen
how so? if you don't do that it'll match a@foosnoz
Alnitak
enforcing the '.' is good, but there is no email address like a@foo; it has to be like @foo.com.tw
sfossen
yes, of course it does (hint - I do DNS for a living). These are just the OPs example domains.
Alnitak
+1 for at least thinking it through and not just dropping something and leaving.
sfossen
You've got a bug there — this will match "bar@foo" and "bar@foo.", but not "[email protected]". I think you meant "/@(foo|bar|baz)(?:\.|$)/i".
Ben Blank
no, that's what I meant - I've taken "foo" to mean any whole domain name, e.g. "example.com", not the exact literal "foo"
Alnitak
+1  A: 

If the previous (and logical) answers about '|' don't suit you, have a look at

http://search.cpan.org/~jhi/Regex-PreSuf-1.17/PreSuf.pm

module description : create regular expressions from word lists

siukurnin
A: 

You don't need a regex to find whether a string contains at least one of a given list of substrings. In Python:

def contain(string_, substrings):
    return any(s in string_ for s in substrings)

The above is slow for a large string_ and many substrings. GNU fgrep can efficiently search for multiple patterns at the same time.

Using regex

import re

def contain(string_, substrings):
    regex = '|'.join("(?:%s)" % re.escape(s) for s in substrings)
    return re.search(regex, string_) is not None

J.F. Sebastian
Nope, this won't work, because you will get false positives with 'a@foobar' and 'b@foofoo'.
Harry
@Harry: Did you read the question and my answer? My code doesn't search for domains; it answers the 2nd part of the question: "I want to match a string that contains at least one of a given list of substrings."
J.F. Sebastian
A: 

OK, I know you asked for a regex answer. But have you considered just splitting the string on the '@' character, taking the second array value (the domain), and doing a simple equality test?

if (splitString[1] == "foo" || splitString[1] == "bar" || splitString[1] == "baz")
{
   //Do Something!
}

Seems to me that RegEx is overkill. Of course my assumption is that your case is really as simple as you have listed.
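A rough Python version of the same split-and-compare idea might look like the following. ALLOWED_DOMAINS and domain_is_allowed are hypothetical names, and lower-casing the domain is an added assumption, since the comparison above is case-sensitive:

```python
# Hypothetical set of allowed domains, from the question's example.
ALLOWED_DOMAINS = {"foo", "bar", "baz"}

def domain_is_allowed(address):
    # rpartition splits on the last "@"; sep is empty if there is no "@" at all.
    local, sep, domain = address.rpartition("@")
    # Assumption: domains are compared case-insensitively.
    return bool(sep) and domain.lower() in ALLOWED_DOMAINS

for address in ["a@foo", "b@baz", "a@fnord", "a@foobar", "no-at-sign"]:
    print(address, domain_is_allowed(address))
```

Because the whole domain is compared for equality, a@foobar and b@foofoo are rejected without any anchoring tricks.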

Harry
Craig Walker
Well, my answer works for your question. It might not suit your particular case, but the beauty of Stack Overflow is that it might help someone else.
Harry