Is there a regular expression to find two different words in a sentence? Extra credit for an expression that works in MS Visual Studio 2008 :)

For example:

reg_ex_match(A, B, "A sentence with A and B") = true
reg_ex_match(C, D, "A sentence with A and B") = false

See also this related question

A: 

Try searching regexlib.

Kon
A: 

Why not use boolean logic, rather than a complicated regex?

Code not tested:

public bool reg_ex_match(Regex A, Regex B, string s) {
    // Both patterns must match somewhere in s; order does not matter.
    return A.IsMatch(s) && B.IsMatch(s);
}

Update: This assumes A and B are defined with word boundaries:

Regex A = new Regex(@"\bA\b");
toolkit
This doesn't work if A="foo" and B="foomator": it will return true for "this is foomator".
Łukasz Lew
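To make the word-boundary point concrete, here is a minimal sketch of the two-regex approach. The helper name `ContainsBoth` and the use of `Regex.Escape` are my own assumptions, not part of the answer above, and the sketch assumes both words start and end with a word character (which is what `\b` needs in order to behave as expected).

```csharp
using System;
using System.Text.RegularExpressions;

public static class TwoWordCheck
{
    // Hypothetical helper: true when both words occur in s as whole words.
    public static bool ContainsBoth(string a, string b, string s)
    {
        // Regex.Escape keeps characters such as '.' or '+' in the words literal.
        Regex ra = new Regex(@"\b" + Regex.Escape(a) + @"\b");
        Regex rb = new Regex(@"\b" + Regex.Escape(b) + @"\b");
        return ra.IsMatch(s) && rb.IsMatch(s);
    }

    public static void Main()
    {
        Console.WriteLine(ContainsBoth("A", "B", "A sentence with A and B"));   // True
        Console.WriteLine(ContainsBoth("C", "D", "A sentence with A and B"));   // False
        Console.WriteLine(ContainsBoth("foo", "foomator", "this is foomator")); // False
    }
}
```

With the `\b` anchors in place, the "foo"/"foomator" case returns false; without them, "foo" would match inside "foomator" and the result would be true.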
+2  A: 

".*A.*B.*|.*B.*A.*" You can add spaces around the words A and B if you want proper words.

Łukasz Lew
Careful. This would match "The sentence with AB". Close, though.
alphadogg
Which would be proper behaviour. If you define a word as a separate word, then as I said you should add spaces around it.
Łukasz Lew
Spaces won't cut it, because a word might be at the beginning or the end of the string. In that case it should still be considered a separate word, but it hasn't got a space before or after it. See @Gumbo's solution using \b for the "real" solution.
Joachim Sauer
Be careful with word boundaries. I've seen lots of people get bitten by not realizing that some "words" they had in their dataset contained characters not in the boundary definition for whatever "flavor" of regex they were using.
alphadogg
This would also match AUTOBAHN or BAILOUT since the .* will also match word characters that surround or are in between the A and B (or B and A). It would even match something like "And always be sure to look both ways BEFORE crossing the street."
Bryan
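The false positives mentioned in these comments are easy to reproduce. The following is only a sketch that runs the pattern from this answer, with the literal letters A and B, against the inputs quoted in the comments; the class and method names are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SubstringPatternDemo
{
    // The pattern from the answer, with A and B taken literally.
    public const string Pattern = ".*A.*B.*|.*B.*A.*";

    public static bool Matches(string s)
    {
        return Regex.IsMatch(s, Pattern);
    }

    public static void Main()
    {
        Console.WriteLine(Matches("A sentence with A and B")); // True (intended)
        Console.WriteLine(Matches("The sentence with AB"));    // True (adjacent letters)
        Console.WriteLine(Matches("AUTOBAHN"));                // True (inside one word)
        Console.WriteLine(Matches("BAILOUT"));                 // True (inside one word)
        Console.WriteLine(Matches("A sentence with A only"));  // False (no B at all)
    }
}
```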
A: 

This is quite similar to the problem of an "and" operator; see this question

jpalecek
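One common way to express "and" inside a single regular expression is a pair of lookaheads anchored at the start of the input. This is a sketch of that general technique, not necessarily what the linked question proposes; `ContainsBoth` is a hypothetical helper, and `RegexOptions.Singleline` is added on the assumption that the text may contain line breaks.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LookaheadAnd
{
    // Hypothetical helper: each lookahead scans from the start of s for one word,
    // so the order in which the two words appear does not matter.
    public static bool ContainsBoth(string a, string b, string s)
    {
        string pattern =
            @"^(?=.*\b" + Regex.Escape(a) + @"\b)" +
            @"(?=.*\b" + Regex.Escape(b) + @"\b)";
        // Singleline lets '.' match '\n' as well.
        return Regex.IsMatch(s, pattern, RegexOptions.Singleline);
    }

    public static void Main()
    {
        Console.WriteLine(ContainsBoth("A", "B", "B comes before A here")); // True
        Console.WriteLine(ContainsBoth("A", "B", "Only A here"));           // False
    }
}
```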
A: 

.*A.*\s.*B.*|.*B.*\s.*A.*

Please note the use of the '+' between A and B. This is to ensure you match on separate A and B. If this is not a requirement, then Łukasz Lew's answer is correct.

UPDATE: Changed as per Bryan's excellent observation below. The above expression will recognize A separated from B (or vice versa) with at least one whitespace character (space, tab or line break) between the two regions of interest.

alphadogg
Assuming the sentence is not split by a line break character. (Since that would not match the '.')
alphadogg
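In .NET, `.` does not match `\n` unless `RegexOptions.Singleline` is set. Here is a small sketch of that caveat using the `\s` pattern from this answer; the class and method names are made up. Note that the `\s` itself can absorb one line break, so the sample input uses two.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LineBreakDemo
{
    // The pattern from the answer, with A and B taken literally.
    public const string Pattern = @".*A.*\s.*B.*|.*B.*\s.*A.*";

    public static bool Matches(string s, RegexOptions options)
    {
        return Regex.IsMatch(s, Pattern, options);
    }

    public static void Main()
    {
        string text = "A is on line one\nline two\nB is on line three";
        Console.WriteLine(Matches(text, RegexOptions.None));       // False
        Console.WriteLine(Matches(text, RegexOptions.Singleline)); // True
    }
}
```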
You probably meant .\*A.+B.\*|.\*B.+A.\*
Łukasz Lew
Not really. That is proper regex syntax, although it may need to be adapted to whatever environment you use it in...
alphadogg
What I meant is that you need "\" to make * appear in your answer :)
Łukasz Lew
If you're talking about preventing the asterisk from being interpreted as markup, the proper way is either to enclose the text in `backticks`, or to put it on its own line, indented four spaces.
Alan Moore
This has the same issue as Lukasz Lew's regex. It will match unintended targets like AMBIENT or BLANK.
Bryan
True. Damn regexes... :)
alphadogg
+5  A: 

For real words:

\bA\b.+\bB\b|\bB\b.+\bA\b
Gumbo
I guess it depends on what is meant by "words" by the OP. And, the second half of your expression is double B's.
alphadogg
Also, note that a word boundary may not be what you want. To the OP, is "A-B" one word or two every time? Ex: a last name is sometimes hyphenated.
alphadogg
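To see what `\b` does with a hyphenated pair, here is a sketch comparing Gumbo's pattern with a whitespace-delimited variant. The second pattern is my own assumption about one possible definition of "word" (a run of non-whitespace characters, expressed with lookarounds); it is not something proposed in this thread.

```csharp
using System;
using System.Text.RegularExpressions;

public static class BoundaryDemo
{
    // Gumbo's pattern, with A and B taken literally.
    public const string WordBoundary = @"\bA\b.+\bB\b|\bB\b.+\bA\b";

    // Assumed alternative: a word must be surrounded by whitespace or the string ends.
    public const string WhitespaceDelimited =
        @"(?<!\S)A(?!\S).+(?<!\S)B(?!\S)|(?<!\S)B(?!\S).+(?<!\S)A(?!\S)";

    public static void Main()
    {
        string[] inputs = {
            "A sentence with A and B", // True, True
            "AUTOBAHN",                // False, False
            "The sentence with AB",    // False, False
            "Smith A-B Jones"          // True, False
        };
        foreach (string s in inputs)
        {
            Console.WriteLine(s + " -> " + Regex.IsMatch(s, WordBoundary)
                + ", " + Regex.IsMatch(s, WhitespaceDelimited));
        }
    }
}
```

The last input is the interesting one: `\b` sees a boundary on both sides of the hyphen, so "A-B" counts as two words, while the whitespace-delimited variant treats it as one.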
A: 

you guys are weird

wow i did not know that thanks