I can use "alternation" in a regular expression to match any occurrence of "cat" or "dog", thusly:

(cat|dog)

Is it possible to NEGATE this alternation, and match anything that is NOT "cat" or "dog"?

If so, how?

For Example:

Let's say I'm trying to match END OF SENTENCE in English, in an approximate way.

To Wit:

(\.)(\s+[A-Z][^.]|\s*?$)

With the following paragraph:

The quick brown fox jumps over the lazy dog. Once upon a time Dr. Sanches, Mr. Parsons and Gov. Mason went to the store. Hello World.

I incorrectly find "end of sentence" at Dr., Mr., and Gov.

(I'm testing using http://regexpal.com/ in case you want to see what I'm seeing with the above example)

Since this is incorrect, I would like to say something like:

!(Dr\.|Mr\.|Gov\.)(\.)(\s+[A-Z][^.]|\s*?$)

Of course, this isn't working, which is why I seek help.

I also tried !/(Dr.|Mr.|Gov.)/, and !~ which were no help whatsoever.

How can I avoid matches for "Dr.", "Mr." and "Gov.", etc?

Thanks in advance.

A: 

In languages like Perl or awk, there's the !~ operator:

$string !~ /(cat|dog)/

In ActionScript, you can just use the NOT operator ! to negate a match. See here for reference, and here for a comparison of regex flavors.
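
A minimal sketch of that idea, written in TypeScript only because its regex literals look much like ActionScript's. The helper name `mentionsNoPet` is made up for illustration: run the match, then negate the boolean it returns.

```typescript
// Hypothetical helper: true when the text contains neither "cat" nor "dog".
function mentionsNoPet(text: string): boolean {
  // test() reports whether the pattern matches anywhere; "!" inverts that answer.
  return !/(cat|dog)/.test(text);
}

console.log(mentionsNoPet("a bird sang"));    // true
console.log(mentionsNoPet("the dog barked")); // false
```

Note that this negates the outcome of the whole match in the host language; it does not put the negation inside the pattern.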

ghostdog74
Will this work in Actionscript?
Joshua
That syntax almost definitely won't work in AS, but the concept does: run a regex match, which returns true if it matches, then negate that result (which is done with a ! in AS as well as many other languages).
Sid_M
Unfortunately I cannot rely on the NOT operator in the language using the regular expressions... I need to know how to accomplish the NOT operator WITHIN the regular expression itself. I hope my edit of the question makes things more clear.
Joshua
A: 

You can do this:

!/(cat|dog)/

EDIT: You should have included the programming language in your question. It's ActionScript, right? I'm not an ActionScript coder, but AFAIK it's done like this:

var noMatch:Boolean = !/(cat|dog)/.test(str); // str is assumed to be the String being checked
Ruel
What is the purpose of the slashes?
Joshua
Forward slashes are a standard way to declare a regex literal in some languages (JavaScript, Perl, probably others).
Sid_M
@Joshua, I've edited my answer, please see.
Ruel
I apologize for the lack of clarity in the original post. I've made some changes to make my needs more clear. Thank you for your patience.
Joshua
+1  A: 

It is not possible. You would normally do this using negative lookbehind (?<!…), but JavaScript's regex flavor does not support this. Instead, you will have to filter the matches after the fact to discard those you don't want.
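
Filtering after the fact might look roughly like the sketch below. It is written in TypeScript rather than ActionScript, and the names `ABBREVIATIONS`, `PARAGRAPH` and `sentenceEndIndexes` are invented for the example. It collects every candidate with the pattern from the question, then drops any candidate whose period directly follows one of the listed abbreviations.

```typescript
// Abbreviations whose trailing period should not count as a sentence end (illustrative list).
const ABBREVIATIONS: string[] = ["Dr", "Mr", "Gov"];

const PARAGRAPH: string =
  "The quick brown fox jumps over the lazy dog. Once upon a time Dr. Sanches, " +
  "Mr. Parsons and Gov. Mason went to the store. Hello World.";

// Returns the index of every period accepted by the question's pattern
// that is not immediately preceded by one of the abbreviations.
function sentenceEndIndexes(text: string): number[] {
  const candidate: RegExp = /(\.)(\s+[A-Z][^.]|\s*?$)/g;
  const ends: number[] = [];
  let match: RegExpExecArray | null;
  while ((match = candidate.exec(text)) !== null) {
    const before: string = text.slice(0, match.index);
    // Discard the candidate if the text before the period ends with an abbreviation.
    const isAbbreviation: boolean = ABBREVIATIONS.some(
      (abbr) => new RegExp("\\b" + abbr + "$").test(before)
    );
    if (!isAbbreviation) ends.push(match.index);
  }
  return ends;
}

console.log(sentenceEndIndexes(PARAGRAPH)); // [43, 119, 132]
```

The abbreviation list has to be maintained by hand, so this stays as approximate as the original pattern.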

Jeremy W. Sherman
I'm not using Javascript, but ActionScript. Are you aware if Actionscript shares this limitation?
Joshua
Jeremy W. Sherman
A: 

(?!NotThisStuff) is what you want, otherwise known as a negative lookahead group.

Unfortunately, it will not work as you intend. /(?!Dr\.)(\.)/ will still return the periods that belong to "Dr. Sanches", because a lookahead tests the text starting at the current position, and at the period that text is ".", not "Dr.". The regex parser will say to itself, "Yep, this '.' isn't 'Dr.'" /((?!Dr).)/ won't work either, though I believe it should.

And what's more, you'll end up looking through all the sentence "ends" anyway. Actionscript doesn't have a "match all," only a match first. You have to set the global flag (or add g to the end of your regex) and call exec until your result object is null.

var string = 'The quick brown fox jumps over the lazy dog. Once upon a time Dr. Sanches, Mr. Parsons and Gov. Mason went to the store. Hello World.';

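// The lookahead is tested at the period itself, so it never rejects anything here.
var regx:RegExp = /(?!Dr\.)(\.)/g;
var result:Object = regx.exec(string);

for (var i:int = 0; i < 10; i++) { // paranoia: cap the number of iterations
  if (result == null || result.index == 0) break; // again: paranoia
  trace(result.index, result);
  result = regx.exec(string); // the g flag makes exec resume after the previous match
}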

// trace results:    
//43 .,.
//64 .,.
//77 .,.
//94 .,.
//119 .,.
//132 .,.
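
One way to keep the exclusion inside the pattern, offered as a sketch and not tested in ActionScript: move the negative lookahead to the start of the word that precedes the period, so it is tested where "Dr." would actually begin. The example is in TypeScript, and `SENTENCE_END` and `findSentenceEnds` are names invented here.

```typescript
// Same idea as (?!Dr\.), but tested at the start of the word before the period
// instead of at the period itself.
const SENTENCE_END: RegExp =
  /\b(?!(?:Dr|Mr|Gov)\.)(\w+)(\.)(\s+[A-Z][^.]|\s*?$)/g;

function findSentenceEnds(text: string): number[] {
  const ends: number[] = [];
  SENTENCE_END.lastIndex = 0; // reset the position remembered by the g flag
  let match: RegExpExecArray | null;
  while ((match = SENTENCE_END.exec(text)) !== null) {
    // The period sits right after the captured word (group 1).
    ends.push(match.index + match[1].length);
  }
  return ends;
}

const sample: string =
  "The quick brown fox jumps over the lazy dog. Once upon a time Dr. Sanches, " +
  "Mr. Parsons and Gov. Mason went to the store. Hello World.";

console.log(findSentenceEnds(sample)); // [43, 119, 132]
```

This only skips the abbreviations listed in the pattern, and it requires a word character directly before the period, so a sentence ending in `).` would be missed.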
Sold Out Activist