I have many lines containing the names of the US Presidents Carter, Bush, Clinton and Obama. Some contain one of those names, some two, some three, and some all four (in any order).

I know how to search for Carter AND Clinton AND Obama -> :g/.*Carter\&.*Clinton\&.*Obama/p ; I also know how to search for Carter AND (Clinton OR Bush) -> :g/.*Carter\&\(.*Clinton\|.*Bush\)/p

(there are most certainly better ways to do that)

But I can't figure out how to search (and I did look at the related questions) for, e.g., Bush AND Clinton AND NOT Carter, let alone for, e.g., Bush AND Clinton AND NOT (Carter OR Obama).

Thanks in advance

THG

+3  A: 

If you expect to use Perl-style regular expressions outside Vim later on, forget about \&: it is a Vim-specific feature that is redundant, since Vim also has look-aheads, so any r1\&r2 can be rewritten as \%(r1\)\@=r2. Look-aheads are the better choice because they have a negative version and are also available in most Perl-style regular expression engines. Your (Bush AND Clinton AND NOT (Carter OR Obama)) can be expressed in the following way:

g/^\%(.*\%(Carter\|Obama\)\)\@!\%(.*Bush\)\@=.*Clinton/

Or, with very magic:

g/^\v%(.*%(Carter|Obama))@!%(.*Bush)@=.*Clinton/

See :h /\@=
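
For comparison, here is a minimal sketch of the same query in a Perl-style engine. It uses Python's re module purely as an illustration (the question is about Vim, so this is an analogue and not Vim syntax). The sample lines and the helper name matches are made up for the sketch. The Vim \%(...\)\@! becomes (?!...) and \%(...\)\@= becomes (?=...):

```python
import re

# Perl-style analogue of the Vim pattern above (illustrative, not Vim syntax):
#   \%(...\)\@!  ->  (?!...)   negative look-ahead
#   \%(...\)\@=  ->  (?=...)   positive look-ahead
PATTERN = re.compile(r"^(?!.*(?:Carter|Obama))(?=.*Bush).*Clinton")


def matches(line):
    """True for lines with Bush AND Clinton AND NOT (Carter OR Obama)."""
    return PATTERN.search(line) is not None


# Hypothetical sample lines, invented for this sketch.
sample = [
    "Bush Clinton",
    "Clinton Bush",
    "Bush Clinton Carter",
    "Obama Bush Clinton",
    "Clinton",
]

selected = [line for line in sample if matches(line)]
print(selected)
```

Only the first two sample lines are kept: the order of Bush and Clinton does not matter, and any line mentioning Carter or Obama is rejected by the negative look-ahead.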

About the inner logic: a look-ahead works like a branch. For the regex (reg1)@=reg2, assume that reg2 matches starting at position N. The regex engine then checks whether reg1 also matches at that position. If it does not, the position is discarded and the engine tries the next possible match for reg2. A negative look-ahead works the same way, except that the engine discards the position if reg1 does match.


Example:

Regex: (.b)@!a

String: aba

  1. Found match: a matches at position 0 (the first a of aba). Trying to match the look-ahead: . matches a and b matches b, so the look-ahead matches and the position is discarded.
  2. Position 1 (the b of aba) does not match a.
  3. Found match: a matches at position 2 (the last a of aba). Trying to match the look-ahead: . matches a, but b does not match because no characters are left, so the look-ahead fails. Result: the regex matches at position 2.
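
The walk-through above can be replayed in a Perl-style engine. This is a small sketch using Python's re module as a stand-in (an assumption for illustration, not Vim itself), with the example regex (.b)@!a spelled as (?!.b)a and applied to the string aba:

```python
import re

# Perl-style spelling of the Vim very-magic regex (.b)@!a (illustrative analogue).
pattern = re.compile(r"(?!.b)a")

# Collect the start position of every match in the example string.
positions = [m.start() for m in pattern.finditer("aba")]
print(positions)
```

The only match starts at position 2: position 0 is rejected because the negative look-ahead sees "ab", and position 1 holds a b, not an a.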
ZyX
ZyX : First of all, thanks: I tried it and it worked (of course). The remaining problem is that I do not understand the inner logic of your regex. This being a forum and not a classroom, I shall follow your advice and look around the Vim help (if it is not too hard for a regex newbie).
ThG
@ThG You should mark answers as accepted when you accept them. You have a 0% accept rate now, which may discourage people from answering your future questions.
ZyX
+5  A: 

To represent a NOT, use the negative assertion \@!.

For example, "NOT Bush" would be:

^\(.*Bush\)\@!

or using \v:

\v^(.*Bush)@!

Important: note the leading ^. While it is optional if you only use positive assertions (one match is as good as any other), it is required to anchor negative assertions (otherwise they can still match at a later position, such as the end of the line).
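
To see why the anchor matters, here is a small sketch with Python's re module standing in for a Perl-style engine (an assumption for illustration; the Vim equivalent of (?!...) is \(...\)\@!). The sample line is made up:

```python
import re

line = "Bush and Clinton"  # hypothetical sample line that does contain Bush

# Without ^ the engine may slide forward to a position from which
# "Bush" can no longer be found, so "NOT Bush" wrongly succeeds.
unanchored = re.search(r"(?!.*Bush)", line)

# With ^ the assertion is tested only at the start of the line.
anchored = re.search(r"^(?!.*Bush)", line)

print(unanchored is not None, anchored is not None)
```

The unanchored pattern still finds an empty match just past the B of Bush, while the anchored one correctly reports no match for this line.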

Translating "Bush AND Clinton AND NOT (Carter OR Obama)":

\v^(.*Bush)&(.*Clinton)&(.*Carter|.*Obama)@!

Addendum

To explain the relationship between \& and \@=:

One&Two&Three

is interchangeable with:

(One)@=(Two)@=Three

The only difference is that \& directly mirrors \| (which should be more obvious and natural), while \@= mirrors Perl's (?=pattern).

Piet Delport
Piet Delport : I tried your solution and it worked perfectly. Thanks a lot. But may I say, as a newbie, that I am amazed by the number of different ways Vim has to achieve things successfully (here your solution and ZyX's)?
ThG
@Piet Delport I advised avoiding \& because 1) you could get used to it and then have trouble understanding or writing Perl-style regular expressions, and 2) there are no negative branches, but there is a negative look-ahead. Why do we need two different things to express one idea?
ZyX