I'm trying to write a regular expression that will essentially return true if string A is found and string B is not found.

To be more specific, I'm looking for any file on my server which has the text 'base64_decode' in it, but not 'copyright'.

Thanks!

A: 
Scott Chamberlain
You think incorrectly :)
leppie
Those are character classes. Inverting them doesn't do what you think it does.
Borealid
Tested it - only matches when both copyright and base64_decode are present. Needs to match if only base64_decode is found and not copyright.
Ian Silber
@Borealid is right: you're way off base. `[^copyright]` matches any **one** character, unless it's one of the characters in the word `copyright`.
Alan Moore
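To make the character-class point concrete, here is a minimal sketch. It uses Python's re module purely for illustration (the thread itself is about grep), and the variable names are made up for this example.

```python
import re

# [^copyright] is a negated character class: it matches exactly ONE character
# that is not any of c, o, p, y, r, i, g, h, t. It knows nothing about the
# word "copyright" as a whole.
negated = re.compile(r"[^copyright]")

# The first character outside the set is the space after the word.
first = negated.search("copyright notice")

# Every character of "tight" is in the set, so nothing matches, even though
# the word "copyright" does not occur in it.
no_match = negated.search("tight")
```

So the class cannot express "the word copyright is absent"; it only rejects individual letters.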
A: 

Use negative lookahead and lookbehind:

^(?<!.*copyright.*)(base64_decode)(?!.*copyright.*)$

Perl doesn't support this yet :-P.

Borealid
More to the point, grep doesn't support it. In fact, only two regex flavors I know of support unbounded, variable-length lookbehinds: .NET and JGSoft (or three if you count Java, but that's a bug :/ ). And even in those flavors you're usually better off using lookahead alone.
Alan Moore
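As a quick check of the lookbehind limitation in one more flavor, the sketch below tries both styles in Python's re module. Python is not mentioned in this thread and is only an illustrative assumption; the helper name compiles is hypothetical.

```python
import re

def compiles(pattern: str) -> bool:
    """Return True if Python's re module accepts the pattern."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False

# Variable-length lookbehind, as in the answer above: rejected by Python's re,
# which requires lookbehind patterns to have a fixed width.
lookbehind_ok = compiles(r"(?<!.*copyright.*)base64_decode")

# Lookahead alone, anchored at the start of the input: accepted.
lookahead_ok = compiles(r"\A(?!.*copyright).*base64_decode")
```

In this flavor, at least, the lookahead-only form is the one that compiles.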
@Alan: regarding Java infinite lookbehind, I'd say if you know how to manipulate the bug to make it work for you, you can classify it as a hidden feature =)
polygenelubricants
A: 

It's not recommended to do this in one regex, but if you must, you can use lookaheads:

^(?=.*must-have)(?!.*must-not-have)

You may want to do this in single-line/dot-all mode, and the beginning anchor may be \A instead of ^.

(?=…) is positive lookahead; it asserts that a given pattern can be matched. (?!…) is negative lookahead; it asserts that a given pattern can NOT be matched.
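A minimal sketch of the two lookaheads applied to a whole file's text, assuming Python's re module (an assumption for illustration; the answer itself is flavor-agnostic). The function name has_a_but_not_b is hypothetical, \A is used as the anchor, and re.DOTALL lets "." cross newlines.

```python
import re

# Same shape as ^(?=.*must-have)(?!.*must-not-have), with the question's
# two strings filled in. DOTALL makes "." match newlines as well.
PATTERN = re.compile(r"\A(?=.*base64_decode)(?!.*copyright)", re.DOTALL)

def has_a_but_not_b(text: str) -> bool:
    """Return True if text contains 'base64_decode' but not 'copyright'."""
    return PATTERN.search(text) is not None
```

The match is case-sensitive as written, so "Copyright" would not count as the forbidden string unless re.IGNORECASE is added.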

polygenelubricants
On rubular: http://www.rubular.com/r/xmIvlRZDtm
polygenelubricants
As far as I know, grep doesn't even support lookaheads. But whatever the case, I agree that this is not the way to go. Two piped grep calls are bound to be much faster than any Frankenregex. :D
Alan Moore
A: 

Piped greps should be able to achieve that easily:

find . -type f -print | xargs grep -l "base64_decode" | xargs grep -L "copyright"

iniju
A: 

I'm not sure your real task can be solved purely within the regex passed into grep, since grep processes files line-by-line. I would use the -l (--files-with-matches) and -L (--files-without-match) options along with command substitution backticks, like so:

grep -L copyright `grep -l base64_decode *`

grep -l base64_decode * lists the names of all the files with "base64_decode" in them, and the backticks put that list on the command line after grep -L copyright, which searches those files and lists the subset of them that doesn't contain "copyright".
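For comparison, the same "contains A, then drop those containing B" filter can be sketched without grep. This is only an illustration in Python, not part of the original answer. The function name files_with_a_without_b is hypothetical, and unlike grep with *, it walks subdirectories too.

```python
from pathlib import Path

def files_with_a_without_b(root, must_have="base64_decode", must_not_have="copyright"):
    """Return files under root that contain must_have but not must_not_have."""
    hits = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        # errors="ignore" so undecodable (binary) files do not abort the scan
        text = path.read_text(encoding="utf-8", errors="ignore")
        if must_have in text and must_not_have not in text:
            hits.append(path)
    return hits
```

Because each file is read as a whole, this avoids the line-by-line limitation mentioned above. It makes the same plain substring checks as the two grep calls.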

wdebeaum
Thank you Mr. wdebeaum! It worked.
Ian Silber