Is there an easy way using grep (or combine with other standard command line tools) to get a list of all files which contain one pattern yet don't have a second pattern?

In my specific case I want a list of all files that contain the pattern:

override.*commitProperties

yet don't contain:

super.commitProperties

I'm on Windows but use cygwin extensively. I know how to find either all files with the first pattern OR all files without the second, but I'm not sure how to combine those queries.

I'd prefer answers that are generic, as I have a feeling plenty of other people could find this type of query useful. It's easy enough for me to take a generic solution and plug in my values. I just included my specific instance to make the explanation easier.

+3  A: 

Try

find . -print0 | xargs -0 grep -l "override.*commitProperties" \
| tr '\012' '\000' | xargs -0 grep -L super.commitProperties

The tr command converts newlines to ASCII NUL characters, so that you can use -0 in the second xargs, avoiding problems with spaces and other special characters in file names.

Test result:

/tmp/test>more 1 2 3 | cat
::::::::::::::
1
::::::::::::::
override.*commitProperties
super.commitProperties
::::::::::::::
2
::::::::::::::
override.*commitProperties
::::::::::::::
3
::::::::::::::
hello world
/tmp/test>find . -print0 | xargs -0 grep -l "override.*commitProperties" | tr '\012' '\000' | xargs -0 grep -L super.commitProperties
./2
/tmp/test>

As noted by Douglas, find+xargs can be substituted by grep -r.

hlovdal
That's pretty much what I'd have suggested, except that I'd have used grep -r rather than find.
Douglas Leeder
Could you add some detail as to what each command is doing? I get the find and grep, but what's the tr one doing?
Herms
Also, why the find? grep -r handles recursing the path and makes it simpler. Does find make things work better in some way?
Herms
No, grep -r is fine. I think it is more a habit of mine to always use find+xargs, because sometimes you want to restrict what files to search, like for instance "find . -name '*.[ch]' -print0 | ...".
hlovdal
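To illustrate the point about restricting which files get searched, here is a minimal sketch of the same pipeline with a find filter added. The sample files and the .as extension are assumptions made up for the demo; -type f is added so that directories are not handed to grep.

```shell
# Sample tree; the file names and the .as extension are assumptions for this sketch.
demo=$(mktemp -d)
printf 'override protected function commitProperties():void\n' > "$demo/a.as"
printf 'override protected function commitProperties():void\nsuper.commitProperties();\n' > "$demo/b.as"
printf 'override protected function commitProperties():void\n' > "$demo/c.txt"

# -type f skips directories; -name limits the search to *.as files.
result=$(find "$demo" -type f -name '*.as' -print0 \
    | xargs -0 grep -l 'override.*commitProperties' \
    | tr '\n' '\0' \
    | xargs -0 grep -L 'super.commitProperties')
echo "$result"    # only a.as is listed; c.txt is never searched

rm -rf "$demo"
```

The filter only changes the first stage; the tr and the second xargs work exactly as in the answer.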
Good point with filtering files. I've also never seen the -print0 option used before. I'll have to keep that in mind.
Herms
+2  A: 

Use a combination of 2 greps and a comm, as follows (patterns are A and B). Please note that a piped grep won't work since the patterns could be on disparate lines.

$ cat a
A
$ cat b
B
$ cat ab
A
B

$ grep -l A * > A.only
$ grep -l B * > B.only  

$ comm -23 A.only B.only 
a

NOTE: comm command prints the lines common to or unique to two files. "-23" prints lines unique to first file - thus suppressing filenames in second file.
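One thing worth keeping in mind is that comm expects both inputs to be sorted. The following is a rough sketch of the same idea that sorts explicitly and keeps the intermediate lists in a scratch directory that is removed at the end. The directory and file names (work, lists, with_A, with_B) are made up for illustration.

```shell
work=$(mktemp -d)     # scratch directory holding the sample files a, b, ab
lists=$(mktemp -d)    # separate directory for the intermediate lists
printf 'A\n' > "$work/a"
printf 'B\n' > "$work/b"
printf 'A\nB\n' > "$work/ab"

# comm expects sorted input, so sort each list explicitly.
(cd "$work" && grep -l A -- * | sort) > "$lists/with_A"
(cd "$work" && grep -l B -- * | sort) > "$lists/with_B"

# -23 suppresses columns 2 and 3, leaving names that appear only in with_A.
only_A=$(comm -23 "$lists/with_A" "$lists/with_B")
echo "$only_A"

rm -rf "$work" "$lists"
```

Keeping the lists outside the searched directory also means the glob never picks them up on the second grep.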

DVK
Never ran across the comm command before. I like the grep solutions better for this situation as it doesn't require extra files that I'll forget to delete, but I'm glad you posted this. Always good to learn about new commands.
Herms
I may be wrong, but I think the first solution (with xargs) will execute MANY grep commands, one for each file found. That's not a problem for checking a few config files, but launching that many grep child processes would be a major resource drain on a very large set of files.
DVK
See my comment on the accepted solution. I don't believe any more than 2 greps will actually be launched.
Herms
+5  A: 

grep -rl "override.*commitProperties" . | xargs grep -L "super.commitProperties"

-l prints the files with a match

-L prints the files without a match
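Since the question asks for something generic, here is one possible way to wrap the pipeline in a shell function. This is only a sketch: the function name with_but_without is made up, and it assumes GNU grep (for -Z, which ends each file name with a NUL byte) and GNU xargs (for -0 and -r), so that file names containing spaces survive the pipe.

```shell
# Hypothetical helper: list files under DIR (default ".") that contain the
# pattern WITH but do not contain the pattern WITHOUT.
# Assumes GNU grep (-Z / --null) and GNU xargs (-0, -r).
with_but_without() {
    with=$1
    without=$2
    dir=${3:-.}
    # -r on xargs: do not run the second grep at all if nothing matched.
    grep -rlZ -e "$with" "$dir" | xargs -0 -r grep -L -e "$without"
}

# Small demo with made-up sample files, one of them with spaces in its name.
demo=$(mktemp -d)
printf 'override function commitProperties\nsuper.commitProperties\n' > "$demo/1"
printf 'override function commitProperties\n' > "$demo/2 with space"
printf 'hello world\n' > "$demo/3"

result=$(with_but_without 'override.*commitProperties' 'super.commitProperties' "$demo")
echo "$result"

rm -rf "$demo"
```

With plain xargs (no -0), the file "2 with space" would be split into three separate arguments.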

Douglas Leeder
This one and hlovdal's both work, but I like this one a little more just because it's simpler.
Herms
One can cheat by calling "superXcommitProperties" to escape the wrath of the one doing the grep to find the evil code ;-)
Joachim Sauer
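The joke works because an unescaped dot in a regular expression matches any character. A small sketch of the difference, using made-up file names (cheat, honest, none), with either an escaped dot or grep's -F option for fixed strings:

```shell
demo=$(mktemp -d)
printf 'superXcommitProperties();\n' > "$demo/cheat"
printf 'super.commitProperties();\n' > "$demo/honest"
printf 'hello world\n' > "$demo/none"

# Unescaped dot: "." matches the X too, so "cheat" counts as containing the call.
loose=$(cd "$demo" && grep -L 'super.commitProperties' cheat honest none)

# Escaped dot: only a literal "." matches, so "cheat" is reported as missing the call.
strict=$(cd "$demo" && grep -L 'super\.commitProperties' cheat honest none)

# -F treats the whole pattern as a fixed string, with the same effect here.
fixed=$(cd "$demo" && grep -LF 'super.commitProperties' cheat honest none)

echo "loose:  $loose"
echo "strict: $strict"
echo "fixed:  $fixed"

rm -rf "$demo"
```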
evil code? no, no. This is a case of me forgetting to add the super call to my overrides in a couple places, causing weird things to happen. I've already properly beaten myself, now I'm just searching for all the places I screwed up so I can fix it. :)
Herms
Does grep recursively launch itself? As written grep should only be called twice. As long as it doesn't recursively launch itself I think this would only result in 2 greps being launched.
Herms
My bad. Upon actually thinking, of course there's only 2 greps.
DVK
A: 

My solution is similar to Douglas Leeder's, except I don't use xargs:

grep -l 'override.*commitProperties' $(grep -L super.commitProperties *.txt)

The grep -L command produces a list of files that DO NOT contain the pattern super.commitProperties; the grep -l command then looks for the override.*commitProperties pattern in that list.

Overall, it is a different way to skin the cat.
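One caveat with this form: if the inner grep -L prints nothing, the outer grep gets no file arguments and falls back to reading standard input. A common workaround, sketched below with made-up sample files, is to pass /dev/null as an extra argument. Note also that the unquoted $(...) is split on whitespace, so file names with spaces would still break; the null-delimited find/xargs pipeline in hlovdal's answer handles that case.

```shell
demo=$(mktemp -d)
printf 'override function commitProperties\n' > "$demo/only.txt"
printf 'override function commitProperties\nsuper.commitProperties\n' > "$demo/both.txt"

# /dev/null never matches, so it only serves as a guaranteed file argument:
# if the inner grep -L prints nothing, the outer grep still has a file to read.
result=$(cd "$demo" && grep -l 'override.*commitProperties' /dev/null $(grep -L 'super.commitProperties' *.txt))
echo "$result"

rm -rf "$demo"
```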

Hai Vu
I'm not familiar with that syntax. What's the $() doing?
Herms
The $( command ) syntax is essentially the back quote syntax: `command`. The shell replaces $(command) or `command` with its output. You can google "bash command substitution" for more information.
Hai Vu
A: 

ack -l --make "override.*commitProperties" | xargs ack -L "super.commitProperties"

I used this thread while trying to do a recursive lookup. It was taking 25+ minutes, so I looked for alternatives and found ack, which did it in 5 minutes.

Handy tool ack.

Matt Clarkson