Inside a directory, how can I delete every file that is missing any of a set of specified words, so that only files containing ALL the words are left? I tried to write a simple bash shell script using the grep and rm commands, but I got lost. I am totally new to Linux; any help would be appreciated.

A: 

First, remove any old file list:

rm -f flist

Then, for each of the words, add every file that contains that word to the file list:

grep -l WORD * >>flist

Then sort, uniqify and get a count:

sort flist | uniq -c >flist_with_count

All those files in flist_with_count whose count is lower than the number of words should be deleted. The format will be:

2 file1
7 file2
8 file3
8 file4

If there were 8 words, then file1 and file2 should be deleted. I'll leave the writing/testing of the script to you.

Okay, you convinced me, here's my script:

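#!/bin/bash
# Start from an empty file list.
rm -f flist
# Append the name of every .c file that contains each word.
for word in fopen fclose main ; do
    grep -l "${word}" *.c >>flist
done
# Delete files listed fewer than 3 times (assumes no whitespace in file names).
rm $(sort flist | uniq -c | awk '$1 != 3 {print $2}')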

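This removes the .c files in the directory that contain only some of the three words. Note that a file containing none of the words is never written to flist, so it is left in place.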
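
The counting idea can also be written so that files with zero matches show up in the tally. This is only a sketch under a few assumptions: the function name count_word_hits is made up for illustration, it works on the *.c files in the current directory, at least one such file exists, and no file name contains whitespace. Every file is listed once up front, so each count is one too high until awk subtracts it again.

```shell
# Hypothetical helper: print "<count> <file>" for every *.c file in the
# current directory, where <count> is how many of the given words it contains.
# Assumes at least one *.c file exists and file names contain no whitespace.
count_word_hits() {
    local word
    {
        # Seed the list so every file appears once, even with zero matches.
        printf '%s\n' *.c
        # One extra line per word that a file contains.
        for word in "$@"; do
            grep -l -- "$word" *.c
        done
    } | sort | uniq -c | awk '{ print $1 - 1, $2 }'
}
```

Running count_word_hits fopen fclose main prints one line per file; anything whose count is below 3 is a candidate for deletion, and nothing is removed by the function itself.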

paxdiablo
+17  A: 

How about:

grep -L foo *.txt | xargs rm
grep -L bar *.txt | xargs rm

If a file does not contain foo, then the first line will remove it.

If a file does not contain bar, then the second line will remove it.

Only files containing both foo and bar should be left.

-L, --files-without-match
     Suppress normal output; instead print the  name  of  each  input
     file from which no output would normally have been printed.  The
     scanning will stop on the first match.

See also @Mykola Golubyev's post for placing in a loop.
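
Before deleting anything, it may be worth previewing what would go. The sketch below is not part of the answer above: list_files_missing_any is a hypothetical name, and it assumes the files of interest are the *.txt files in the current directory. It runs the same grep -L calls but only prints the names, once each.

```shell
# Hypothetical dry run: print, once each, the *.txt files in the current
# directory that lack at least one of the given words. Nothing is deleted.
list_files_missing_any() {
    local word
    for word in "$@"; do
        # -L lists the files in which the word does not occur.
        grep -L -- "$word" *.txt
    done | sort -u
}
```

For example, list_files_missing_any foo bar shows the files that the two grep -L lines would hand to rm.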

toolkit
I think files containing foo OR bar will be deleted by this.
claferri
Nope - -L negates the grep.
toolkit
@toolkit: Oops, my bad.
claferri
I think -v is the correct way to negate the grep, since it won't stop on the first match?
jishi
No, -v will return all LINES that do not include the word.
Judge Maygarden
A: 

This will remove all files that contain neither Ping nor Sent:

grep -L 'Ping\|Sent' * | xargs rm
Eugene Morozov
This will not remove files that contain only one of the words.
x-way
Yes, I've noticed that already and hit delete, but that was too late.
Eugene Morozov
You can still delete your answer if you want to.
Jonathan Leffler
+10  A: 
list="Word1 Word2 Word3 Word4 Word5"
for word in $list
do
    grep -L "$word" *.txt | xargs rm
done
Mykola Golubyev
+5  A: 

An addition to the answers above: use the newline character as the delimiter to handle file names with spaces! (The -d option is specific to GNU xargs.)

grep -L "$word" "$file" | xargs -d '\n' rm
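
If file names might contain newlines as well, NUL-delimited output is a slightly more robust variant. The following is a sketch that assumes GNU grep (for -Z) and an xargs that supports -0 and -r; the function name remove_files_missing_any is hypothetical, and it only looks at *.txt files in the current directory.

```shell
# Hypothetical helper: delete every *.txt file in the current directory that
# lacks at least one of the given words. Assumes GNU grep and GNU xargs.
remove_files_missing_any() {
    local word
    for word in "$@"; do
        # -L: list files without a match; -Z: end each name with a NUL byte.
        # xargs -0 splits on NUL; -r skips rm when the list is empty.
        grep -LZ -- "$word" *.txt | xargs -0 -r rm -f --
    done
}
```

Called as remove_files_missing_any foo bar, it keeps only the files that contain both words, including files whose names contain spaces.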
soulmerge
+1  A: 

grep -L word * | xargs rm

+1  A: 

To do the same by matching file names (rather than file contents, as most of the solutions above do), you can use the following:

for file in `ls --color=never | grep -ve "\(foo\|bar\)"`
do
   rm $file
done

As per comments:

for file in `ls`

shouldn't be used. The following does the same thing without using ls:

for file in *
do
  # grep -v echoes the name back only when it matches neither pattern
  if [ -n "$(echo "$file" | grep -ve "\(test1\|test3\)")" ]; then
    rm "$file"
  fi
done

The -v option inverts the match and -e introduces the pattern, so grep selects the file names that contain neither foo nor bar. Any further words to be added to the list need to be separated by \| e.g. one\|two\|three
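
For plain substrings, the shell's own pattern matching may be enough, which avoids starting grep once per file. This is a sketch rather than a drop-in replacement: remove_unless_name_matches is a hypothetical name, the patterns foo and bar are hard-coded, and only regular files in the current directory are considered.

```shell
# Hypothetical helper: delete regular files in the current directory whose
# names contain neither "foo" nor "bar". No external commands besides rm.
remove_unless_name_matches() {
    local file
    for file in *; do
        [ -f "$file" ] || continue      # skip directories and the like
        case "$file" in
            *foo*|*bar*) ;;             # name contains foo or bar: keep it
            *) rm -- "$file" ;;         # anything else is removed
        esac
    done
}
```

Further words go into the case pattern separated by |, for example *one*|*two*|*three*.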

Andy
for file in `ls` is a bad idea.
Porges
Good point. Edited accordingly, but I don't like how that complicates it. Can you think of a more efficient way?
Andy
A: 

You could try something like this, but it may break if the patterns contain shell or grep metacharacters:

(In this example, one, two and three are the patterns.)

for f in *; do
  cmd=true   # start with a no-op so the chain never ends in a dangling &&
  for p in one two three; do
    cmd="fgrep \"$p\" \"$f\" && $cmd"
  done
  eval "$cmd" >/dev/null || rm "$f"
done
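
The same check can probably be done without eval, which sidesteps the quoting problems. A minimal sketch, assuming the patterns are fixed strings passed as arguments; the function name remove_files_missing_patterns is made up for illustration, and it only touches regular files in the current directory.

```shell
# Hypothetical helper: delete regular files in the current directory that do
# not contain every one of the given fixed strings.
remove_files_missing_patterns() {
    local f p keep
    for f in *; do
        [ -f "$f" ] || continue
        keep=1
        for p in "$@"; do
            # -q: no output, just an exit status; -F: treat $p literally.
            grep -qF -- "$p" "$f" || { keep=0; break; }
        done
        [ "$keep" -eq 1 ] || rm -- "$f"
    done
}
```

Because -F treats each pattern literally, a call such as remove_files_missing_patterns one 'a.b' does not let the dot match arbitrary characters.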
radoulov