ansaurus

Question

Linux: Removing files that don't contain all the words specified

Answer 1

A:

First, remove the file-list:

rm -f flist

Then, for each of the words, add the file to the filelist if it contains that word:

grep -l WORD * >>flist

Then sort, uniqify and get a count:

sort flist | uniq -c >flist_with_count

All files in flist_with_count whose count is lower than the number of words should be deleted. The format will be:

2 file1
7 file2
8 file3
8 file4

If there were 8 words, then file1 and file2 should be deleted. I'll leave the writing/testing of the script to you.

Okay, you convinced me, here's my script:

#!/bin/bash
rm -rf flist
for word in fopen fclose main ; do
    grep -l ${word} *.c >>flist
done
rm $(sort flist | uniq -c | awk '$1 != 3 {print $2}')

This removes the files in the directory that didn't have all three words.
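One gap in the counting idea is that a file containing none of the words never reaches flist, so it is never deleted. The following is a minimal sketch of one way around that, assuming file names without newlines and at least one .c file in the directory. list_incomplete is a hypothetical helper name, and it only prints the candidates instead of removing them:

```shell
#!/bin/bash
# list_incomplete DIR WORD...   (hypothetical helper, not part of the answer above)
# Prints the .c files in DIR that do not contain every one of the given words.
list_incomplete() {
    local dir=$1; shift
    local need=$(( $# + 1 ))   # one seed entry plus one entry per matched word
    local word flist
    flist=$(mktemp)
    (
        cd "$dir" || exit 1
        printf '%s\n' *.c                # seed: every file is listed once
        for word in "$@"; do
            grep -lF -- "$word" *.c      # one more entry per word the file contains
        done
    ) >"$flist"
    # uniq -c prefixes each name with a padded count; strip it and print
    # the names whose count falls short of the required total.
    sort "$flist" | uniq -c |
        awk -v need="$need" '$1 != need { sub(/^ *[0-9]+ /, ""); print }'
    rm -f "$flist"
}

# Demo in a scratch directory so no real files are touched.
demo=$(mktemp -d)
printf 'fopen fclose main\n' >"$demo/all.c"
printf 'fopen main\n'        >"$demo/two words.c"
printf 'nothing here\n'      >"$demo/none.c"
result=$(list_incomplete "$demo" fopen fclose main)
printf '%s\n' "$result"
```

Once the printed list looks right, it could be fed to rm from inside that directory, for example through xargs -d '\n' rm on systems with GNU xargs.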

paxdiablo 2009-03-05 13:07:57

Answer 2

+17 A:

How about:

grep -L foo *.txt | xargs rm
grep -L bar *.txt | xargs rm

If a file does not contain foo, then the first line will remove it.

If a file does not contain bar, then the second line will remove it.

Only files containing both foo and bar should be left.

-L, --files-without-match
     Suppress normal output; instead print the  name  of  each  input
     file from which no output would normally have been printed.  The
     scanning will stop on the first match.
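To see the two passes at work without risking real files, here is a small sketch run in a scratch directory. The file names are made up for the demo, and -r is a GNU xargs option that skips rm when the list is empty:

```shell
#!/bin/bash
# Scratch directory with one file for each possible combination of the two words.
demo=$(mktemp -d)
cd "$demo" || exit 1
printf 'foo bar\n' >both.txt
printf 'foo\n'     >only_foo.txt
printf 'bar\n'     >only_bar.txt
printf 'baz\n'     >neither.txt

# Pass 1: remove files that lack foo.
grep -L foo *.txt | xargs -r rm
# Pass 2: remove files that lack bar from whatever is left.
grep -L bar *.txt | xargs -r rm

remaining=$(ls)
echo "$remaining"
```

If an earlier pass removes every file, the *.txt glob in the next pass no longer expands and grep reports a missing file, which is harmless but noisy.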

See also @Mykola Golubyev's post for placing in a loop.

toolkit 2009-03-05 13:08:48

I think files with foo OR bar will be deleted with this.

claferri 2009-03-05 13:11:02

Nope - -L negates the grep.

toolkit 2009-03-05 13:14:17

@toolkit: oops, my bad.

claferri 2009-03-05 13:16:21

I think -v is the correct way to negate the grep, since it won't stop on the first match?

jishi 2009-03-05 13:42:51

No, -v will return all LINES that do not include the word.

Judge Maygarden 2009-03-05 14:34:45

Answer 3

A:

This will remove all files that don't contain the words Ping or Sent:

grep -L 'Ping\|Sent' * | xargs rm
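As a quick check of what the alternation pattern selects, the sketch below compares it with running -L once per word. The file names are invented for the demo, and nothing is deleted:

```shell
#!/bin/bash
demo=$(mktemp -d)
cd "$demo" || exit 1
printf 'Ping\nSent\n' >both.log
printf 'Ping\n'       >ping_only.log
printf 'idle\n'       >neither.log

# Alternation: lists only files in which no line matches Ping or Sent.
neither=$(grep -L 'Ping\|Sent' *)
# Per word: lists files that are missing at least one of the two words.
missing_any=$( { grep -L 'Ping' *; grep -L 'Sent' *; } | sort -u )

printf 'alternation:\n%s\n' "$neither"
printf 'per word:\n%s\n' "$missing_any"
```

The alternation form leaves ping_only.log alone, so it only fits when a file should be kept as long as it has any one of the words.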

Eugene Morozov 2009-03-05 13:10:17

This will not remove files that only contain one of the words.

x-way 2009-03-05 13:13:25

Yes, I've noticed that already and hit delete, but that was too late.

Eugene Morozov 2009-03-05 13:16:29

You can still delete your answer if you want to.

Jonathan Leffler 2009-03-05 14:55:56

Answer 4

+10 A:

list="Word1 Word2 Word3 Word4 Word5"
for word in $list; do
    grep -L "$word" *.txt | xargs rm
done
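A variant of the same loop that tolerates unusual file names might look like the sketch below. It assumes GNU grep and GNU xargs (for -Z, -0 and -r), and keep_with_all_words is a hypothetical helper name:

```shell
#!/bin/bash
# keep_with_all_words DIR WORD...   (hypothetical helper)
# Removes every .txt file in DIR that is missing at least one of the words.
keep_with_all_words() {
    local dir=$1; shift
    local word
    for word in "$@"; do
        # find re-reads the directory on every pass, so files removed by an
        # earlier word are not handed to grep again.
        # -L lists files without a match, -Z ends each name with a NUL byte,
        # -F treats the word as a fixed string rather than a regex.
        find "$dir" -maxdepth 1 -type f -name '*.txt' \
            -exec grep -LZF -- "$word" {} + |
            xargs -0 -r rm --
    done
}

# Demo in a scratch directory.
demo=$(mktemp -d)
printf 'Word1 Word2\n' >"$demo/keep me.txt"
printf 'Word1\n'       >"$demo/drop me.txt"
keep_with_all_words "$demo" Word1 Word2
left=$(ls "$demo")
echo "$left"
```

Passing the words as arguments instead of a space-separated string also allows a word that itself contains a space.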

Mykola Golubyev 2009-03-05 13:22:15

Answer 5

+5 A:

An addition to the answers above: use the newline character as the delimiter to handle file names with spaces!

grep -L $word $file | xargs -d '\n' rm
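The effect of the delimiter can be seen without deleting anything. This sketch counts how many arguments xargs builds from a single name containing a space; -d is a GNU xargs option and the name is made up:

```shell
#!/bin/bash
# One file name that contains a space.
name='my notes.txt'

# Default xargs splits on blanks, so the name arrives as two arguments.
default_count=$(printf '%s\n' "$name" | xargs -n 1 echo | wc -l)
# With -d '\n' only newlines separate items, so it stays one argument.
newline_count=$(printf '%s\n' "$name" | xargs -d '\n' -n 1 echo | wc -l)

echo "default: $default_count, newline-delimited: $newline_count"
```

Names that contain a newline would still be split; NUL-delimited output (grep -Z with xargs -0) covers that case on GNU systems.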

soulmerge 2009-03-05 13:28:40

Answer 6

+1 A:

grep -L word * | xargs rm

2009-03-05 13:51:51

Answer 7

+1 A:

To do the same by matching file names (rather than file contents, as most of the solutions above do), you can use the following:

for file in `ls --color=never | grep -ve "\(foo\|bar\)"`
do
   rm $file
done

As per comments:

for file in `ls`

shouldn't be used. The loop below does the same thing without using ls:

for file in *
do
  if [ -n "$(echo "$file" | grep -ve "\(test1\|test3\)")" ]; then
    rm "$file"
  fi
done

The -v option inverts the match, so grep prints only the file names that contain none of the listed words (foo or bar in the first loop), and those are the files that get removed. Any further words to be added to the list need to be separated by \| e.g. one\|two\|three
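One possible way to avoid starting grep for every file is a plain case statement on the name. This is only a sketch, run in a scratch directory with made-up file names, and it uses shell glob patterns instead of a regular expression:

```shell
#!/bin/bash
demo=$(mktemp -d)
cd "$demo" || exit 1
touch test1.txt test2.txt test3.txt 'other file.txt'

for file in *; do
    case $file in
        *test1*|*test3*) ;;       # name contains one of the words: keep it
        *) rm -- "$file" ;;       # anything else is removed
    esac
done

kept=$(ls)
echo "$kept"
```

Further words are added as extra alternatives separated by |, for example *one*|*two*|*three*.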

Andy 2009-03-05 14:04:47

for file in `ls` is a bad idea.

Porges 2009-03-09 14:53:07

Good point. Edited accordingly, but I don't like how that complicates it. Can you think of a more efficient way?

Andy 2009-03-09 17:42:47

Answer 8

A:

You could try something like this but it may break if the patterns contain shell or grep meta characters:

(in this example one two three are the patterns)

for f in *; do
  cmd=true  # start from a no-op so the chain never ends in a dangling &&
  for p in one two three; do
    cmd="fgrep \"$p\" \"$f\" && $cmd"
  done
  eval "$cmd" >/dev/null || rm "$f"  
done
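The same check can be written without eval, which sidesteps the shell metacharacter problem; grep -F takes care of regex metacharacters. The sketch below is one way to do it, with has_all_patterns as a hypothetical helper name and a scratch directory for the demo:

```shell
#!/bin/bash
# has_all_patterns FILE PATTERN...   (hypothetical helper)
# Succeeds only if FILE contains every pattern as a fixed string.
has_all_patterns() {
    local f=$1; shift
    local p
    for p in "$@"; do
        # -q: no output, -F: fixed string, --: the pattern may start with a dash
        grep -qF -- "$p" "$f" || return 1
    done
    return 0
}

# Demo in a scratch directory.
demo=$(mktemp -d)
printf 'one two three\n' >"$demo/full.txt"
printf 'one two\n'       >"$demo/partial.txt"

for f in "$demo"/*; do
    [ -f "$f" ] || continue                    # skip directories and the like
    has_all_patterns "$f" one two three || rm -- "$f"
done

left=$(ls "$demo")
echo "$left"
```

The helper returns at the first missing pattern, so the remaining greps are skipped for that file.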

radoulov 2009-03-05 15:34:18
