I have a file which contains filenames (with their full paths), and I want to search for a word within all of those files. Some pseudo-code to explain:

grep keyword
or
cat files.txt > grep keyword
cat files.txt | grep keyword

The problem is that I can only get grep to search the filenames, not the contents of the actual files.

Thanks for reading

A: 

It has been a long time since I last wrote a bash shell script, but you could store the output of the first command (the one producing all the filenames) in an array and iterate over it, issuing one grep command per file.

A good starting point should be the bash scripting guide.
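
The array idea could be sketched roughly as follows. This is only an illustration, not a tested production script: the function name search_listed_files is made up for this example, and mapfile assumes bash 4 or later.

```bash
#!/usr/bin/env bash
# Hypothetical helper: search every file named in a list file, one grep per file.
search_listed_files() {
    local keyword=$1 list=$2
    local -a files
    local f
    # Read the list into an array, one element per line,
    # so a name containing spaces stays a single element.
    mapfile -t files < "$list"
    for f in "${files[@]}"; do
        # -H prints the filename with each match;
        # -- protects against names or keywords starting with "-".
        grep -H -- "$keyword" "$f"
    done
}
```

It would be called as: search_listed_files keyword files.txt. Note that this starts one grep process per file, which is simple but slower than handing grep many files at once.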

Sascha
+3  A: 
cat files.txt | xargs grep keyword

or

grep keyword `cat files.txt`

should do the trick.

Pitfalls:

  • If files.txt contains file names with spaces, either solution will malfunction, because "This is a filename.txt" will be interpreted as four files, "This", "is", "a", and "filename.txt". A good reason why you shouldn't have spaces in your filenames, ever.

    • There are ways around this, but none of them is trivial. (find ... -print0 / xargs -0 is one of them.)
  • The second (backtick) version can result in a very long command line (which might fail when exceeding the limits of your environment). The first (xargs) version handles long input automatically by splitting it across several grep invocations; xargs offers several options to control the details.
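
As a rough sketch of the find ... -print0 / xargs -0 route mentioned above: it assumes the list of files can be regenerated by find instead of being read from files.txt, and the function name search_tree and its arguments are hypothetical. -print0 and -0 are extensions found in the GNU and BSD tools.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the names of files under a directory that contain a keyword.
search_tree() {
    local keyword=$1 dir=$2
    # find emits NUL-terminated names and xargs -0 splits on NUL only,
    # so "This is a filename.txt" is passed to grep as one argument.
    # grep -l prints just the names of the matching files.
    find "$dir" -type f -print0 | xargs -0 grep -l -- "$keyword"
}
```

It would be called as: search_tree keyword /path/to/dir.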

DevSolar
xargs will not spawn a separate process for every line it reads from standard input. xargs will call grep with as many arguments as fit on one command line (a limit related to ARG_MAX). The number of times grep will be called is therefore roughly ceil(num_files / files_per_call).
sigjuice
Correct... I misread the xargs manpage in that regard. Edited. (The actual limits of GNU xargs can be determined with "xargs --show-limits".)
DevSolar
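
The batching behaviour discussed in these comments can be made visible with a small experiment. Here -n 2 artificially caps each invocation at two arguments; without -n, xargs packs in as many arguments as fit, so all five would go to a single echo.

```bash
# Five arguments, at most two per invocation: echo runs three times, not five.
printf '%s\n' a b c d e | xargs -n 2 echo
# prints:
# a b
# c d
# e
```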
+2  A: 

Both of the answers from DevSolar work (tested on Linux Ubuntu), but the xargs version is preferable if there may be many files, since it will avoid running into command line length limits.

so:

cat files.txt | xargs grep keyword

is the way to go

Malcolm Box
Added it to my answer, plus a note that xargs invokes a new process for each file.
DevSolar
xargs does not spawn a new process for each argument
pixelbeat
You won the "Useless Use of Cat" Award. :-) http://partmaps.org/era/unix/award.html
sigjuice
+1  A: 
tr '\n' '\0' <files.txt | LANG=C xargs -r0 grep -F keyword
  • tr will delimit the names with the NUL character so that spaces are not significant (note the corresponding -0 option to xargs).
  • xargs will start a single grep process for a "large" number of files, and with -r it will not start any grep process if there are no files.
  • LANG=C means use quick routines for matching, rather than slow locale-aware ones.
  • grep -F means use quick fixed-string matching rather than slow regular expression matching.
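
For illustration, the pipeline could be wrapped in a small function like this. The name grep_listed is made up for the example, and -r is a GNU xargs extension, so this is a sketch that assumes GNU tools.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper: fixed-string search in every file named in a list file.
grep_listed() {
    local keyword=$1 list=$2
    # Newlines become NULs, so each line of the list is one filename, spaces included.
    # -r keeps xargs from running grep at all when the list is empty.
    tr '\n' '\0' < "$list" | LANG=C xargs -r0 grep -F -- "$keyword"
}
```

It would be called as: grep_listed keyword files.txt. With an empty files.txt it prints nothing instead of leaving grep with no file arguments.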
pixelbeat
Doesn't the LANG=C thing fail once filenames aren't ASCII-7?
DevSolar
Not for fixed strings, no. If you want to grep for things like '[:upper:]', then yes.
pixelbeat