I am interested in getting into bash scripting and would like to know how you can traverse a unix directory and log the path to the file you are currently looking at if it matches a regex criterion.

It would go like this:

  • Traverse a large unix directory (file/folder) structure.
  • If the current file's contents contain a string that matches one or more regular expressions,
  • Then append the file's full path to a results text file.

Bash or Perl scripts are fine, although I would prefer to see how you would do this in a bash script with grep, awk, etc.

+5  A: 

use find and grep

find . -exec grep -l -e 'myregex' {} \; >> outfile.txt

-l on the grep prints just the file name, not the matching lines

-e on the grep specifies the regex to match

{} is replaced by each file that find locates, so grep runs once per file

>> outfile.txt appends the results to the text file
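
One caveat: plain find . also hands directory names to grep, which (with GNU grep at least) will print "Is a directory" warnings and skip them. Adding -type f restricts the search to regular files:

find . -type f -exec grep -l -e 'myregex' {} \; >> outfile.txt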

Xetius
-exec grep will be rather slow, as it runs grep for every file separately
depesz
You can speed it up considerably with a plus sign: find . -exec grep -l -e 'myregex' {} + >> outfile.txt
Dennis Williamson
+2  A: 

grep -l -R <regex> <location> should do the job.
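
For example, using the same hypothetical regex and output file as the answer above:

grep -l -R 'myregex' . >> outfile.txt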

pgs
-R has given me problems before on certain versions of Unix. If it works for @Brock Woolf, then great; otherwise he will need to use find and grep together.
Xetius
It's a GNU grep option. If his unix version doesn't have it, it's not too hard to install (politics aside).
pgs
+7  A: 
find . -type f -print0 | xargs -0 grep -l -E 'some_regexp' > /tmp/list.of.files

Important parts:

  • -type f makes the find list only files
  • -print0 prints the files separated not by \n but by \0 - this makes sure it will work even if you have files with spaces in their names (see the example after this list)
  • xargs -0 - splits its input on \0, and passes each element as an argument to the command you provided (grep in this example)
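
As a quick illustration of why the \0 separator matters, here is what happens with a made-up file name containing spaces and a made-up pattern:

touch 'file with space.txt'
find . -type f -print  | xargs grep -l 'pattern'       # the name may be split into three separate arguments
find . -type f -print0 | xargs -0 grep -l 'pattern'    # the full name is passed through intact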

The cool thing about using xargs is that if your directory contains a really large number of files, you can speed up the process by parallelizing it:

find . -type f -print0 | xargs -0 -P 5 -L 100 grep -l -E 'some_regexp' > /tmp/list.of.files

This will run grep in 5 separate copies, each scanning a different set of up to 100 files.

depesz
I like this. I love the fact that you can make commands as simple or complicated as you want depending on the amount of power you want to use.
Xetius
A: 
find /path -type f -name "*.txt" | awk '
{
    while((getline line<$0)>0){
        if(line ~ /pattern/){
            print $0":"line
            #do some other things here
        }
    }    
}'

ghostdog74
A: 
find /path -type f -name "outfile.txt" | awk '
{
    while((getline line<$0)>0){
        if(line ~ /pattern/){
            print $0":"line
        }
    }    
}'
joe
+1  A: 

If you want to do this from within Perl, you can take the find commands that people suggested and turn them into a Perl script with find2perl.

If you have:

$ find ...

make that

$ find2perl ...

That outputs a Perl program that does the same thing. From there, if you need to do something that's easy in Perl but hard in shell, you just extend the Perl program.
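
For instance, converting one of the find commands from the earlier answers (the script name here is made up):

$ find2perl . -type f -name '*.txt' -print > matcher.pl
$ perl matcher.pl

The generated script drives the traversal with File::Find and a wanted() subroutine, which is the natural place to add the regex matching in Perl.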

brian d foy