




Hi All,

I am trying to create a text file that contains a listing of all log files that contain a certain string in the first line. More specifically, SAS log files.

Currently I have a simple script that will search the entire system for "*.log" files and output the entire list to a text file.

Is there a way to only output the log files that contain a certain string?

Here is the current command:

find `pwd` -name "*.log" > sas_log_list.txt

Every SAS log file contains the same string on the very first line.
This string is:

1 The SAS System

So basically I want to search a file system for log files containing the string above, and output those file names to a text file.

Thanks in advance, Jason

find `pwd` -name "*.log" -exec grep "The SAS System" {} \;


find `pwd` -name "*.log" -print0 | xargs -0 grep -il "the sas system"

Unless I'm mistaken, you don't need the call to pwd. I think this will get you what you want. You can use the -l flag on grep to get the filenames rather than the matching lines.

find . -name "*.log" -exec grep -l "The SAS System" {} \; > sas_log_list.txt
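A possible variation on the command above: the `\;` terminator starts one grep process per file, whereas the POSIX `{} +` terminator passes many file names to each grep invocation. The sketch below wraps that in a function; the name `list_sas_logs` is made up for illustration, and note that it still matches the string anywhere in the file, not only on the first line.

```shell
# Hypothetical helper (the name is illustrative, not from the thread):
# list .log files under a directory that contain the string anywhere.
list_sas_logs() {
    # $1 = directory to search (defaults to the current directory)
    # "{} +" batches many file names into each grep call
    find "${1:-.}" -type f -name '*.log' -exec grep -l 'The SAS System' {} +
}
```

It could be used as `list_sas_logs . > sas_log_list.txt`.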

I am now also trying to add the modified date to the output file, so that each line of the output contains the modified date and full path of a log file. Here is what I am trying: find `pwd` -mtime -2 -name "*.log" -exec grep -l "The SAS System" {} \; > sas_log_list.txt
Maybe something along these lines, then? `find "$(pwd)" -mtime -2 -name "*.log" -exec grep -l -Z "The SAS System" {} \; | xargs -0 ls -l > sas_log_list.txt`
Could I do that for files created in the last 30 days? It doesn't seem like I could just change the -mtime to -30, or can I?
I don't see why not.
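To make the `-mtime` point concrete: `-mtime -30` matches files whose data was last modified less than 30 × 24 hours ago (it tests modification time, not creation time), while `-mtime +30` would match older files. A minimal sketch, assuming GNU grep and GNU xargs for the `-Z`, `-0` and `-r` options; the function name `recent_sas_logs` is hypothetical:

```shell
# Hypothetical helper: long listing of SAS logs modified within the last N days.
# Assumes GNU grep (-Z) and GNU xargs (-0, -r).
recent_sas_logs() {
    # $1 = directory to search, $2 = number of days (e.g. 30)
    find "$1" -type f -name '*.log' -mtime "-$2" \
        -exec grep -lZ 'The SAS System' {} + |
        xargs -0 -r ls -l    # -r: do not run ls at all if nothing matched
}
```

For the 30-day case this could be called as `recent_sas_logs "$(pwd)" 30 > sas_log_list.txt`.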

I've attempted to make things a bit faster by reading only the first line of each file. This prints the names of the files whose first line matches the pattern.

( IFS=$'\n' ; for f in $(find `pwd` -name "*log" -type f ) ; do 
   head -n 1 "$f" | grep -q "The SAS System" && echo "$f"
done )

UPDATE 1: Edited to handle path names containing white space using one of the techniques offered by Charles Duffy. I couldn't use the find -exec .. + expression as {} can't appear more than once. Thanks ghostdog74 and Telemachus
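Regarding the `{}` restriction mentioned in UPDATE 1, one possible workaround is to hand the batch of file names to a small inline `sh -c` script, which can then refer to each name as often as needed. This is only a sketch using POSIX features; the function name `first_line_sas_logs` is made up for illustration. Because the names are passed as arguments rather than parsed from text, spaces (and even newlines) in paths should not matter.

```shell
# Hypothetical helper: print .log files whose FIRST line contains the string.
first_line_sas_logs() {
    # $1 = directory to search (defaults to the current directory)
    find "${1:-.}" -type f -name '*.log' -exec sh -c '
        # "$@" holds the batch of file names supplied by find
        for f do
            if head -n 1 "$f" | grep -q "The SAS System"; then
                printf "%s\n" "$f"
            fi
        done
    ' sh {} +
}
```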

UPDATE 2: Add full pathname and last modified time

( IFS=$'\n' ; for f in $(find . -name "*log" -type f ) ; do 
   head -n 1 "$f" | grep -q "The SAS System" && echo "$(readlink -f "$f") $(stat -c %y "$f")"
done )
This breaks on file names containing whitespace.
@ghostdog74 Why not mention how to fix that? @unhillbilly Quote the two occurrences of `$f` in line 2 (to protect against problems due to spaces in file names).
@ghostdog74 Indeed it does fail with paths containing spaces. @Telemachus I'll need more than quotes as the `for` construct will also botch things up. I'll fix it.
I am trying to add the modified date to the output file now also. So that the output will contain the modified date and full path of each log file on each line.
@newbie_dev - see update

The hardest part of this question is searching only within the first line. The most accurate one-liner (broken up here for readability) I could come up with was:

find . -name '*.log'  -type f  -readable  ! -size 0 \
       -exec sed -n '1{/The SAS System/q0};q1' {} \; \
       -print

Due to the obscure nature of sed syntax, some explanation is in order:

  • The 1{...} will be evaluated for the first line only.
  • The /regex/q0 command will quit with exit code 0 (success) if the regex matched (consider /^regex$/ for matching the entire line against that regex). The exit-code argument to q is a GNU sed extension.
  • If we didn't quit because of the previous match, the next command, q1, will quit with exit code 1 (failure).

find uses that sed command as a predicate and runs -print only if it succeeded. However, there is a small snag: if the file is empty (-size 0), sed exits 0 immediately without evaluating its script. For that reason we need the ! -size 0 argument to find.

As suggested by @Brandon Horsley, -type f will produce fewer errors, and while we are at it, let's verify that the file is -readable as well.
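The exit-code behaviour described above can be checked on a single file. This is a rough sketch, assuming GNU sed (for `q0` and `q1`); the function name `first_line_matches` is hypothetical, and the `[ -s ... ]` test plays the same role as `! -size 0` does in the find command.

```shell
# Hypothetical helper: succeed only if the first line of file $1 matches.
# Assumes GNU sed; the exit-code argument to q is a GNU extension.
first_line_matches() {
    # An empty file gives sed no input, so it would exit 0 without
    # running the script; reject empty files explicitly.
    [ -s "$1" ] || return 1
    sed -n '1{/The SAS System/q0};q1' "$1"
}
```

For example, `first_line_matches run.log && echo run.log` would print the name only when the first line of the (hypothetical) file `run.log` contains the string.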

Chen Levy
Nice solution, grepping the whole file is overkill. I would add `-type f` to the find, and would have quoted `*.log` instead of escaping it, but otherwise this is how I would approach the problem.
Brandon Horsley

bash 4

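shopt -s globstar    # let ** match directories recursively
shopt -s nullglob    # expand to nothing if no file matches
for logfile in **/*.log
do
     # read only the first line of the file
     read -r firstline <"$logfile"
     case "$firstline" in
       *"The SAS System"*) echo "$logfile" >> sas_log_list.txt ;;
     esac
done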

Why the `exec`? Why not `read firstline<"$logfile"`?
Dennis Williamson