I'm parsing a properties file to get a list of the properties defined. I want to check all the places these properties are used (the target dir and its subdirs), flagging any that are defined in the properties file but not used anywhere in the target dir. Thus far I have

FILE=$1
TARGETROOT=$2

for LINE in `grep '[A-Z]*=' "$FILE" | awk -F '=' '{print $1}'`
do
    : # TODO: flag $LINE here if it is not used anywhere under $TARGETROOT
done

Inside this loop I want to find those $LINE vars which are not in $TARGETROOT or its subdirs

Example files

Properties File
a=1
b=2
c=3
...

Many files that contain references to properties via

FILE 1
PropAValue = a
+2  A: 

Check the return code of grep.

You can do this by inspecting the $? variable.

If it is 0, the string was found; otherwise it was not. When it is non-zero, add that string to a 'not found' array, and that array is your list of unused properties.

grep "string" 
if [$? -ne 0] 
then 
   echo "string not found" 
fi
Brian Bay
This sounds like it could be the answer. Can you give me an example of how you inspect this?
Jamie Duncan
The $? variable holds the exit code of the last executed command. Something like this: `grep "string"; if [$? -ne 0]; then echo "string not found"; fi`
Brian Bay
This is it Thanks!!
Jamie Duncan
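
Putting the pieces of this answer together, a minimal sketch of the loop with the 'not found' array (an editor's illustration, not the answerer's code; it assumes bash arrays and uses `grep -r` for the recursive search):

    NOTFOUND=()
    for LINE in `grep '[A-Z]*=' "$FILE" | awk -F '=' '{print $1}'`
    do
      grep -r "$LINE" "$TARGETROOT" >/dev/null
      if [ $? -ne 0 ]
      then
        NOTFOUND+=("$LINE")   # non-zero exit: property not used anywhere
      fi
    done
    echo "Unused properties: ${NOTFOUND[@]}"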
Every time you needlessly evaluate $?, a kitten dies. Just do: `if grep ...; then ... fi`
William Pursell
Needs more spaces: `if [ $? -ne 0 ]`. And I agree with William: just directly use the result in a conditional; there are few situations in which you *need* `$?`, and this does not appear to be one of them.
ephemient
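
Following William's and ephemient's advice, the same check can use grep's result directly, with no `$?` at all (again just a sketch; `-q` silences grep and `!` negates its exit status):

    for LINE in `grep '[A-Z]*=' "$FILE" | awk -F '=' '{print $1}'`
    do
      if ! grep -rq "$LINE" "$TARGETROOT"
      then
        echo "Unused property: $LINE"   # grep found no occurrence
      fi
    done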
+1  A: 
  • Using `xyz | while read PROP` instead of ``for PROP in `xyz`; do`` for those cases when the output of xyz can get arbitrarily large
  • Using `grep -l ... >/dev/null || xyz` to execute xyz only when grep fails to match, discarding grep's output to /dev/null rather than executing xyz when a match is found (`-l` stops grep after the first match, if any, making it more efficient)

    FILE=$1 
    TARGETROOT=$2
    
    
    grep '^[A-Z]*=' "$FILE" | awk -F= '{print $1}' | while read PROP ; do
      find "$TARGETROOT" -type f | while read FILE2 ; do
        grep -l "^${PROP}=" "$FILE2" >/dev/null || {
          echo "Propery $PROP missing from $FILE2"
        }
      done
    done
    

If dealing with a large number of properties and/or files under $TARGETROOT, you can use the following, more efficient approach, which opens and scans each file only once (instead of the previous solution's N times, where N is the number of properties in $FILE):

  • Using a temporary file with all sorted properties from $FILE to avoid duplicating work
  • Using awk ... | sort -u to isolate all sorted properties appearing in another file $FILE2
  • Using `comm -23 "$PROPSFILE" -` to isolate those lines (properties) which appear only in $PROPSFILE and not on the standard input (i.e. in $FILE2); a tiny standalone example follows the script below

    FILE=$1 
    TARGETROOT=$2
    
    
    PROPSFILE="/tmp/~props.$$"
    grep '^[A-Z]*=' "$FILE" | awk -F= '{print $1}' | sort -u >"$PROPSFILE"
    
    
    find "$TARGETROOT" -type f | while read FILE2 ; do
      grep '^[A-Z]*=' "$FILE2" | awk -F= '{print $1}' | sort -u |
      comm -23 "$PROPSFILE" - | while read PROP ; do
        echo "Property $PROP missing from $FILE2"
      done
    done
    
    
    rm -f "$PROPSFILE"
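
To see the `comm -23` step in isolation, here is a hypothetical three-property run (`-` tells comm to read its second input from the pipe; both inputs must be sorted):

    $ printf 'a\nb\nc\n' > props.sorted
    $ printf 'b\n' | comm -23 props.sorted -
    a
    c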
    

Cheers, V.

vladr
Sometimes `grep+awk` can be used for big files: `grep` to search for the pattern and `awk` to process it. `grep`'s searching algorithm is better. In your `sed` script you were doing substitution, which is an expensive operation; splitting on fields with `awk` is much faster.
ghostdog74
You can even combine `sort -u | comm -23 ... | while ...` into `awk` ;). Three fewer pipe processes.
ghostdog74
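
If one did fold those steps into `awk`, it might look something like this sketch (an editor's guess at ghostdog74's suggestion, reusing $PROPSFILE and $FILE2 from the answer above; `NR==FNR` is the usual two-input idiom):

    awk -F= -v f="$FILE2" '
      NR==FNR { props[$0]; next }        # first input: one property name per line
      $1 in props { delete props[$1] }   # second input: property is used, drop it
      END { for (p in props) print "Property " p " missing from " f }
    ' "$PROPSFILE" "$FILE2"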
That would probably be overdoing it. :) Now that we've got rid of sed, and given that we can assume megabytes of data ;), I would not start doing `sort` and `comm`'s jobs inside `awk` hashes -- for one thing, `sort` and `comm` will scale beautifully on a multicore machine, much much better than a single `awk` will.
vladr
Well, I don't know where you get the idea that awk will not "scale beautifully" on a multicore machine, but doing it all inside one awk process is definitely more efficient. Not to forget the shell `while read` loop, which is a major slowpoke when the data to iterate over is large. :) Anyway, that's going OT, and +1 for actually doing something at full scale. :)
ghostdog74
I get the idea from years of experience with this kind of stuff on enterprise-grade machines etc. :) Go on a 4-core machine and you'll see `awk` top out at 25% with 75% of the machine idle. Spawn a `sort|comm` pipe and see two cores being fully utilized now. Also, after `awk` has eaten all your RAM with millions of distinct rows, how are you going to diff the two hashes in non-O(N^2) without sorting their keys (i.e. not reimplementing C-coded `sort` and `comm` in `awk`)? `| while read` is there only for convenience if the OP needs to do anything other than printing (otherwise `| awk/sed`)
vladr