I'm parsing a properties file to get a list of the properties defined. I want to check all the places these properties are used (the target dir and its subdirs), flagging any that are defined in the properties file but not used anywhere in the target dir. Thus far I have

FILE=$1
TARGETROOT=$2

for LINE in `grep '[A-Z]*=' "$FILE" | awk -F '=' '{print $1}'`
do
    : # TODO: flag $LINE here if it is not used anywhere under $TARGETROOT
done

Inside this loop I want to find those $LINE vars which are not in $TARGETROOT or its subdirs

Example files

Properties File
a=1
b=2
c=3
...

Many files that contain references to properties via

FILE 1
PropAValue = a
+2  A: 

Check the return code of grep.

You can do this by inspecting the $? variable.

If it is 0, the string was found; otherwise it was not. When it is non-zero, add that string to a 'not found' array, and that array is your list of unused properties.

grep "string" 
if [$? -ne 0] 
then 
   echo "string not found" 
fi
Brian Bay
This sounds like it could be the answer. Can you give me an example of how you inspect this?
Jamie Duncan
The $? variable holds the exit code of the last executed command. Something like this: `grep "string"; if [$? -ne 0]; then echo "string not found"; fi`
Brian Bay
This is it Thanks!!
Jamie Duncan
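
Putting the pieces of this answer together, a minimal sketch of the loop with the 'not found' array (an editor's illustration, not the answerer's code; it assumes bash arrays and uses `grep -r` for the recursive search):

    NOTFOUND=()
    for LINE in `grep '[A-Z]*=' "$FILE" | awk -F '=' '{print $1}'`
    do
      grep -r "$LINE" "$TARGETROOT" >/dev/null
      if [ $? -ne 0 ]
      then
        NOTFOUND+=("$LINE")   # non-zero exit: property not used anywhere
      fi
    done
    echo "Unused properties: ${NOTFOUND[@]}"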
Every time you needlessly evaluate $?, a kitten dies. Just do: `if grep ...; then ... fi`
William Pursell
Needs more spaces: `if [ $? -ne 0 ]`. And I agree with William: just directly use the result in a conditional; there are few situations in which you *need* `$?`, and this does not appear to be one of them.
ephemient
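
Following William's and ephemient's advice, the same check can use grep's result directly, with no `$?` at all (again just a sketch; `-q` silences grep and `!` negates its exit status):

    for LINE in `grep '[A-Z]*=' "$FILE" | awk -F '=' '{print $1}'`
    do
      if ! grep -rq "$LINE" "$TARGETROOT"
      then
        echo "Unused property: $LINE"   # grep found no occurrence
      fi
    done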
+1  A: 
  • Using `xyz | while read PROP` instead of ``for PROP in `xyz`; do`` for those cases when the output of xyz can get arbitrarily large
  • Using `grep -l ... >/dev/null || xyz` to execute xyz only when grep fails to match, discarding grep's output to /dev/null rather than executing xyz when a match is found (`-l` stops grep after the first match, if any, making it more efficient)

    FILE=$1 
    TARGETROOT=$2
    
    
    grep '^[A-Z]*=' "$FILE" | awk -F= '{print $1}' | while read PROP ; do
      find "$TARGETROOT" -type f | while read FILE2 ; do
        grep -l "^${PROP}=" "$FILE2" >/dev/null || {
          echo "Propery $PROP missing from $FILE2"
        }
      done
    done
    

If dealing with a large number of properties and/or files under $TARGETROOT, you can use the following, more efficient approach, which opens and scans each file only once (instead of the previous solution's N times, where N is the number of properties in $FILE):

  • Using a temporary file with all sorted properties from $FILE to avoid duplicating work
  • Using awk ... | sort -u to isolate all sorted properties appearing in another file $FILE2
  • Using `comm -23 "$PROPSFILE" -` to isolate those lines (properties) which appear only in $PROPSFILE and not on the standard input (i.e. in $FILE2); a tiny standalone example follows the script below

    FILE=$1 
    TARGETROOT=$2
    
    
    PROPSFILE="/tmp/~props.$$"
    grep '^[A-Z]*=' "$FILE" | awk -F= '{print $1}' | sort -u >"$PROPSFILE"
    
    
    find "$TARGETROOT" -type f | while read FILE2 ; do
      grep '^[A-Z]*=' "$FILE2" | awk -F= '{print $1}' | sort -u |
      comm -23 "$PROPSFILE" - | while read PROP ; do
        echo "Property $PROP missing from $FILE2"
      done
    done
    
    
    rm -f "$PROPSFILE"
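
To see the `comm -23` step in isolation, here is a hypothetical three-property run (`-` tells comm to read its second input from the pipe; both inputs must be sorted):

    $ printf 'a\nb\nc\n' > props.sorted
    $ printf 'b\n' | comm -23 props.sorted -
    a
    c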
    

Cheers, V.

vladr
Sometimes `grep+awk` can be used for big files: `grep` to search for the pattern and `awk` to process it. `grep`'s searching algorithm is better. In your `sed` script you were doing substitution, which is an expensive operation; splitting on fields with `awk` is much faster.
ghostdog74
You can even combine `sort -u | comm -23 ... | while ...` into `awk` ;). Three fewer pipe processes.
ghostdog74
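
If one did fold those steps into `awk`, it might look something like this sketch (an editor's guess at ghostdog74's suggestion, reusing $PROPSFILE and $FILE2 from the answer above; `NR==FNR` is the usual two-input idiom):

    awk -F= -v f="$FILE2" '
      NR==FNR { props[$0]; next }        # first input: one property name per line
      $1 in props { delete props[$1] }   # second input: property is used, drop it
      END { for (p in props) print "Property " p " missing from " f }
    ' "$PROPSFILE" "$FILE2"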
That would probably be overdoing it. :) Now that we've got rid of sed, and given that we can assume megabytes of data ;), I would not start doing `sort` and `comm`'s jobs inside `awk` hashes -- for one thing, `sort` and `comm` will scale beautifully on a multicore machine, much much better than a single `awk` will.
vladr
Well, I don't know where you get the idea that awk will not "scale beautifully" on a multicore machine, but doing it all inside one awk process is definitely more efficient. Not to forget the shell `while read` loop, which is a major slowpoke when the data to iterate over is large. :) Anyway, that's going OT, and +1 for actually doing something at full scale. :)
ghostdog74
I get the idea from years of experience with this kind of stuff on enterprise-grade machines etc. :) Go on a 4-core machine and you'll see `awk` top out at 25% with 75% of the machine idle. Spawn a `sort|comm` pipe and see two cores being fully utilized now. Also, after `awk` has eaten all your RAM with millions of distinct rows, how are you going to diff the two hashes in non-O(N^2) without sorting their keys (i.e. not reimplementing C-coded `sort` and `comm` in `awk`)? `| while read` is there only for convenience if the OP needs to do anything other than printing (otherwise `| awk/sed`)
vladr