views:

170

answers:

2

I'm piping a program's output through some awk commands, and I'm almost where I need to be. The command thus far is:

myprogram | awk '/chk/ { if ( $12 > $13) printf("%s %d\n", $1, $12 - $13); else  printf("%s %d\n", $1, $13 - $12)  }  ' | awk '!x[$0]++'

The last bit is a poor man's uniq, which isn't available on my target. Given the chance the command above produces an output such as this:

GR_CB20-chk_2, 0
GR_CB20-chk_2, 3
GR_CB200-chk_2, 0
GR_CB200-chk_2, 1
GR_HB20-chk_2, 0
GR_HB20-chk_2, 6
GR_HB20-chk_2, 0
GR_HB200-chk_2, 0
GR_MID20-chk_2, 0
GR_MID20-chk_2, 3
GR_MID200-chk_2, 0
GR_MID200-chk_2, 2

What I'd like to have is this:

GR_CB20-chk_2, 3
GR_CB200-chk_2, 1
GR_HB20-chk_2, 6
GR_HB200-chk_2, 0
GR_MID20-chk_2, 3
GR_MID200-chk_2, 2

That is, I'd like to print only line that has a maximum value for a given tag (the first 'field'). The above example is representative of the at data in that the output will be sorted (as though it had been piped through a sort command).

+1  A: 

If you don't need the items to be in the same order they were output from myprogram, the following works:

... | awk '{ if ($2 > x[$1]) x[$1] = $2 } END { for (k in x) printf "%s %s", k, x[k] }'
jamessan
Nicely done. Thanks a bunch.
Jamie
Just missing the "\n" in the printf(). Otherwise perfect.
Jamie
Ah, right. I had that in my test version but forget it when typing it here. :)
jamessan
+2  A: 

Based on my answer to a similar need, this script keeps things in order and doesn't accumulate a big array. It prints the line with the highest value from each group.

#!/usr/bin/awk -f
{
    s = substr($0, 0, match($0, /,[^,]*$/))
    if (s != prevs) {
        if ( FNR > 1 ) print prevline
        prevval = $2
        prevline = $0
    }
    else if ( $2 > prevval ) {
        prevval = $2
        prevline = $0
    }
    prevs = s
}
END {
    print prevline
}
Dennis Williamson
Much faster, and because memory is a concern (it's an embedded environment) I'm switching to this as the correct answer.
Jamie