tags:
views: 246
answers: 6

Comment rows are counted in NR.

  1. Is there a flag to make AWK ignore comment lines?
  2. How can you restrict which records AWK processes from within AWK itself, rather than piping through something like | sed -e '1d', so that comment rows are ignored?

Example

$ awk '{sum+=$3} END {avg=sum/NR} END {print avg}' coriolis_data
0.885491                          // WRONG divided by 11, should be by 10
$ cat coriolis_data 
#d-err-t-err-d2-err
.105    0.005   0.9766  0.0001  0.595   0.005
.095    0.005   0.9963  0.0001  0.595   0.005
.115    0.005   0.9687  0.0001  0.595   0.005
.105    0.005   0.9693  0.0001  0.595   0.005
.095    0.005   0.9798  0.0001  0.595   0.005
.105    0.005   0.9798  0.0001  0.595   0.005
.095    0.005   0.9711  0.0001  0.595   0.005
.110    0.005   0.9640  0.0001  0.595   0.005
.105    0.005   0.9704  0.0001  0.595   0.005
.090    0.005   0.9644  0.0001  0.595   0.005
A: 

I would remove them with sed first, then remove blank lines with grep.

sed 's/#.*//' < coriolis_data | egrep -v '^$' | awk ...

nsayer
My point was to avoid using sed, i.e. things like: sed -e 's@^#.*$@@g' -e '/^$/d' coriolis_data | awk ...
HH
I don't think awk has automatic comment removal. For one, there are multiple syntaxes for specifying comments. Awk is too generalized a tool to have built-in support for a specific one.
nsayer
+2  A: 

Just decrement NR yourself on comment lines:

 awk '/^[[:space:]]*#/ { NR-- } {sum+=$3} END { ... }' coriolis_data

Okay, that answers the question you asked, but here's an answer to the question you really meant:

 awk '{ if ($0 ~ /^[[:space:]]*#/) {NR--} else {sum+=$3} END { ... }' coriolis_data

(It's more awk-ish to use patterns outside the blocks as in the first answer, but to do it that way, you'd have to write your comment pattern twice.)

Edit: Will suggests in the comments using /.../ {NR--; next} to avoid having the if-else block. My thought is that this looks cleaner when you have more complex actions for the matching records, but doesn't matter too much for something this simple. Take your favorite!
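
Spelled out, that variant might look something like this (a sketch filling in the same sum/average pieces as above):

 awk '/^[[:space:]]*#/ { NR--; next } { sum+=$3 } END { ave=sum/NR; print ave }' coriolis_data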

Jefromi
Not safe enough, since $3 from a comment line could still be added to the sum.
Bruno Brant
One issue there is that you're going to add $3 to the sum even on comment lines, aren't you?
nsayer
@Bruno Jinx! :)
nsayer
Err, it's missing a closing "}"; fixed: awk '{ if ($0 ~ /^[[:space:]]*#/ ) {NR--} else {sum+=$3}} END {ave=sum/NR} END {print ave }' coriolis_data
HH
Change the first rule to: /^[[:space:]]*#/ { NR--; next; } The 'next;' skips to the next record and ignores the rest of the code. Easier than using the IFs, IMHO.
Will Hartung
+1  A: 

The file you give AWK to parse is data, not an AWK source file, so AWK knows nothing about its comment syntax. In other words, to AWK, lines beginning with # are nothing special.

That said, you can of course skip comments, but you have to write the logic for it yourself: tell AWK to ignore lines that start with "#" and count the remaining lines on your own.

awk 'BEGIN {lines=0} {if(substr($1, 1, 1) != "#") {sum+=$3; lines++} } END {avg=sum/lines} END {print avg}' coriolis_data

You can, of course, indent it for better readability.
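
For example, pulled out into a standalone script file (the name avg.awk is my own choice), the same logic reads:

BEGIN { lines = 0 }
{
    # skip any line whose first field starts with "#"
    if (substr($1, 1, 1) != "#") {
        sum += $3
        lines++
    }
}
END { avg = sum / lines; print avg }

Run it with awk -f avg.awk coriolis_data.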

Bruno Brant
I think piping it through sed first is more readable, fwiw.
nsayer
If you can have comments beginning in the middle of a line, you need to add some code to that one-liner. Just shout here and I will provide it for ya.
Bruno Brant
Better to use a regex to check for comment lines, and you can still modify NR yourself instead of keeping your own line counter.
Jefromi
@nsayer: True. However, there might be more structural bits that the user wants to take into consideration when parsing. Keeping them in a single place, an AWK script file, would be better. Besides, if the files are big, there might be some performance cost in breaking the solution down across many tools, wouldn't there? (I'm not sure.)
Bruno Brant
@Jefromi: I'd use a regex as well, but I didn't want to make the solution too complex. I don't see the real benefit of altering the value of NR, since you are changing the meaning of the variable... If the user needs to expand the snippet later, it might generate confusion.
Bruno Brant
@Bruno: It'd take some pretty substantial expansion, but I suppose you could manage to mess yourself up. I'd probably count comments instead of lines, because I'm obsessive and there are probably fewer comments than lines.
Jefromi
@Jefromi: I like the idea of counting comments. I'd just have to increment the counter in an else branch of the code above.
Bruno Brant
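A rough sketch of the comment-counting variant discussed above (the variable name comments is my own), dividing by NR minus the comment count:

 awk '{ if ($0 ~ /^[[:space:]]*#/) {comments++} else {sum+=$3} } END { print sum/(NR-comments) }' coriolis_data
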
A: 

There is a SIMPLER way to do it!

$ awk '!/#/ {print $0}' coriolis_data
.105 0.005 0.9766 0.0001 0.595 0.005
.095 0.005 0.9963 0.0001 0.595 0.005
.115 0.005 0.9687 0.0001 0.595 0.005
.105 0.005 0.9693 0.0001 0.595 0.005
.095 0.005 0.9798 0.0001 0.595 0.005
.105 0.005 0.9798 0.0001 0.595 0.005
.095 0.005 0.9711 0.0001 0.595 0.005
.110 0.005 0.9640 0.0001 0.595 0.005
.105 0.005 0.9704 0.0001 0.595 0.005
.090 0.005 0.9644 0.0001 0.595 0.005

Correction: no, it is not!

$ awk '!/#/ {sum+=$3}END{ave=sum/NR}END{print ave}' coriolis_data 
0.885491    // WRONG.
$ awk '{if ($0 ~ /^[[:space:]]*#/){NR--}else{sum+=$3}}END{ave=sum/NR}END{print ave}' coriolis_data
0.97404     // RIGHT.
HH
+4  A: 

It is best not to touch NR; use a different variable for counting the rows. This version skips comments (the !/^[ \t]*#/ test) as well as blank lines (NF is zero for a line with no fields).

$ awk '!/^[ \t]*#/&&NF{sum+=$3;++d}END{ave=sum/d;print ave}' file
0.97404
ghostdog74
+1  A: 

Another approach is to use a conditional statement...

awk '{ if( $1 != "#" ){ print $0 } }' coriolis_data

What this does is tell awk to skip lines whose first field is #. Of course, this requires the comment character # to stand alone as the first field of the comment line.
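
Applied to the original averaging task, a sketch using the same field test might be (the counter name n is mine; note that, per the caveat above, the sample file's "#d-err-..." header would not be skipped, since its first field is not a bare #):

awk '$1 != "#" { sum+=$3; n++ } END { print sum/n }' coriolis_data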

sprax