tags:
views: 246
answers: 6

Comment rows are counted in NR.

  1. Is there a flag to make AWK ignore comment lines?
  2. How can you restrict which records AWK processes from within AWK itself, rather than piping through something like | sed -e '1d', so that comment rows are ignored?

Example

$ awk '{sum+=$3} END {avg=sum/NR} END {print avg}' coriolis_data
0.885491                          // WRONG divided by 11, should be by 10
$ cat coriolis_data 
#d-err-t-err-d2-err
.105    0.005   0.9766  0.0001  0.595   0.005
.095    0.005   0.9963  0.0001  0.595   0.005
.115    0.005   0.9687  0.0001  0.595   0.005
.105    0.005   0.9693  0.0001  0.595   0.005
.095    0.005   0.9798  0.0001  0.595   0.005
.105    0.005   0.9798  0.0001  0.595   0.005
.095    0.005   0.9711  0.0001  0.595   0.005
.110    0.005   0.9640  0.0001  0.595   0.005
.105    0.005   0.9704  0.0001  0.595   0.005
.090    0.005   0.9644  0.0001  0.595   0.005
A: 

I would remove them with sed first, then remove blank lines with grep.

sed 's/#.*//' < coriolis_data | egrep -v '^$' | awk ...

nsayer
My point was to avoid using sed, i.e. things like: sed -e 's@^#.*$@@g' -e '/^$/d' coriolis_data | awk ...
HH
I don't think awk has automatic comment removal. For one, there are multiple syntaxes for specifying comments. Awk is too generalized a tool to have built-in support for a specific one.
nsayer
+2  A: 

Just decrement NR yourself on comment lines:

 awk '/^[[:space:]]*#/ { NR-- } {sum+=$3} END { ... }' coriolis_data

Okay, that answers the question you asked, but here's an answer to the question you really meant:

 awk '{ if ($0 ~ /^[[:space:]]*#/) {NR--} else {sum+=$3} END { ... }' coriolis_data

(It's more awk-ish to use patterns outside the blocks as in the first answer, but to do it that way, you'd have to write your comment pattern twice.)

Edit: Will suggests in the comments using /.../ {NR--; next} to avoid having the if-else block. My thought is that this looks cleaner when you have more complex actions for the matching records, but doesn't matter too much for something this simple. Take your favorite!
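
Spelled out, that variant might look something like this (a sketch filling in the same sum/average pieces as above):

 awk '/^[[:space:]]*#/ { NR--; next } { sum+=$3 } END { ave=sum/NR; print ave }' coriolis_data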

Jefromi
Not safe enough, since $3 from a comment line could still be added to the sum.
Bruno Brant
One issue there is that you're going to add $3 to the sum even on comment lines, aren't you?
nsayer
@Bruno Jinx! :)
nsayer
Err, it's missing a closing "}"; fixed: awk '{ if ($0 ~ /^[[:space:]]*#/ ) {NR--} else {sum+=$3}} END {ave=sum/NR} END {print ave }' coriolis_data
HH
Change the first rule to: /^[[:space:]]*#/ { NR--; next; } The 'next;' skips to the next record and ignores the rest of the code. Easier than using the IFs, IMHO.
Will Hartung
+1  A: 

The file you give AWK to parse is data, not an AWK source file, so AWK knows nothing about its comment syntax. In other words, to AWK, lines beginning with # are nothing special.

That said, you can of course skip comments, but you have to write the logic for it yourself: tell AWK to ignore lines that start with "#" and count the remaining lines on your own.

awk 'BEGIN {lines=0} {if(substr($1, 1, 1) != "#") {sum+=$3; lines++} } END {avg=sum/lines} END {print avg}' coriolis_data

You can, of course, indent it for better readability.
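
For example, pulled out into a standalone script file (the name avg.awk is my own choice), the same logic reads:

BEGIN { lines = 0 }
{
    # skip any line whose first field starts with "#"
    if (substr($1, 1, 1) != "#") {
        sum += $3
        lines++
    }
}
END { avg = sum / lines; print avg }

Run it with awk -f avg.awk coriolis_data.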

Bruno Brant
I think piping it through sed first is more readable, fwiw.
nsayer
If you can have comments beginning in the middle of a line, you need to add some code to that one-liner. Just shout here and I will provide it for ya.
Bruno Brant
Better to use a regex to check for comment lines, and you can still modify NR yourself instead of keeping your own line counter.
Jefromi
@nsayer: True. However, there might be more structural bits that the user wants to take into consideration when parsing. Keeping them in a single place, an AWK script file, would be better. Besides, if the files are big, there might be some performance cost in breaking the solution down across many tools, wouldn't there? (I'm not sure.)
Bruno Brant
@Jefromi: I'd use a regex as well, but I didn't want to make the solution too complex. I don't see the real benefit of altering the value of NR, since you are changing the meaning of the variable... If the user needs to expand the snippet later, it might generate confusion.
Bruno Brant
@Bruno: It'd take some pretty substantial expansion, but I suppose you could manage to mess yourself up. I'd probably count comments instead of lines, because I'm obsessive and there are probably fewer comments than lines.
Jefromi
@Jefromi: I like the idea of counting comments. I'd just have to increment the counter in an else branch of the code above.
Bruno Brant
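A rough sketch of the comment-counting variant discussed above (the variable name comments is my own), dividing by NR minus the comment count:

 awk '{ if ($0 ~ /^[[:space:]]*#/) {comments++} else {sum+=$3} } END { print sum/(NR-comments) }' coriolis_data
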
A: 

There is a SIMPLER way to do it!

$ awk '!/#/ {print $0}' coriolis_data
.105 0.005 0.9766 0.0001 0.595 0.005
.095 0.005 0.9963 0.0001 0.595 0.005
.115 0.005 0.9687 0.0001 0.595 0.005
.105 0.005 0.9693 0.0001 0.595 0.005
.095 0.005 0.9798 0.0001 0.595 0.005
.105 0.005 0.9798 0.0001 0.595 0.005
.095 0.005 0.9711 0.0001 0.595 0.005
.110 0.005 0.9640 0.0001 0.595 0.005
.105 0.005 0.9704 0.0001 0.595 0.005
.090 0.005 0.9644 0.0001 0.595 0.005

Correction: no, it is not!

$ awk '!/#/ {sum+=$3}END{ave=sum/NR}END{print ave}' coriolis_data 
0.885491    // WRONG.
$ awk '{if ($0 ~ /^[[:space:]]*#/){NR--}else{sum+=$3}}END{ave=sum/NR}END{print ave}' coriolis_data
0.97404     // RIGHT.
HH
+4  A: 

It is best not to touch NR; use a different variable for counting the rows. This version skips comments (the !/^[ \t]*#/ test) as well as blank lines (NF is zero for a line with no fields).

$ awk '!/^[ \t]*#/&&NF{sum+=$3;++d}END{ave=sum/d;print ave}' file
0.97404
ghostdog74
+1  A: 

Another approach is to use a conditional statement...

awk '{ if( $1 != "#" ){ print $0 } }' coriolis_data

What this does is tell awk to skip lines whose first field is #. Of course, this requires the comment character # to stand alone as the first field of the comment line.
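
Applied to the original averaging task, a sketch using the same field test might be (the counter name n is mine; note that, per the caveat above, the sample file's "#d-err-..." header would not be skipped, since its first field is not a bare #):

awk '$1 != "#" { sum+=$3; n++ } END { print sum/n }' coriolis_data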

sprax