Hi, I'd like to count the lines in a file that consists of several blocks, say 3, each with a different number of lines. The blocks are separated by blank lines. Is there a one-line solution? Here is what I have so far:

awk '(NR>4) && NF!=0 {++count} END {print count}' filename > outfile

This obviously counts all non-blank lines (and skips a 4-line header). I now need some kind of loop that prints the number of lines after each block.

So if I have 100 non-blank lines, and the first block contains 20 lines, the second 50 and the third 30, the ideal output would be 20 50 30

All my attempts so far had syntax errors.

Thanks for your help Tom

+1  A: 
awk 'NR>4 {if ($0 ~ /./ ) { mylines=mylines+1 } else { printf("%d ",mylines) ; mylines=0 } }
      END { if (mylines > 0) { printf("%d ",mylines) } }' filename

would do it.

Note: I'm using printf because you had specified the output as "20 50 30" which is on one line.

Edit: just realized that we must skip the first 4 lines.
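
To see the one-line output on something small, here is a minimal sketch run against a made-up sample. The file contents and the variable names (sample, counts, n) are assumptions for illustration only. It also tests NF instead of /./, so a line holding only whitespace is treated as blank, and it skips empty blocks so that two blank lines in a row do not print a 0.

```shell
# Hypothetical sample: a 4-line header, then blocks of 2, 3 and 1 lines
# separated by blank lines (contents are made up for this sketch).
sample=$(mktemp)
printf '%s\n' 'h1' 'h2' 'h3' 'h4' \
    'a' 'b' '' \
    'c' 'd' 'e' '' \
    'f' > "$sample"

# Skip the header (NR>4), count non-blank lines, and print the count each
# time a blank line closes a non-empty block. END flushes the last block
# when the file does not end with a blank line.
counts=$(awk 'NR>4 { if (NF) n++; else if (n) { printf("%d ", n); n=0 } }
              END  { if (n) printf("%d", n) }' "$sample")

echo "$counts"
rm -f "$sample"
```

For this sample the counts come out as 2 3 1 on a single line.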

Zsolt Botykai
Thanks for the printf hint; now I am back on familiar C terrain.
Tom
Yes, this now works correctly.
Tom
So you can use the accept button ;-)
Zsolt Botykai
I would like to accept both your answer and ghostdog74's, but apparently only one is allowed. Both of you helped very quickly.
Tom
A: 

If I understand you correctly (next time, please show examples):

$ cat file
#Surface 0 of 1 surfaces

# Contour 0, label:    0.138
 462  370.107  0.137889
 461.82  370  0.137889
skipping lines
 463  370.529  0.137889
 462  370.107  0.137889

 570  448.082  0.137889
 569.772  448  0.137889
skipping lines
 571  448.272  0.137889
 570  448.082  0.137889

 569  465.332  0.137889
 568.299  465  0.137889
skipping lines
 570  465.554  0.137889
 569  465.332  0.137889

$ awk 'NR==3{ RS=""; FS="\n"}NR>3{print NF}' file
5
5
5

So basically, while processing record 3, just before record 4 is read, set the record separator to the empty string and the field separator to a newline. We wait until then because we don't want to touch the RS and FS variables for the header lines. After the 3rd line, a record ends at a blank line and its fields are separated by newlines ("\n"), so NF gives the total number of lines in each record.
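
If switching RS in the middle of the file feels fragile, a possible variant is to strip the header first and set both variables in BEGIN. This is only a sketch under assumptions: the header is exactly 3 lines, the sample below is a shortened, made-up version of the file shown earlier, and sample and para_counts are hypothetical names.

```shell
# Shortened, made-up sample: 3 header lines, then blocks of 2, 3 and 1
# lines separated by blank lines, with a blank line at the end.
sample=$(mktemp)
printf '%s\n' '#Surface 0 of 1 surfaces' '' '# Contour 0, label:    0.138' \
    ' 462  370.107  0.137889' ' 461.82  370  0.137889' '' \
    ' 570  448.082  0.137889' ' 569.772  448  0.137889' \
    ' 571  448.272  0.137889' '' \
    ' 569  465.332  0.137889' '' > "$sample"

# tail -n +4 starts output at line 4, dropping the 3 header lines.
# RS="" puts awk in paragraph mode (records end at blank lines) and
# FS="\n" makes every line one field, so NF is the line count per block.
para_counts=$(tail -n +4 "$sample" | awk 'BEGIN { RS=""; FS="\n" } { print NF }')

echo "$para_counts"
rm -f "$sample"
```

This prints 2, 3 and 1, one count per line. For a header of a different length, only the number passed to tail changes.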

ghostdog74
This looks so cool and simple, thanks. Except now I don't know where to put the NR>4. Neeraj's method works. Thanks a lot for all the fast responses.
Tom
If you want to get rid of the 4-line header, use NR>1.
ghostdog74
OK, this means I just need to subtract 1 from the first record by hand afterwards, as there is no blank line before the first record. The header runs straight into the data.
Tom
Seriously, I cannot visualize what you are saying. It's better to show an example of your file and then describe your desired output, like what I have shown.
ghostdog74
Sorry, yes, see my answer below. Does this make sense? The output has to be 31 35 49, but with the above solution I am getting 32 35 49.
Tom
My header is: first line = blank, second = comment, third = blank, fourth = comment, then NO blank line. Hence the last line of the header is being counted in your solution, at least in my humble understanding.
Tom
See my edit. I followed the sample you gave, so the number of header lines is 3, not 4. Adjust accordingly.
ghostdog74
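
One way to read "adjust accordingly" for the 4-line header described above (blank, comment, blank, comment, then data) is to move the switch from record 3 to record 4. The sketch below is not taken from the thread: the sample is made up, hdr4_counts is a hypothetical name, and it assumes the awk in use applies a changed RS from the next record onward.

```shell
# Made-up file with a 4-line header: blank, comment, blank, comment,
# followed directly by data blocks of 2 and 1 lines.
sample=$(mktemp)
printf '%s\n' '' '#Surface 0 of 1 surfaces' '' '# Contour 0, label:    0.138' \
    ' 462  370.107  0.137889' ' 461.82  370  0.137889' '' \
    ' 570  448.082  0.137889' '' > "$sample"

# Switch to paragraph mode while still on header line 4, then print NF
# (the number of lines) for every record read after it.
hdr4_counts=$(awk 'NR==4 { RS=""; FS="\n" } NR>4 { print NF }' "$sample")

echo "$hdr4_counts"
rm -f "$sample"
```

Because the switch happens on the last header line, that line is never counted as part of the first block.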
A: 

awk 'BEGIN{count=0}\
        { if(NF==0) {if(NR>4)print count;count=0} \
          else count++ ;}' test.txt
Neeraj
you sure that works?
ghostdog74
Thanks a lot, this is exactly what I was looking for.
Tom
Wait a minute. It first checks whether the row has any fields. Then it prints the counter only if it has seen more than 4 lines, but what if the first block is 3 lines long? Sorry, just noticed the header thing. Then I would use my updated version.
Zsolt Botykai
So what if there is an empty line right after the header?
Zsolt Botykai
Yes, I agree, your updated version seems to work best. Thanks.
Tom
A: 

Here is a version of my file. It starts with a blank line:

#Surface 0 of 1 surfaces

# Contour 0, label:    0.138
 462  370.107  0.137889 
 461.82  370  0.137889 
skipping lines
 463  370.529  0.137889 
 462  370.107  0.137889 

 570  448.082  0.137889 
 569.772  448  0.137889 
skipping lines
 571  448.272  0.137889 
 570  448.082  0.137889 

 569  465.332  0.137889 
 568.299  465  0.137889 
skipping lines
 570  465.554  0.137889 
 569  465.332  0.137889 

Yes, there is a blank line at the end.

Thanks

Tom
So in the above, what's the desired result? 4 5 5? Or 6 5 5? If you skip 4 lines, you are going to miss the "462 370.107 0.137889" line, am I correct?
ghostdog74
In the above the desired result is 5 5 5. Sorry, the first blank line of the header was not copied over, so in the above example the header has 3 lines.
Tom