Is there a way to filter lines with awk using the column (not field) number? I want to grab all the lines in a text file where field 6 equals a value that is held in a shell variable. I am using:

awk -v temp="${het}" '{ if ($6 == temp) print $0 }'  

But I have noticed that very occasionally field 5 is blank which messes things up. What I really need is

if colx-y == temp  

but this doesn't appear to exist. Is there a way to do this?

The input format is shown below, and I have just found another variation I have to deal with. I want to extract (in this case) the 602. The fifth field may or may not exist, and it may also run into the sixth (both cases appear below). The file format defines columns 23-26 as containing the sixth field, so gawk sounds like it might be the better option:

HETATM 5307  S   MOY A 602      14.660  14.666 109.556  1.00 26.41           S  
HETATM 5307  S   MOY   602      14.660  14.666 109.556  1.00 26.41           S  
HETATM 5307  S   MOY A1602      14.660  14.666 109.556  1.00 26.41           S     
A: 

Why don't you use if/else?

Something like the algorithm below:

if $5 is not blank
{ 
if $6==temp print $0
}
else if $7==temp print $0

It would also be easier to understand if you provided some sample input!

Vijay Sarathi
Sorry! A typical input line is HETATM 5307 S MOY A 602 14.660 14.666 109.556 1.00 26.41 S, and from time to time the A (or whatever character is in this position) gets left out. The format is defined by column number, so matching on columns would be less prone to error.
Chris
A: 
awk -F"[ ]" -v temp=${het} '$6==temp' file
ghostdog74
This is getting beyond the reach of my awk knowledge, and I don't think I'm seeing the full meaning of the "[ ]" field separator. Could you explain that one?
Chris
please see schot's answer :)
ghostdog74
A: 

Please add the sample input to your question, not to a comment. It is still not clear what your input looks like. Given your 'normal' input line:

HETATM 5307 S MOY A 602 14.660 14.666 109.556 1.00 26.41 S  

Which of the following two matches your input when 'field 5 is blank'?

HETATM 5307 S MOY  602 14.660 14.666 109.556 1.00 26.41 S  
HETATM 5307 S MOY   602 14.660 14.666 109.556 1.00 26.41 S  

In the first case, ghostdog74's answer should work. The -F"[ ]" he uses is a clever way of splitting on single spaces only. -F" " does not work, because then awk uses its default whitespace splitting.
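
To make the difference concrete, here is a small sketch. The shortened input line is made up for illustration (it mimics the first case above, with two spaces where field 5 is missing), so treat it as an assumption rather than real data:

```shell
# Illustrative line only: two spaces between "MOY" and "602" (empty field 5).
line='HETATM 5307 S MOY  602 14.660'

# Default splitting: a run of blanks is one separator, so 602 slides into $5.
printf '%s\n' "$line" | awk '{ print NF, $5 }'          # prints: 6 602

# -F"[ ]": each single space separates, so $5 is empty and 602 stays in $6.
printf '%s\n' "$line" | awk -F"[ ]" '{ print NF, $6 }'  # prints: 7 602
```

Note that this only helps when every gap is exactly one space wide; padded, column-aligned input produces many empty fields with -F"[ ]".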

If your data is of the second format, I would use substr() to extract the correct field:

 awk -v temp="${het}" 'substr($0, 21, 3) == temp'

Another option could be using gawk's fixed-width splitting, but it really depends on the exact format of your input.
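
For the column-aligned lines shown in the question, where the value sits in columns 23-26, the substr() idea might look like the sketch below. The function name filter_by_resseq is hypothetical, and adding 0 to force a numeric comparison (so the leading blank in " 602" does not matter) is my assumption about what is wanted:

```shell
# Hypothetical helper: print the lines whose columns 23-26 hold the number $1.
# Lines are read from stdin. "+ 0" turns " 602" into 602 before comparing.
# Caveat: a blank or non-numeric column converts to 0, so searching for 0
# would also match those lines.
filter_by_resseq() {
  awk -v temp="$1" 'substr($0, 23, 4) + 0 == temp + 0'
}

# The three sample lines from the question; only the first two contain 602.
filter_by_resseq 602 <<'EOF'
HETATM 5307  S   MOY A 602      14.660  14.666 109.556  1.00 26.41           S
HETATM 5307  S   MOY   602      14.660  14.666 109.556  1.00 26.41           S
HETATM 5307  S   MOY A1602      14.660  14.666 109.556  1.00 26.41           S
EOF
```

This uses only POSIX awk features, so it should not need gawk.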

schot
Thanks everyone for your help. The gawk pointer is a big help; I'll give the FIELDWIDTHS entry in the man page a look and use that. Should be OK from here.
Chris
A: 

Based on schot's suggestion and your example data:

gawk -v FIELDWIDTHS="6 1 4 2 1 3 3 1 1 1 3" '{print $11}'

The final "3" in FIELDWIDTHS represents the field that contains "602". I've omitted field widths for the rest of the line. Some of the field widths could be combined, but I didn't know what was whitespace as delimiters versus whitespace as field contents.
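
If the value can be four characters wide (columns 23-26, as in the "A1602" sample from the question), a combined variant could look like the following sketch. It assumes gawk is installed, since FIELDWIDTHS is a gawk extension, and the function name resseq_fixed_width is hypothetical:

```shell
# Hypothetical helper: print the number found in columns 23-26 of each line.
# FIELDWIDTHS="22 4" makes $1 = columns 1-22 and $2 = columns 23-26.
# "+ 0" strips the leading blank from values such as " 602".
resseq_fixed_width() {
  gawk -v FIELDWIDTHS="22 4" '{ print $2 + 0 }'
}
```

Feeding it the three sample lines from the question (for example `resseq_fixed_width < file`) should give 602, 602 and 1602. To filter rather than print, the body could become `'$2 + 0 == temp'` with `-v temp="${het}"` added.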

Dennis Williamson