I'm writing a script that takes a filename as an argument, finds the lines that begin with a specific word (ATOM, in this case), and prints the values from specific columns.

FILE="$1"

if test $# -lt 1
then
 echo "usage: $0 Enter a .PDB filename"
 exit 1
fi
if test -r "$FILE"
then
 grep '^ATOM' "$FILE" | awk '{ print $18 }' | awk '{ print NR $4, "\t" $38 }'
else
 echo "usage: $FILE must be readable"
 exit 1
fi

I'm having trouble figuring out three problems:

  1. How to use awk to print only lines that contain ATOM as the first word
  2. How to use awk to print only certain columns from the rows that match the above criteria, specifically columns 2-20 and 38-40
  3. How can I require that the argument be a .pdb file?
A:
  1. That would be

    awk '$1 == "ATOM"' "$FILE"
    
  2. That task is probably better accomplished with cut, whose -c option selects character positions (PDB records are fixed-width):

    grep '^ATOM' "$FILE" | cut -c 2-20,38-40
    
  3. If you want to ensure that the filename passed as the first argument to your script ends with .pdb: first, please don't (file extensions don't really matter in UNIX), and secondly, if you must, here's one way:

    [ "${1%%.pdb}" = "$1" ] && { echo "usage:..."; exit 1; }
    

    This takes the first command-line argument ($1), strips the suffix .pdb if it exists, and then compares it to the original command-line argument. If they match, it didn't have the suffix, so the program prints a usage message and exits with status code 1.
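To see what the suffix test in item 3 does, here is a small sketch that wraps it in a hypothetical helper function, `has_pdb_suffix`, and tries it on a few names. `${1%.pdb}` removes a trailing `.pdb` from `$1` if one is present; for a literal suffix with no wildcards like this one, `%` and `%%` behave the same.

```shell
#!/bin/sh
# Hypothetical helper: succeeds (status 0) when $1 ends in ".pdb".
# ${1%.pdb} strips a trailing ".pdb"; if nothing was stripped,
# the name did not have the suffix.
has_pdb_suffix() {
    [ "${1%.pdb}" != "$1" ]
}

for name in protein.pdb protein.txt pdb notes.pdb.bak; do
    if has_pdb_suffix "$name"; then
        echo "$name: ok"
    else
        echo "$name: no .pdb suffix"
    fi
done
```

The check is case-sensitive, so a file named `PROTEIN.PDB` would also be rejected, which is one more way an extension check can get in the way.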

David Zaslavsky
Thanks David! Can I ask why you say 'please don't' restrict the argument to .pdb files? Also, if I only want to print the columns for lines that have entries in columns 18-30, should I pipe each step separately? grep ^ATOM $1 | cut -f 18-30 | cut -f 2-20,38-40
Koala
@Koala: For the filename thing, what if you want to use your program on a file whose name ends with `.txt`? Or `.csv`? Or `.bak`? Or a file that has a name with no extension at all? Doesn't it seem kind of silly to make the program fail just because the filename doesn't conform to some arbitrary convention? Of course, it's your program, so you can make it check the filename if you want, but if my experience is any guide, there will eventually come a time when you'll want to get rid of that check. Other UNIX utilities (e.g. `grep` and `awk`) don't check filenames; there's a reason for that.
David Zaslavsky
As for the second part of your question, about the columns, I don't really understand what you're asking.
David Zaslavsky
To clarify the second part of my question: if there's content in columns 18-30, then the output should display the contents of columns 2-20 and 38-40. How do I filter this: with a pipe or with an if/then statement? I'm not sure how to set this up.
Koala
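The condition in the clarification above (print columns 2-20 and 38-40 only when columns 18-30 contain something) fits in a single awk command, so neither an extra pipe nor a shell if/then is needed. This is a minimal sketch, assuming "columns" means fixed-width character positions as in `cut -c`, and that "content" means at least one non-blank character; the helper name `pdb_cols` and the sample records are hypothetical. Chaining two `cut` commands would not do this, because the second `cut` numbers its columns relative to the output of the first, not the original line.

```shell
#!/bin/sh
# Hypothetical helper: for ATOM records whose character columns 18-30
# are not all blank, print character columns 2-20 followed by 38-40
# (the same slices that `cut -c 2-20,38-40` would select).
pdb_cols() {
    awk '$1 == "ATOM" && substr($0, 18, 13) ~ /[^ ]/ {
        print substr($0, 2, 19) substr($0, 38, 3)
    }' "$@"
}

# Made-up fixed-width sample: the second ATOM line is blank in 18-30,
# and the HETATM line fails the $1 == "ATOM" test.
sample='ATOM      1  N   MET A   1      38.198  19.582  28.222
ATOM      2
HETATM    3  O   HOH A 101      12.000  13.000  14.000'

printf '%s\n' "$sample" | pdb_cols
```

`substr(s, start, length)` is 1-based, so columns 2-20 are `substr($0, 2, 19)` and columns 38-40 are `substr($0, 38, 3)`. With a filename argument, `pdb_cols file.pdb` reads the file instead of standard input.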
A:

Contrary to the other answer, your task can be accomplished with a single awk command; there's no need for grep, cut, or anything else.

if [ $# -lt 1 ]; then
 echo "usage: $0 Enter a .PDB filename"
 exit 1
fi
FILE="$1"
case "$FILE" in
*.pdb)
 if test -r "$FILE"
 then
  # print fields 2-20, assuming whitespace as the column separator
  awk '$1=="ATOM" && NF>18 {
    printf "%s ",$2
    for(i=3;i<=19;i++){
      printf "%s ",$i
    }
    printf "%s\n",$20
  }' "$FILE"
 else
  echo "usage: $FILE must be readable"
  exit 1
 fi
 ;;
*) echo "usage: $0 Enter a .PDB filename"; exit 1;;
esac
ghostdog74
That works awesome! Thank you ghostdog.
Koala
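The printf loop in the awk answer above can be written more compactly. This sketch keeps that answer's assumptions (whitespace-separated fields, lines whose first field is ATOM and that have more than 18 fields) and ends each record with a newline; the helper name `atom_fields` and the sample line are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print whitespace-separated fields 2-20 of each
# ATOM line that has more than 18 fields, one record per output line.
atom_fields() {
    awk '$1 == "ATOM" && NF > 18 {
        for (i = 2; i < 20; i++) printf "%s ", $i
        print $20
    }' "$@"
}

# Made-up line with 21 whitespace-separated fields (ATOM, f2 ... f21).
printf 'ATOM f2 f3 f4 f5 f6 f7 f8 f9 f10 f11 f12 f13 f14 f15 f16 f17 f18 f19 f20 f21\n' | atom_fields
```

Using `print` for the last field lets awk add the output record separator (a newline by default), so each matching line produces exactly one output line.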