Hello, normally I would use Python/Perl for this procedure but I find myself (for political reasons) having to pull this off using a bash shell.

I have a large tab-delimited file that contains six columns, and the second column holds integers. I need a shell script that verifies that every line really has six columns and that the second column really contains only integers. I am assuming that I would need to use sed/awk here somewhere. The problem is that I'm not that familiar with sed/awk. Any advice would be appreciated.

Many thanks! Lilly

+2  A: 

Well you can directly tell awk what the field delimiter is (the -F option). Inside your awk script you can tell how many fields are present in each record with the NF variable.
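To see those two pieces on their own, here is a minimal sketch that prints the field count for each line. The sample data and the variable names (sample, counts) are made up for illustration.

```shell
# Made-up sample: the second line is missing a column.
sample=$(mktemp)
printf 'a\t1\tc\td\te\tf\n' >  "$sample"
printf 'a\t2\tc\td\te\n'    >> "$sample"

# -F sets the field separator; NF is the field count, NR the line number.
counts=$(awk -F'\t' '{ print NR ": " NF }' "$sample")
printf '%s\n' "$counts"

rm -f "$sample"
```

With this sample it should report 6 fields for line 1 and 5 for line 2.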

Oh, and you can check the second field with a regex. The whole thing might look something like this:

awk -F'\t' '
{ if (NF != 6 || $2 !~ /^[0-9]+$/) print "Format error, line " NR; }
' thefile

That's probably close but I need to check the regex because Linux regex syntax variation is so insane. (edited because grrrr)

Pointy
Pointy, you are a saint. (So are you Ignacio). Works great!
Lilly Tooner
+3  A: 

gawk:

BEGIN {
  FS="\t"
}

(NF != 6) || ($2 != int($2)) {
  exit 1
}

Invoke as follows:

if awk -f colcheck.awk somefile
then
  echo "is valid"
else
  echo "is not valid"
fi
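One thing worth knowing about the int() comparison: it is numeric, so a value such as 1.0 compares equal to its integer part and is accepted. A small sketch that runs a few made-up second-column values through the same test (the variable name result is just for illustration):

```shell
# Each input line has a dummy first column and a test value in column 2.
result=$(printf 'x\t12\nx\t1.0\nx\t1.5\nx\tabc\n' |
  awk -F'\t' '{ print $2, (($2 != int($2)) ? "rejected" : "accepted") }')
printf '%s\n' "$result"
```

If "1.0" should count as an error in your data, a digits-only regex on the second field is the stricter check.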
Ignacio Vazquez-Abrams
Checking the second field with int() is a good idea.
Pointy
+1  A: 

here's how to do it with awk

awk -F'\t' 'NF!=6||$2+0!=$2{print "error"}' file
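Note that $2+0!=$2 tests whether the field is numeric, so 1.5 would pass. Below is a sketch of the same expression-as-pattern form with a regex swapped in for the integer test, plus line numbers and an exit status. The helper name check_cols and the sample data are made up, and allowing a leading minus sign is an assumption about what counts as an integer here.

```shell
# check_cols is a hypothetical helper name, not something from the thread.
check_cols() {
  awk -F'\t' '
    # A bare expression works as a pattern: the action runs when it is true.
    NF != 6 || $2 !~ /^-?[0-9]+$/ { print "error, line " NR; bad = 1 }
    END { exit bad + 0 }
  ' "$1"
}

# Made-up sample: line 2 has 1.5 in the second column.
sample=$(mktemp)
printf 'a\t1\tc\td\te\tf\n'   >  "$sample"
printf 'a\t1.5\tc\td\te\tf\n' >> "$sample"

if report=$(check_cols "$sample"); then
  verdict="valid"
else
  verdict="not valid"
fi
rm -f "$sample"
printf '%s (%s)\n' "$report" "$verdict"
```

The exit status makes it usable directly in an if, and the report still names every offending line instead of stopping at the first one.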
ghostdog74
wow my mind is expanding - I've been awking for almost 30 years without knowing you can drop an expression out there where a regex would go
Pointy
A: 

Pure Bash:

infile='column6.dat'
lno=0

# Split on tabs only; -r keeps backslashes literal
while IFS=$'\t' read -r -a line ; do
  ((lno++))
  if [ ${#line[@]} -ne 6 ] ; then
    echo "line $lno has ${#line[@]} elements"
  fi
  if ! [[ ${line[1]} =~ ^[0-9]+$ ]] ; then
    echo "line $lno column  2 : not an integer"
  fi
done < "$infile"

Possible output:

line 19 has 5 elements
line 36 column  2 : not an integer
line 38 column  2 : not an integer
line 51 has 3 elements
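One caveat with read -a: tab counts as IFS whitespace, so adjacent tabs collapse and empty fields are silently dropped. If empty columns can occur, a possible workaround is to count the tabs explicitly. This is only a sketch, it sticks to POSIX shell constructs (so it also runs in bash), and check_line and the sample data are made up for illustration.

```shell
tab=$(printf '\t')

# check_line is a hypothetical helper: it prints the problems found in one line.
check_line() {
  num=$1 rest=$2 n=1
  # Strip one field per iteration; every tab counts, so empty fields survive.
  while :; do
    case $rest in
      *"$tab"*) rest=${rest#*"$tab"}; n=$((n + 1)) ;;
      *) break ;;
    esac
  done
  [ "$n" -eq 6 ] || echo "line $num has $n elements"

  # Second field: drop everything up to the first tab, then from the next tab on.
  second=${2#*"$tab"}
  second=${second%%"$tab"*}
  case $second in
    '' | *[!0-9]*) echo "line $num column  2 : not an integer" ;;
  esac
}

# Made-up input: line 1 has an empty second column, line 3 has only 3 columns.
out=$(printf 'a\t\tc\td\te\tf\nx\t7\tc\td\te\tf\nx\t7\tc\n' |
  { i=0; while IFS= read -r l; do i=$((i + 1)); check_line "$i" "$l"; done; })
printf '%s\n' "$out"
```

Here line 1 still counts as six fields but is flagged for its empty second column, which the array version would instead report as a five-element line.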
fgm