I have a large file named CHECKME which is tab delimited. There are 8 columns in each row. Column 4 is integers.
By using Perl or Python, is it possible to verify that each row in CHECKME has 8 columns and that column 4 is an integer?
I have a large file named CHECKME which is tab delimited. There are 8 columns in each row. Column 4 is integers.
By using Perl or Python, is it possible to verify that each row in CHECKME has 8 columns and that column 4 is an integer?
In Python:
def isfileok(filename):
f = open(filename)
for line in f:
pieces = line.split('\t')
if len(pieces) != 8:
return False
if not pieces[3].isdigit():
return False
return True
I assume that by "column no. 4" you mean the 4th one, hence the [3]
since Python (like most computer languages) indices from 0.
Here I'm just returning a boolean result, but I split up the code so it's easy to give good diagnostics about what line is wrong, and how, if you so desire.
In Perl:
while (<>) {
if (! /^[^\t]+\t[^\t]+\t[^\t]+\t\d+\t[^\t]+\t[^\t]+\t[^\t]+\t[^\t]+$/) {
die "Line $. is bad: $_";
}
}
Checks to see that the line starts with one or more non-tabs, followed by a tab, followed by one or more non-tabs, followed by a tab, followed by one or more non-tabs, followed by a tab, followed by one or more digits, etc. until the eighth set of non-tab(s), which must be at the end of the line.
Thats the quick and dirty solution, but in the long run, it'd probably be better to use a "split /\t/" and count the number of fields it gets and then check to make sure field 3 (zero origin) is just digits. That way when (not if) the requirements change and you now need 9 fields, with the 9th field being a prime number, it's easy to make the change.
for n,line in enumerate(open("filename")):
line=line.split()
if len(line)!=8:
print "line %d error" % n+1
if not line[3].isdigit():
print "line %d error" % n+1
In Perl
while(<>) {
my @F=split/\t/;
die "Invalid line: $_" if @F!=8 or $F[3]!~/^-?\d+$/;
}
It's very easy work for Perl:
perl -F\\t -ane'die"Invalid!"if@F!=8||$F[3]!~/^-?\d+$/' CHECKME
Read files given on the command-line or stdin. Print invalid lines. Return code is zero if there are no errors or one otherwise.
import fileinput, sys
def error(msg, line):
print >> sys.stderr, "%s:%d:%s\terror: %s" % (
fileinput.filename(), fileinput.filelineno(), line, msg)
error.count += 1
error.count = 0
ncol, icol = 8, 3
for row in (line.split('\t') for line in fileinput.input()):
if len(row) == ncol:
try: int(row[icol])
except ValueError:
error("%dth field '%s' is not integer" % (
(icol + 1), row[icol]), '\t'.join(row))
else:
error('wrong number of columns (want: %d, got: %d)' % (
ncol, len(row)), '\t'.join(row))
sys.exit(error.count != 0)
$ echo 1 2 3 | python validate-input.py *.txt - not_integer.txt:2:a b c 1.1 e f g h error: 4th field '1.1' is not integer wrong_cols.txt:3:a b error: wrong number of columns (want: 8, got: 3) <stdin>:1:1 2 3 error: wrong number of columns (want: 8, got: 1)