Given a text file that is supposed to contain 10 tab-delimited columns (i.e. 9 tabs), I'd like to find all rows that have more than 10 columns (more than 9 tabs). Each row ends with CR-LF.

Assume nothing about the data, field widths, etc., other than the above.

Comments regarding approach, and/or working code would be extremely appreciated. Bonus for printing line numbers of offending lines as well.

Thanks in advance!

EDIT: As pointed out by the commenter (thanks!), you can assume the field values themselves don't contain tabs or CRLFs.

+3  A: 
awk -F'\t' 'NF>10{print}' <filename>

Or, with line numbers:

awk -F'\t' 'NF>10{print NR; print}' <filename>
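For anyone who wants to try this before pointing it at real data, here is a minimal sketch on a throwaway file. The sample rows and the `sample` and `bad_rows` variable names are made up for illustration. Since each row ends with CR-LF, awk on a Unix-like system will most likely leave the trailing `\r` attached to the last field, which does not change the field count.

```shell
# Build a small CRLF-terminated sample (hypothetical data):
# rows 1 and 3 have 10 columns, row 2 has 11.
sample=$(mktemp)
printf 'a\tb\tc\td\te\tf\tg\th\ti\tj\r\n'    >  "$sample"
printf 'a\tb\tc\td\te\tf\tg\th\ti\tj\tk\r\n' >> "$sample"
printf 'a\tb\tc\td\te\tf\tg\th\ti\tj\r\n'    >> "$sample"

# Report the line number and field count of every row with more than 10 fields.
bad_rows=$(awk -F'\t' 'NF > 10 { print NR ": " NF " fields" }' "$sample")
echo "$bad_rows"

rm -f "$sample"
```

Printing `NF` next to `NR` shows how far off each offending row is, which can help when tracking down where the stray tab came from.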
danben
Winner, winner, chicken dinner. I think I need to specify `awk -F"\t" 'NF>10{print NR ":"; print}'` but other than that, it seems to work. Thanks.
awshepard
Ah, you're right. Without specifying the field separator, this would work only if no field contained a space. Will update my answer.
danben
+4  A: 

Just use a regular expression that matches any line containing 10 or more tabs:

(.*\t){10,}
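One way to run this pattern from a shell is with `grep -n`, sketched below on a made-up sample file (the `sample`, `tab`, and `bad_lines` names are illustrative only). POSIX `grep -E` does not define `\t`, so this sketch splices a literal tab character into the pattern instead of relying on the escape.

```shell
# Hypothetical sample: row 1 has 10 columns, rows 2 and 3 have 11 and 12.
sample=$(mktemp)
printf 'a\tb\tc\td\te\tf\tg\th\ti\tj\r\n'       >  "$sample"
printf 'a\tb\tc\td\te\tf\tg\th\ti\tj\tk\r\n'    >> "$sample"
printf 'a\tb\tc\td\te\tf\tg\th\ti\tj\tk\tl\r\n' >> "$sample"

# Capture a literal tab; command substitution only strips trailing newlines.
tab=$(printf '\t')

# -n prefixes each matching line with its line number; cut keeps just the number.
bad_lines=$(grep -nE "(.*${tab}){10,}" "$sample" | cut -d: -f1)
echo "$bad_lines"

rm -f "$sample"
```

Writing the group as `[^<tab>]*<tab>` instead of `.*<tab>` selects the same lines and may reduce backtracking in engines that backtrack.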

VeeArr