
I have a file 'records.txt' which contains over 200,000 records.

Each record is on a separate line and has multiple fields separated by a delimiter '|'.

Each row should have 35 fields, but the problem is that one of these rows has a number of fields other than 35, i.e. a number of '|' characters other than 35.

Can someone please suggest a way in Unix to identify that row (for example, by counting the '|' characters on each line of the file)?
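Before hunting for the odd row, it can help to see how many lines have each '|' count. The sketch below is a minimal example and assumes a POSIX awk, sort and uniq. The function name count_separators is hypothetical. It counts the '|' characters on each line with gsub, which replaces each match with itself so the line is left unchanged, and then tallies the results.

```shell
# count_separators: hypothetical helper name, not a standard command.
# For each line, gsub(/[|]/, "&") replaces every '|' with itself and
# returns how many it replaced, i.e. the number of '|' characters.
# sort -n | uniq -c then prints "<number of lines> <separator count>".
count_separators() {
  awk '{ print gsub(/[|]/, "&") }' "$1" | sort -n | uniq -c
}
```

Running `count_separators records.txt` should show one large group. If a single row is wrong, it should also show a group of 1 next to the unexpected separator count.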

A: 

This small Perl one-liner should do it:

cat records.txt | perl -ne '$t = $_; $t =~ s/[^\|]//g; print unless length($t) == 35;'

This works by deleting every character except '|' from a copy of the line, then checking the length of what is left.

Greg Hewgill
Useless use of cat detected here...
Keltia
A:

Try this:

awk -F '|' 'NF != 35 {print NR, $0}' records.txt
Martin Wickman
+1, you beat me by 24 seconds :)
Johannes Schaub - litb
Heh. I love this :)
Martin Wickman
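Note that the awk answer and the Perl answer do not test quite the same thing. awk's NF is the number of fields, and a line with k '|' separators has k + 1 fields. A trailing '|' ends an empty last field, so `a|b|` has NF = 3. NF != 35 therefore flags rows that do not have 34 separators, while the Perl and bash answers flag rows that do not have 35.

The sketch below compares NF - 1 so that either reading can be checked explicitly. The function name is hypothetical, and the expected separator count is passed as a parameter.

```shell
# A line with 2 separators has 3 fields.
printf 'a|b|c\n' | awk -F '|' '{ print NF }'   # prints 3

# flag_bad_rows_awk: hypothetical helper; $1 is the file, $2 the expected
# number of '|' separators. Prints the line number and the line for every
# row whose separator count (NF - 1) differs. An empty line has NF = 0,
# so it is reported too.
flag_bad_rows_awk() {
  awk -F '|' -v want="$2" 'NF - 1 != want { print NR, $0 }' "$1"
}
```

`flag_bad_rows_awk records.txt 35` checks for 35 separators. `flag_bad_rows_awk records.txt 34` is equivalent to the original NF != 35 test.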
A: 

Greg's approach with bash and standard tools, for the bash fans out there :)

while IFS= read -r n; do [ $(printf '%s' "$n" | tr -cd '|' | wc -c) -ne 35 ] && printf '%s\n' "$n"; done < records.txt
Johannes Schaub - litb
I just wanted to find a row that has more than N (35 here) separators. Both Greg's code and yours work. Thanks :)
Mohit Nanda
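On 200,000 records, the while-read loop in the bash answer starts a subshell plus tr and wc for every line, so it will be much slower than a single awk or perl pass. grep can do the same check in one process using an extended regular expression that matches exactly 35 '|' characters. Interval expressions such as {35} are part of POSIX EREs. The function name below is hypothetical, and this is a sketch rather than a drop-in replacement.

```shell
# flag_bad_rows_grep: hypothetical helper name for illustration.
# The ERE matches a line with exactly 35 '|' characters: 35 repetitions
# of "any non-bar characters, then a bar", then non-bar characters to the end.
# -v prints the lines that do NOT match; -n prefixes each with its line number.
flag_bad_rows_grep() {
  grep -nvE '^([^|]*[|]){35}[^|]*$' "$1"
}
```

Usage: `flag_bad_rows_grep records.txt` prints lines as `<line number>:<line>`. Because of -v, grep exits with status 1 when every line has exactly 35 separators, i.e. when no bad row is found.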