views: 31
answers: 2

I recently came across this crazy script bug on one of my Solaris machines. I found that cut on Solaris skips lines from the files it processes (or at least from very large ones; mine is 800 MB).

> cut -f 1 test.tsv | wc -l
  457030
> gcut -f 1 test.tsv | wc -l
  840571
> cut -f 1 test.tsv > temp_cut_1.txt
> gcut -f 1 test.tsv > temp_gcut_1.txt
> diff temp_cut_1.txt temp_gcut_1.txt | grep '[<]' | wc -l
       0
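Since the diff shows that cut's output is a strict subset of gcut's, the dropped lines can be pulled out directly. A rough sketch, reusing the temp files from the transcript above (the sed pattern assumes plain, un-unified diff output):

```shell
#!/bin/sh
# Lines present in the gcut output but missing from the cut output
# show up in plain diff output prefixed with "> "; strip that prefix
# to recover the dropped lines themselves.
diff temp_cut_1.txt temp_gcut_1.txt | sed -n 's/^> //p' > dropped.txt

# Summarize the lengths of the dropped lines; a sharp cutoff here
# would point at a line-length limit in the old cut.
awk '{ print length($0) }' dropped.txt | sort -n | uniq -c | tail
```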

My question is: what the hell is going on with Solaris cut? My workaround is updating my scripts to use gcut, but... what the hell?

A: 

gcut is almost certainly GNU cut, and the other is probably derived from the original System V cut. Code in the latter might actually go back to original AT&T Unix sources.

The GNU utilities caught on long before they became foundational parts of modern operating systems like Linux and OS X. Way back in the 80s, a lot of sysadmins would install them over the top of the default utilities, or put them in a directory earlier in the PATH than the system ones. One reason many sysadmins preferred them is that they often had fewer arbitrary limitations than the "native" utilities. Apparently the cut program on your Solaris box still has some lame limits in it.

I'd test this on a different box running a different OS, to make sure you don't have something else going on, like some weird Unicode deal.
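One quick way to test the arbitrary-limit theory is to measure the longest line in the input; historical System V utilities were often limited to something like LINE_MAX (commonly 2048 bytes) per line. A sketch (the 2048 figure is an assumption, not something documented for this particular cut):

```shell
#!/bin/sh
# Print the length of the longest line in the file.
# If it exceeds ~2048 bytes (a common historical LINE_MAX),
# that could explain why the old cut silently skips lines.
awk '{ if (length($0) > max) max = length($0) } END { print max }' test.tsv
```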

Warren Young
A: 

I have never heard of gcut in Solaris. Do a man gcut or gcut --help to see what it is. To play it safe, use cut. Otherwise, you can just use nawk, e.g. (for what you are trying to do):

nawk '{print $1} END {print "Total count:" NR}' file

Since getting field 1 and piping it to wc gives the same result as counting the lines, you can just do wc -l < file.
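To illustrate with a throwaway sample (sample.tsv here is hypothetical; on Solaris you would use nawk, but plain awk behaves the same for this):

```shell
#!/bin/sh
# A small tab-separated sample file, just for demonstration.
printf 'a\t1\nb\t2\nc\t3\n' > sample.tsv

# Field 1 plus a running line count in one awk pass:
awk '{ print $1 } END { print "Total count:" NR }' sample.tsv

# But if only the count is wanted, wc -l alone is enough:
wc -l < sample.tsv
```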

ghostdog74