I have a 600MB tab-delimited file that needs to be sorted using only the first two columns. Lines that have the same values in those two columns should keep their existing relative order. My efforts using sort and --key seem to keep sorting the lines by the other columns as well.

This is an example of the unsorted file:

1244072768  7234 Z
1244072768  7234 e
1244072768  7234 b
1244072768  7234 r
1244072768  7234 a
1244072768  7341 C
1244072768  7341 a
1244072768  7234 -
1244072768  7341 r
1244072768  7341 t
1244072768  7341 -
1244072769  4237 W
1244072769  4237 a
1244072769  4237 s
1244072769  4237 h
1244072769  4237 -

The result should be something like:

1244072768  7234 Z
1244072768  7234 e
1244072768  7234 b
1244072768  7234 r
1244072768  7234 a
1244072768  7234 -
1244072768  7341 C
1244072768  7341 a
1244072768  7341 r
1244072768  7341 t
1244072768  7341 -
1244072769  4237 W
1244072769  4237 a
1244072769  4237 s
1244072769  4237 h
1244072769  4237 -

I have tried sort -k 1n -k 2n test.log, sort -k 1 -k 2 test.log, and sort -k 1,2 test.log, and all of them sort by the entire line. Is there any way to use standard Linux/Unix command-line tools to sort this file by only the first two columns?

+2  A: 

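You should add -s to get a stable sort. Without it, sort breaks ties between lines whose keys compare equal by comparing the whole line, which is why the other columns appear to be sorted too: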

sort -k 1,2 -s test.log

From the man page:

-s, --stable
       stabilize sort by disabling last-resort comparison
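
If the first two columns should be compared as numbers and the separator is strictly a tab, a slightly more explicit variant may help. This is only a sketch: the function name stable_sort_two_cols is made up for illustration, it assumes both key columns are numeric, and it relies on -s, which GNU and BSD sort provide but POSIX does not require.

```shell
# Hypothetical helper: stable sort on columns 1 and 2 only.
# -s         keep the input order of lines whose keys compare equal
# -t TAB     split fields on a literal tab character
# -k1,1n     first key: column 1 only, compared numerically
# -k2,2n     second key: column 2 only, compared numerically
stable_sort_two_cols() {
  sort -s -t "$(printf '\t')" -k1,1n -k2,2n "$@"
}
```

It could be called as stable_sort_two_cols test.log > sorted.log, or used at the end of a pipeline, since it reads standard input when no file is given.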
sth
Thanks, exactly what I needed. The examples I found never included that flag.
Rob