I have a file a.csv with contents similar to the following:

a,b,c
a  ,aa,  a
a b, c, f
a , b, c
a b a b a,a,a
a,a,a
a aa ,a , t

I am trying to sort it using sort -k1 -t, a.csv, but it gives the following result:

a,a,a
a  ,aa,  a
a aa ,a , t
a b a b a,a,a
a , b, c
a,b,c
a b, c, f

This is not a sort on the first column alone. What am I doing wrong?

+2  A: 

You have to specify the end position to be 1, too:

sort -k1,1 -t, a.csv
Eemeli Kantola
Oh, so -k1,1 seems to set the range of columns to sort on. But will this still work if the words are even messier?
DKSRathore
+2  A: 

Give this a try: sort -t, -k1,1 a.csv

The man page says that if the end field is omitted, sort uses all characters from field n to the end of the line:

`-k POS1[,POS2]'
     The recommended, POSIX, option for specifying a sort field.  The
     field consists of the part of the line between POS1 and POS2 (or
     the end of the line, if POS2 is omitted), _inclusive_.  Fields and
     character positions are numbered starting with 1.  So to sort on
     the second field, you'd use `-k 2,2' See below for more examples.
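
A minimal sketch of the difference, using the sample data from the question. The temporary file and the variable names whole_line and first_field are illustrative only, and LC_ALL=C is an assumption added here so that the comparison is plain byte order and the result is reproducible; in other locales the order can differ.

```shell
# Recreate the question's sample data in a temporary file.
tmp=$(mktemp)
printf '%s\n' 'a,b,c' 'a  ,aa,  a' 'a b, c, f' 'a , b, c' \
    'a b a b a,a,a' 'a,a,a' 'a aa ,a , t' > "$tmp"

# -k1: the key runs from field 1 to the end of the line,
# so the commas and the later fields take part in the comparison.
whole_line=$(LC_ALL=C sort -t, -k1 "$tmp")

# -k1,1: the key starts and ends at field 1, so only the first column
# is compared; lines with equal keys fall back to a whole-line comparison.
first_field=$(LC_ALL=C sort -t, -k1,1 "$tmp")

printf '%s\n' "$first_field"
# a,a,a
# a,b,c
# a , b, c
# a  ,aa,  a
# a aa ,a , t
# a b, c, f
# a b a b a,a,a

rm -f "$tmp"
```

Blanks inside a field are still significant here; if they should be ignored at the start of the key, the b modifier (for example -k1,1b) is worth a look.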
Yannick M.
+1 for the same answer :)
Eemeli Kantola
+2  A: 

Try this instead:

sort -k 1,1 -t , a.csv

sort reads -k 1 as "sort from the first field onwards", which defeats the point of passing the argument in the first place.

This is documented in the sort man page and warned about in the Examples section:

Sort numerically on the second field and resolve ties by sorting alphabetically on the third and fourth characters of field five. Use `:' as the field delimiter:

$ sort -t : -k 2,2n -k 5.3,5.4

Note that if you had written -k 2 instead of -k 2,2, sort would have used all characters beginning in the second field and extending to the end of the line as the primary numeric key. For the large majority of applications, treating keys spanning more than one field as numeric will not do what you expect.
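
The man page example can be tried on a few made-up records. This is only a sketch: the colon-delimited lines below are hypothetical, the variable names numeric and textual are illustrative, and LC_ALL=C is an assumption added to keep the byte order predictable.

```shell
# Hypothetical colon-delimited records; field 2 is a number,
# field 5 carries a code whose 3rd and 4th characters break ties.
tmp=$(mktemp)
printf '%s\n' 'x:10:a:b:zzAB' 'y:9:a:b:zzCD' 'z:10:a:b:zzAA' > "$tmp"

# n attached to the key: field 2 is compared as a number (9 before 10);
# the two lines with 10 are then ordered by characters 3-4 of field 5.
numeric=$(LC_ALL=C sort -t : -k 2,2n -k 5.3,5.4 "$tmp")

# Without n, field 2 is compared as text, so "10" sorts before "9".
textual=$(LC_ALL=C sort -t : -k 2,2 -k 5.3,5.4 "$tmp")

printf 'numeric:\n%s\n\ntextual:\n%s\n' "$numeric" "$textual"

rm -f "$tmp"
```

With n the line for y (value 9) comes first; without it, the two lines with 10 come first, which is usually not what is wanted for numeric data.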

badp
What is n in -k 2,2n?
DKSRathore
It's the -n (numeric sort) switch, attached to the key so that it applies to that key only.
badp