
When I try to sort the following text file, 'input':

test1 3
test3 2
test 4

with the command

sort input

the output is identical to the input. Here is the output of

od -bc input

for that file:

0000000 164 145 163 164 061 011 063 012 164 145 163 164 063 011 062 012
          t   e   s   t   1  \t   3  \n   t   e   s   t   3  \t   2  \n
0000020 164 145 163 164 011 064 012
          t   e   s   t  \t   4  \n
0000027

It's just a tab-separated file with two columns. When I do

sort -k 2

the output changes to

test3 2
test1 3
test 4

which is what I would expect. But if I do

sort -k 1

nothing changes with respect to the input, whereas I would expect 'test' to sort before 'test1'. Finally, if I do

cat input | cut -f 1 | sort

I get

test
test1
test3

as expected. Is there a logical explanation for this? What exactly does sort compare by default? Is plain sort equivalent to

sort -k 1

or something else?

My version of sort:

sort (GNU coreutils) 7.4
A:

From the sort man page:

*** WARNING *** The locale specified by the environment affects sort order. Set LC_ALL=C to get the traditional sort order that uses native byte values.

So running export LC_ALL=C before sort should help.
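A minimal sketch of that fix, assuming a POSIX shell and GNU sort; the function name sample_input is hypothetical and simply recreates the question's tab-separated file. Prefixing the command with LC_ALL=C changes the locale for that one sort invocation only, whereas export LC_ALL=C affects every later command in the shell session:

```shell
#!/bin/sh
# Hypothetical helper: prints the three tab-separated lines from the question.
sample_input() {
    printf 'test1\t3\ntest3\t2\ntest\t4\n'
}

# In the C locale, sort compares raw byte values. The lines first differ at
# the fifth byte: tab (octal 011) < "1" (061) < "3" (063), so "test<TAB>4"
# is printed first, followed by "test1<TAB>3" and "test3<TAB>2".
sample_input | LC_ALL=C sort
```

The octal values in the comment are the same ones shown in the od -bc output in the question.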

Aleksey Otrubennikov
GNU sort with LC_ALL=C does produce the traditional answer, which is what 'sort' on Solaris produces anyway. Change the 'test3' line to 'Test3' and you get more differences. The GNU answers are consistent with the sort order of 'ls'. It is surprising, though.
Jonathan Leffler
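To see the case difference mentioned in the comment above, here is a small sketch under the same assumptions (POSIX shell, GNU sort; the helper name sample_with_capital is hypothetical). In byte order every uppercase ASCII letter precedes every lowercase one ('T' is octal 124, 't' is 164), so the Test3 line moves to the top:

```shell
#!/bin/sh
# Hypothetical helper: the question's input with "test3" changed to "Test3".
sample_with_capital() {
    printf 'test1\t3\nTest3\t2\ntest\t4\n'
}

# Byte order: "T" (0124) sorts before "t" (0164); then tab (011) before "1" (061).
# Prints Test3<TAB>2, test<TAB>4, test1<TAB>3.
sample_with_capital | LC_ALL=C sort
```

In a UTF-8 locale, case is usually treated as a later tie-breaker rather than a primary difference, so the result there differs; the exact order depends on the installed locale data.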
Thanks, it produces the expected result for me, too. However, in my default locale, en_US.UTF-8, both tab and space also sort before alphanumeric characters. If sort is just doing a lexicographical sort on the entire line, it remains a little surprising to me, too.
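One way to avoid the whole-line comparison without touching the locale is to close the key range. In GNU sort, -k 1 starts the key at field 1 and, because no end field is given, runs it to the end of the line, so the tab and the second column still take part in the comparison; -k 1,1 limits the key to the first field only. A minimal sketch, assuming a POSIX shell (sample_input is a hypothetical helper):

```shell
#!/bin/sh
# Hypothetical helper: prints the three tab-separated lines from the question.
sample_input() {
    printf 'test1\t3\ntest3\t2\ntest\t4\n'
}

# -k 1,1: the key is only field 1 ("test1", "test3", "test"), so the
# shorter prefix "test" sorts before "test1" and "test3".
# Prints test<TAB>4, test1<TAB>3, test3<TAB>2.
sample_input | sort -k 1,1
```

Adding -t with a literal tab, for example -t "$(printf '\t')", makes the field separator explicit instead of relying on sort's default splitting at blank characters.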
+1 This works. But... why???
Niels Basjes
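A rough illustration of the "why", stated as an assumption: with the glibc locale data of that period, en_US.UTF-8 collation skips whitespace and punctuation on its first comparison pass, so sort effectively compares test13, test32 and test4, which are already in order. The sketch below only approximates that first pass by deleting the tabs and byte-sorting what remains (first_pass_keys is a hypothetical name); it is not how glibc implements collation, and locale tables have changed across glibc versions, so current systems may behave differently:

```shell
#!/bin/sh
# Approximation only (assumption): mimic a first collation pass that ignores
# tabs by deleting them, then compare the remaining bytes in the C locale.
first_pass_keys() {
    printf 'test1\t3\ntest3\t2\ntest\t4\n' | tr -d '\t' | LC_ALL=C sort
}

# Prints test13, test32, test4: the same relative order as the original
# lines test1<TAB>3, test3<TAB>2, test<TAB>4, i.e. the input order.
first_pass_keys
```

Seen this way, "test" only sorts first when its first field is compared on its own, which is what cut -f 1 | sort does in the question.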