ansaurus

Question

linux bash 'sort' in dictionary order - problem with zeros

Answer 1

+4 A:

From the man page:

   -b, --ignore-leading-blanks
          ignore leading blanks

   -g, --general-numeric-sort
          compare according to general numerical value

   -n, --numeric-sort
          compare according to string numerical value

Example:

andrey@localhost:~/gamess$ echo -e "1\n2\n10" | sort
1
10
2
andrey@localhost:~/gamess$ echo -e "1\n2\n10" | sort -g
1
2
10
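
The man page excerpt above also lists -n, which the session does not exercise. As a rough sketch (the sample values are made up for illustration), -n and -g agree on plain integers but differ once a value uses scientific notation:

```shell
# Hypothetical sample values: "1e2" is 100 in scientific notation.
data=$(printf '1e2\n20\n3\n')

# -n reads only the leading digits, so "1e2" is compared as 1.
numeric=$(printf '%s\n' "$data" | sort -n)

# -g parses the full floating-point value, so "1e2" is compared as 100.
general=$(printf '%s\n' "$data" | sort -g)

printf 'sort -n:\n%s\n\nsort -g:\n%s\n' "$numeric" "$general"
```

With these inputs, -n should list 1e2 first and -g should list it last.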

aaa 2010-06-24 04:54:41

True, but how is this relevant? He said at the start that it gives (and he expects) the 1-10-2 order in the one-column case. The difference he's asking about is when there's a second column present. Also he notes that Mac OS X 10.5 (which uses GNU sort) uses the same ordering when two columns are present, but RHEL doesn't.

Ken 2010-06-24 05:20:05

@Ken: I think it's actually that RHEL and Ubuntu use GNU sort and OS X uses a BSD version.

Dennis Williamson 2010-06-24 05:42:12

Ken 2010-06-24 05:53:35

@Ken: Precisely. The issue is that when two columns are present, the sort order changes. With two columns, the zero in '10' is sorted before the space in '1', whereas the one in '11' is sorted after the space in '1'.

michael 2010-06-24 14:58:02

Answer 2

+2 A:

The sort can be performed the way you want by restricting the key to the column you're interested in:

sort -k1,1 inputfile
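
A minimal sketch of the same idea, using made-up two-column data piped in place of inputfile. Restricting the key to field 1 means only "100", "10" and "1" are compared, so the space and the second column no longer affect the order:

```shell
# Hypothetical two-column sample standing in for inputfile.
input=$(printf '100 1\n10 1\n1 1\n')

# -k1,1 starts and ends the sort key at field 1, so the separator
# and the second column are left out of the comparison.
result=$(printf '%s\n' "$input" | sort -k1,1)

printf '%s\n' "$result"
# 1 1
# 10 1
# 100 1
```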

Dennis Williamson 2010-06-24 05:41:30

Yes! This works, thanks. But I don't understand why it doesn't work without this. Without keys specified, it defaults to the entire line as a key. So in the case of one "column", e.g. {100, 10, 1}, "100" is sorted after "10", and "10" is after "1", meaning "0" is sorted after the space char. But with two (space- or tab-delimited) "columns", e.g. {100 1, 10 1}, "100 1" is sorted before "10 1", meaning "0" is sorted before space when each line is treated as a single key. I'll check locale settings some more. I tried setting LC_ALL=C like the docs suggest, but that didn't change anything.

michael 2010-06-24 16:29:03

`LANG=C` and `LC_ALL=C` both worked for me. `LC_ALL=C sort inputfile` (all on one line). "0" before space is a locale thing.
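
A small sketch of the `LC_ALL=C` approach, with made-up lines piped in place of inputfile. In the C locale, sort compares raw bytes, and a space (0x20) comes before "0" (0x30), so "10 1" should land ahead of "100 1" even with the whole line as the key:

```shell
# Hypothetical two-column sample standing in for inputfile.
# LC_ALL=C applies only to this one sort invocation.
result=$(printf '100 1\n10 1\n' | LC_ALL=C sort)

printf '%s\n' "$result"
# 10 1
# 100 1
```

Without the override, the result depends on whichever locale is active, which would be consistent with the different orderings reported above.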

Dennis Williamson 2010-06-24 18:26:17
