Hi,

I have two Linux machines on which Unix sort seems to behave differently. I believe I've narrowed it down to the treatment of the underscore character.

If I run sort tmp, where tmp contains the following two lines:

aa_d_hh
aa_dh_ey

one machine outputs

aa_d_hh
aa_dh_ey

(i.e. '_' precedes 'h') while the other outputs

aa_dh_ey
aa_d_hh

(i.e. 'h' precedes '_'). I need these machines to behave consistently (I use sort -m later to merge very large files).
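To see why one ordering puts aa_d_hh first, here is a minimal sketch that prints the character codes being compared and reproduces the first machine's output. It assumes (the post does not say) that this machine collates in the C/POSIX locale, where sort compares raw byte values; the file name tmp comes from the question.

```shell
#!/bin/sh
# Assumption: the first machine uses C/POSIX collation (byte order).

# Print the numeric (ASCII) code of each character being compared.
# POSIX printf treats an argument starting with a quote as
# "the numeric value of the next character".
printf "'_' is %d\n" "'_"
printf "'h' is %d\n" "'h"

# Recreate the sample file from the question.
printf 'aa_d_hh\naa_dh_ey\n' > tmp

# Force byte-order collation for this one command.
# After the common prefix "aa_d", '_' (95) sorts before 'h' (104).
LC_ALL=C sort tmp
```

The first machine's output matches this byte-order result; the second machine's locale evidently uses different collation rules.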

Is there any way I can force sort to behave in one way or the other?

Thanks.

A:

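Sort order depends on the locale's collation setting, which normally comes from the environment variable LC_COLLATE. If LC_ALL is set, it overrides LC_COLLATE; if neither is set, LANG is used. Check your local documentation for 'locale', 'setlocale', etc. Set LC_COLLATE to 'POSIX' on both machines (and make sure LC_ALL is unset), and the results should match.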
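A minimal sketch of that suggestion for a POSIX shell session. It clears LC_ALL first, because an LC_ALL value inherited from the environment would otherwise take precedence over the LC_COLLATE setting.

```shell
#!/bin/sh
# LC_ALL overrides LC_COLLATE, so clear it or LC_COLLATE is ignored.
unset LC_ALL
LC_COLLATE=POSIX
export LC_COLLATE

# Sample input from the question, deliberately out of order.
printf 'aa_dh_ey\naa_d_hh\n' > tmp

# Subsequent sort commands in this shell use byte-order collation.
sort tmp
```

Putting the same lines in the shell startup file on both machines would make the setting persistent.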

William Pursell
There is no such environment variable on my machine, yet sort works fine.
Neeraj
A: 

The difference is due to your locale. Use the locale command to check the current settings.

There are a number of different locale categories, such as LC_COLLATE, LC_TIME, and LC_MESSAGES. Setting LC_ALL changes them all, overriding any individual category variable; LANG supplies a default for any category that is not set explicitly; and setting LC_COLLATE changes only the collation (sort) order. The locale C or POSIX is a basic locale defined by the standard; others include en_US (US English), fr_FR (French), etc.
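The precedence described above (LC_ALL, then the category variable such as LC_COLLATE, then LANG) can be checked with a short sketch. en_US.UTF-8 is only an example name and need not be installed: if it is missing, sort falls back to the C locale, so both commands below still produce byte order.

```shell
#!/bin/sh
# Run `locale` to see which values the current shell is using.

printf 'aa_dh_ey\naa_d_hh\n' > tmp

# LC_ALL wins over LC_COLLATE, so this sorts in C (byte) order
# even though LC_COLLATE names a different locale.
LC_ALL=C LC_COLLATE=en_US.UTF-8 sort tmp

# LC_COLLATE wins over LANG for collation, so this also sorts in
# C order. The subshell keeps the unset local to this command.
(unset LC_ALL; LANG=en_US.UTF-8 LC_COLLATE=C sort tmp)
```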

mark4o
A: 

You can set LC_COLLATE to the traditional (byte-order) sort order just for your command:

env LC_COLLATE=C sort tmp

This doesn't change the current shell's environment, only the one in which the sort command executes. Run sort this way on both machines and you should get the same behaviour.
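Since the question's real goal is merging with sort -m, the same collation has to be used for every sort and for the final merge. A minimal sketch, with hypothetical file names; it uses LC_ALL=C rather than LC_COLLATE=C so that an LC_ALL already present in the environment cannot override it.

```shell
#!/bin/sh
# Hypothetical input chunks standing in for the very large files.
printf 'aa_dh_ey\naa_d_hh\n' > part1
printf 'aa_dz\naa_d_aa\n'    > part2

# Sort each chunk with the same fixed collation.
LC_ALL=C sort part1 > part1.sorted
LC_ALL=C sort part2 > part2.sorted

# -m assumes its inputs are already sorted in the current collation
# order; inputs sorted under a different locale can merge out of order.
LC_ALL=C sort -m part1.sorted part2.sorted > merged

# sort -c exits non-zero if the file is not in order.
LC_ALL=C sort -c merged && echo "merged output is in C order"
```

Running the merge check with sort -c is a cheap way to catch a machine that was accidentally left on a different locale.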

Mehmet Ergut