I have data in the following format:

foo<tab>1.00<space>1.33<space>2.00<tab>3

I tried to sort the file on the last field in decreasing order. I tried the following commands, but the output wasn't sorted as expected.

$ sort -k3nr file.txt  # apparently this sorts using space as the delimiter

$ sort -t"\t" -k3nr file.txt
  sort: multi-character tab `\\t'

$ sort -t "`/bin/echo '\t'`" -k3,3nr file.txt
  sort: multi-character tab `\\t'

What's the right way to do it?

Here is the sample data.

A: 

Pipe it through something like awk '{ print $1"\t"$2"\t"$3"\t"$4"\t"$5 }'. This will change the spaces to tabs.

Michiel Buddingh'
@MB: I need to keep the space intact.
neversaint
There's undoubtedly a cleaner way to do it, but nothing prevents you from piping it through awk to change the spaces to tabs, sorting the data, and then piping it through awk again to change the tabs back into spaces.
Michiel Buddingh'
This won't work if there is a mixture of tabs and spaces that you want to preserve.
James Thompson
+2  A: 

By default, sort separates fields at the transition from a non-blank to a blank character, so tabs should work just fine.

However, the columns are indexed from 1, not from 0, so you probably want

sort -k4nr file.txt

to sort file.txt by column 4 numerically in reverse order. (Though with blanks as the delimiter, the data in the question actually has 5 fields, so the last field would be index 5.)

laalto
This will only work if the number of space characters between the tab-separated fields is the same for all lines of input.
Lars Haugseth
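To see why the field index depends on the delimiter, here is a minimal sketch that counts fields both ways. The sample line is made up to match the format in the question, and awk's default splitting is used only as a stand-in for sort's default blank-based separation (the two are close but not identical, since sort keeps leading blanks as part of a field).

```shell
# Hypothetical line in the question's format: foo<tab>1.00 1.33 2.00<tab>3
line=$(printf 'foo\t1.00 1.33 2.00\t3')

# Splitting on any run of blanks (spaces or tabs) yields 5 fields.
blank_fields=$(printf '%s\n' "$line" | awk '{ print NF }')

# Splitting on tabs only yields 3 fields.
tab_fields=$(printf '%s\n' "$line" | awk -F'\t' '{ print NF }')

echo "blank-separated: $blank_fields, tab-separated: $tab_fields"
```

So with blank-based splitting the last field is number 5, while with tab-only splitting it is number 3.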
A: 

In general keeping data like this is not a great thing to do if you can avoid it, because people are always confusing tabs and spaces.

Solving your problem is very straightforward in a scripting language like Perl, Python or Ruby. Here's some example code:

#!/usr/bin/perl -w

use strict;

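# Zero-based index of the field to sort on (2 = third field).
my $sort_field = 2;
# Split on any run of whitespace, so tabs and spaces are treated alike.
my $split_regex = qr{\s+};

my @data;
push @data, "7 8\t 9";
push @data, "4 5\t 6";
push @data, "1 2\t 3";

# Schwartzian transform: pair each line with its sort key, sort numerically
# on the key (ascending; swap $a and $b for descending), then drop the key.
my @sorted_data =
    map  { $_->[1] }
    sort { $a->[0] <=> $b->[0] }
    map  { [ ( split $split_regex, $_ )[$sort_field], $_ ] }
    @data;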

print "unsorted\n";
print join "\n", @data, "\n";
print "sorted by $sort_field, lines split by $split_regex\n";
print join "\n", @sorted_data, "\n";
James Thompson
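The same decorate-sort-undecorate idea can be sketched as a shell pipeline, assuming the sort key is always the last tab-separated field. The sample data and variable names are hypothetical. awk prepends the key, sort orders on it, and cut strips it off again, so the original mix of tabs and spaces in each line is left untouched.

```shell
# A literal tab character, produced portably with printf.
tab=$(printf '\t')

# Hypothetical data: varying numbers of spaces, last tab-separated field is the key.
data=$(printf 'a 1\t30\nb 2 3\t4\nc\t100\n')

sorted=$(printf '%s\n' "$data" |
  awk -F'\t' '{ print $NF "\t" $0 }' |   # decorate: prepend the last field as the key
  sort -t "$tab" -k1,1nr |               # sort on the key only, numeric, descending
  cut -f2-)                              # undecorate: drop the prepended key

printf '%s\n' "$sorted"
```

Because the key is taken from the end of each line, this does not depend on every line having the same number of spaces.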
+4  A: 

This will do the trick:

$ sort -t$'\t' -k3 -nr file.txt

Notice the dollar sign in front of the single-quoted string: bash expands $'\t' to a literal tab character. You can read about it in the bash man page; just search for the section starting with "Words of the form".

Lars Haugseth
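If your shell lacks $'...' quoting, one portable variant is to let printf produce the tab. Unlike the /bin/echo attempt in the question, printf always interprets \t in its format string. The sample data below is invented for illustration, and -k3,3 is used to limit the key to field 3 only.

```shell
# A literal tab character, produced portably with printf.
tab=$(printf '\t')

# Hypothetical sample: three tab-separated fields, with spaces inside field 2.
sample=$(printf 'foo\t1.00 1.33 2.00\t3\nbar\t0.50 0.75 1.00\t10\nbaz\t2.00 2.50 3.00\t7\n')

# Sort on field 3 only, numerically, in reverse (largest first).
sorted=$(printf '%s\n' "$sample" | sort -t "$tab" -k3,3nr)

printf '%s\n' "$sorted"
```

The lines come out in the order bar, baz, foo, and the spaces inside the second field are preserved.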