Hi!

I have the following problem:

I would like to sort a file on multiple fields. A sample tab-separated file is:

a   1   1.0
b   2   0.1
c   3   0.3
a   4   0.001
c   5   0.5
a   6   0.01
b   7   0.01
a   8   0.35
b   9   2.3
c   10  0.1
c   11  1.0
b   12  3.1
a   13  2.1

I would like to have it sorted alphabetically by field 1 (with -d) and, when field 1 is the same, by field 3 (with the -g option).

I didn't succeed in doing this. My attempts were (with a literal TAB character in place of <TAB>):

cat tst | sort -t"<TAB>" -k1 -k3n
cat tst | sort -t"<TAB>" -k1d -k3n
cat tst | sort -t"<TAB>" -k3n -k1d

None of these work. I'm not sure whether sort is even able to do this. I'll write a script as a workaround, so I'm just curious whether there is a solution using only sort.

Thanks!

Attila

+4  A: 

The manual shows some examples.

In accordance with zseder's comment, this works:

sort -t"<TAB>" -k1,1d -k3,3g
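As a quick check, here is a minimal sketch that runs this command on the sample rows from the question. It assumes a sort that supports -g (a GNU extension) and builds the tab with printf so it works in any POSIX shell; the variable names are only illustrative.

```shell
# A literal tab, built portably (no bash-only $'\t' needed).
TAB=$(printf '\t')

# The sample rows from the question, tab-separated (printf reuses the
# format for every group of three arguments).
sample=$(printf '%s\t%s\t%s\n' \
  a 1 1.0   b 2 0.1   c 3 0.3   a 4 0.001  c 5 0.5 \
  a 6 0.01  b 7 0.01  a 8 0.35  b 9 2.3    c 10 0.1 \
  c 11 1.0  b 12 3.1  a 13 2.1)

# Key 1 is field 1 only (dictionary order); ties are broken by field 3
# compared as a general number. LC_ALL=C keeps collation and the decimal
# point predictable.
sorted=$(printf '%s\n' "$sample" | LC_ALL=C sort -t"$TAB" -k1,1d -k3,3g)
printf '%s\n' "$sorted"
```

The `a` rows should come out ordered by their third field (0.001, 0.01, 0.35, 1.0, 2.1), followed by the `b` rows and then the `c` rows, each ordered the same way.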

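Note that sort -t"\t" usually does not work: the shell passes the two characters \ and t, and GNU sort rejects a multi-character separator. In bash, sort -t$'\t' passes a real tab.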

If none of the above work to delimit by tab, this is an ugly workaround:

TAB=`echo -e "\t"`
sort -t"$TAB"
inflagranti
This is working! Thank you. (With cmd: `sort -t"<TAB>" -k1,1d -k3,3g`) However, I didn't find what the comma means for sort in the linked manual or the normal manual page. I'll have to google more on this. And yes, I can sort with tabs: I can sed them to any other separator, and I can type a TAB in my terminal with "ctrl+v; TAB", so that's not a problem. I just wanted to make clear that that's not what I'm doing wrong. Anyway, thank you!
zseder
You could also create a tab without using `echo` or ctrl+v: `TAB=$'\t'`. @zseder: The comma is a range operator in this context. The argument `-k1,1d` means "create a key starting at column one and ending at column one in dictionary order".
Dennis Williamson
And how come -k1d is not the same as -k1,1d? My intuition is that these should be the same...
zseder
I understand now. With -k1d I only tell sort to use a key that starts at field 1 (and runs to the end of the line).
zseder
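The difference between -k1d and -k1,1d discussed in these comments can be seen on just two rows. This is a minimal sketch, again assuming a sort with -g support; `open_key` and `closed_key` are illustrative names.

```shell
TAB=$(printf '\t')

# Two rows that share field 1 but differ in fields 2 and 3.
rows=$(printf '%s\t%s\t%s\n' a 4 0.001  a 1 1.0)

# -k1d: the first key runs from field 1 to the end of the line, so the
# rows already differ on it (at field 2) and -k3g is never consulted.
open_key=$(printf '%s\n' "$rows" | LC_ALL=C sort -t"$TAB" -k1d -k3g)

# -k1,1d: the first key is field 1 only; both rows tie on "a", so the
# second key (field 3, general numeric) decides the order.
closed_key=$(printf '%s\n' "$rows" | LC_ALL=C sort -t"$TAB" -k1,1d -k3,3g)

printf 'with -k1d:\n%s\n' "$open_key"
printf 'with -k1,1d:\n%s\n' "$closed_key"
```

With the open-ended key the row `a 1 1.0` sorts first, because the comparison is decided by field 2. With the closed key the row `a 4 0.001` sorts first, because field 3 breaks the tie.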
+1  A: 

Here is a Python script that you might use as a starting point:

#!/usr/bin/env python2.6

import sys
import string

def main():
    fname = sys.argv[1]
    data = []
    with open(fname, "rt") as stream:
        for line in stream:
            line = line.strip()
            a, b, c = line.split()
            data.append((a, int(b), float(c)))
    data.sort(key=my_key)
    print data


def my_key(item):
    # Sort by field 1 first, then by the numeric field 3, as the question asks.
    a, b, c = item
    return lexicographical_key(a), c


def lexicographical_key(a):
    # poor man's attempt, should use Unicode classification etc.
    return a.translate(None, string.punctuation)


if __name__ == "__main__":
    main()
Philipp