Hi All,

I have a text file containing ~300k rows. Each row has a varying number of comma-delimited fields, the last of which is guaranteed to be numeric. I want to sort the file by this last numeric field. I can't do:

sort -t, -n -k 2 file.in > file.out

as the number of fields in each row is not constant. I think sed or awk may be the answer, but I'm not sure how. E.g.:

awk -F, '{print $NF}' file.in

gives me the last column value, but how can I use this to sort the file?

A: 

Maybe reverse the fields of each line in the file before sorting? Something like

perl -ne 'chomp; print(join(",",reverse(split(/,/,$_,-1))),"\n")' file.in |
  sort -t, -n -k1,1 |
  perl -ne 'chomp; print(join(",",reverse(split(/,/,$_,-1))),"\n")' > file.out

should do it, as long as commas are never quoted in any way. If this is a full-fledged CSV file (in which commas can appear inside quoted fields or be escaped with a backslash), then you need a real CSV parser.

Zack
+1  A: 
vim file.in -c '%sort n /.*,\zs/' -c 'saveas file.out' -c 'q'
Benoit
Why not use `ex` if you're going to go that route? Vim gets that particular functionality from `ex` anyway.
JUST MY correct OPINION
`ex` is just `vim` with the `-e` option. It doesn't really matter in this case.
Benoit
`ex` predates `vim` (and `vi`) by quite a long time. `vim` may have an `ex` emulation mode, but this does not make it `ex`.
JUST MY correct OPINION
Yes. But if you have vim on your system, `ex` will usually be a symlink to `vim`. Vim detects at startup which name it was invoked under.
Benoit
+4  A: 

Use awk to put the numeric key up front. $NF is the last field of the current record. Sort. Use sed to remove the duplicate key.

awk -F, '{ print $NF, $0 }' < yourfile | sort -n -k1 | sed 's/^[^ ]* //'
larsmans
No need for the redirection: `awk -F, '{ print $NF, $0 }' yourfile`
ghostdog74
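For reference, the decorate-sort-undecorate idea from the awk answer above could be wrapped up roughly as follows. This is only a sketch: the function name `sort_by_last_field` is made up for illustration, and it swaps the space separator and `sed` step for a tab and `cut`, so that negative or decimal keys are stripped as well. It assumes commas are never quoted or escaped.

```shell
#!/bin/sh
# Hypothetical helper (name not from the thread): sort comma-delimited
# lines numerically by their last field.
# Steps: prepend the last field plus a tab to each line (decorate),
# sort numerically on that first tab-delimited field only,
# then cut the key and tab off again (undecorate).
# Assumption: commas are plain delimiters, never quoted or escaped.
sort_by_last_field() {
    awk -F, '{ printf "%s\t%s\n", $NF, $0 }' "$@" |
        sort -t "$(printf '\t')" -k1,1n |
        cut -f2-
}
```

Usage would be something like `sort_by_last_field file.in > file.out`; with no argument it reads standard input.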
A: 

Perl one-liner:

perl -e '@lines=<STDIN>;foreach(sort{($a=~/.*,(\d+)/)[0]<=>($b=~/.*,(\d+)/)[0]}@lines){print;}' < file.in > file.out
Benoit