What is the simplest way to sort a list of lines, sorting on the last field of each line? Each line may have a variable number of fields.

Something like

sort -k -1

is what I want, but sort(1) does not take negative numbers to select fields from the end instead of the start.

I'd also like to be able to choose the field delimiter.

Edit: To add some specificity to the question: The list I want to sort is a list of pathnames. The pathnames may be of arbitrary depth, hence the variable number of fields. I want to sort on the filename component.

This additional information may change how one manipulates the line to extract the last field (basename(1) may be used), but does not change sorting requirements.

e.g.

/a/b/c/10-foo
/a/b/c/20-bar
/a/b/c/50-baz
/a/d/30-bob
/a/e/f/g/h/01-do-this-first
/a/e/f/g/h/99-local

I want this list sorted on the filenames, which all start with numbers indicating the order the files should be read.

I've added my answer below which is how I am currently doing it. I had hoped there was a simpler way - maybe a different sort utility - perhaps without needing to manipulate the data.

A: 

sort lets you specify the delimiter with the -t option, if I remember correctly. To compute the index of the last field, you can count the number of delimiters in a line and add one. For instance, something like this (assuming ":" as the delimiter):

d=`head -1 FILE | tr -cd :  | wc -c`
d=`expr $d + 1`

($d now contains the last field index).
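
As a rough sketch of how that index might then be handed to sort: the helper name `sort_on_last_uniform` is made up for illustration, and it assumes every line has the same number of ":"-separated fields as the first line, which is what this approach needs in order to work at all.

```shell
# Hypothetical helper: sort FILE on its last ":"-separated field.
# Assumption: every line has as many fields as the first line.
sort_on_last_uniform() {
    file=$1
    # Count the delimiters on the first line; the last field index is one more.
    d=$(head -n 1 "$file" | tr -cd ':' | wc -c)
    d=$((d + 1))
    # LC_ALL=C selects plain byte-order comparison, independent of the locale.
    LC_ALL=C sort -t ':' -k "$d,$d" "$file"
}
```

It would be called as `sort_on_last_uniform FILE`. With a variable number of fields per line the computed index is only right for lines shaped like the first one.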

Diego Sevilla
But the original question specified a potentially-variable number of fields on each line.
sarnold
Ahh, I see. I would then suggest making the input file more uniform, so that each line has the same number of fields :) (I can't think of an example of such a file that could not be generated so that every line has the same number of fields... Things get much easier from then on... :) )
Diego Sevilla
+1  A: 

I think the only solution would be to use awk:

  1. Put the last field to the front using awk.
  2. Sort lines.
  3. Put the first field to the end again.
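
A minimal sketch of those three steps, with one variation: instead of moving the last field, it prepends a copy of it and strips the copy after sorting, so the original line is never rearranged. The function name `sort_on_last_field` is hypothetical, and the sketch assumes the last field never contains a tab, since a tab separates the copied key from the line.

```shell
# Hypothetical helper: read lines on stdin, sort them on their last field.
# $1 is the field delimiter (a single character such as "/" or ":").
# Assumption: the last field contains no tab character.
sort_on_last_field() {
    delim=$1
    tab=$(printf '\t')
    # 1. Prefix each line with a copy of its last field, followed by a tab.
    awk -F "$delim" '{ print $NF "\t" $0 }' |
        # 2. Sort on that prefix only (byte order, independent of the locale).
        LC_ALL=C sort -t "$tab" -k 1,1 |
        # 3. Drop the prefix again, leaving the original line.
        cut -f 2-
}
```

For the pathnames in the question this would be used as `sort_on_last_field / < list`.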
Thevs
A: 

A one-liner in perl for reversing the order of the fields in a line:

perl -lne 'print join " ", reverse split / /'

You could use it once, pipe the output to sort, then pipe it back and you'd achieve what you want. You can change / / to / +/ so it squeezes spaces. And you're of course free to use whatever regular expression you want to split the lines.

integer
Why invoke Perl twice? It can sort, so why not use Perl for the sorting, too?
Gabe
Because sorting in Perl would mean pulling the entire stream into memory, something which sort does not require. It won't crash in a fire if you try sorting gigabytes of data -- it'll use buckets and temp files that it merges to provide the final output stream. Fully duplicating sort's functionality in Perl would take a lot of code for little or no gain.
integer
Hmm, I tested it on a large data set now. I tried sorting a 250MB file of strings (about 13M lines) using `sort` and `perl -e 'print sort <>'` -- in the former case it took a while, but sort never used more than 120MB resident memory, in the second case Perl greedily shot up to 1.4+GB resident (I have no idea why it would be so much though? Is there that much redundancy in storing lists of strings?), CPU usage went to nil since all it did was swap, and my computer became unusable, had to break it...
integer
+1  A: 

Something like this:

awk '{print $NF"|"$0}' file | sort -t"|" -k1,1 | cut -d"|" -f2-
ghostdog74
A: 

#!/usr/bin/ruby

f = ARGF.read
lines = f.lines

broken = lines.map {|l| l.split(/:/) }

sorted = broken.sort {|a, b|
    a[-1] <=> b[-1]
}

fixed = sorted.map {|s| s.join(":") }

puts fixed

If all the answers involve perl or awk, might as well solve the whole thing in the scripting language. (Incidentally, I tried in Perl first and quickly remembered that I dislike Perl's lists-of-lists. I'd love to see a Perl guru's version.)

sarnold
Noting that the `/:/` is because I did my testing on /etc/passwd .. feel free to change the delimiter, or better yet, parameterize it.
sarnold
I'm not a Perl guru, but see my solution: http://stackoverflow.com/questions/3222810/sorting-on-the-last-field-of-a-line/3225496#3225496
Gabe
A: 

Replace the last delimiter on the line with another delimiter that does not otherwise appear in the list, sort on the second field using that other delimiter as the sort(1) delimiter, and then revert the delimiter change.

delim=/
new_delim=" "
cat "$list" \
| sed "s|\(.*\)$delim|\1$new_delim|" \
| sort -t"$new_delim" -k 2,2 \
| sed "s|$new_delim|$delim|"

The problem is knowing what delimiter to use that does not appear in the list. You can make multiple passes over the list and then grep for a succession of potential delimiters, but it's all rather nasty - particularly when the concept of "sort on the last field of a line" is so simply expressed, yet the solution is not.

Edit: One safe delimiter to use for $new_delim is NUL, since that cannot appear in filenames, but I don't know how to put a NUL character into a Bourne/POSIX shell script (not bash) or whether sort and sed will handle it properly.
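
A shell variable cannot hold a NUL byte, so one possible workaround is a different control character. The sketch below uses SOH (0x01), produced with printf. That choice is only an assumption: unlike NUL, SOH is legal in a filename, so this is safe only when the list is known not to contain it. The function name `sort_by_basename` is made up for illustration.

```shell
# Hypothetical helper: read pathnames on stdin, sort them on the filename part.
# Assumption: the byte 0x01 (SOH) does not occur anywhere in the input.
sort_by_basename() {
    sep=$(printf '\001')
    # Replace the last "/" on each line with SOH (the .* is greedy).
    sed "s|\(.*\)/|\1$sep|" |
        # Sort on the part after SOH, in byte order.
        LC_ALL=C sort -t "$sep" -k 2,2 |
        # Turn the SOH back into "/".
        sed "s|$sep|/|"
}
```

Usage would be `sort_by_basename < list`. Lines without any "/" pass through unchanged and sort first, because their second field is empty.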

camh
+2  A: 

Here's a Perl command line:

perl -e "print sort {(split '/', $a)[-1] <=> (split '/', $b)[-1]} <>"

Just pipe the list into it or, if the list is in a file, put the filename at the end of the command line.

Note that this script does not actually change the data, so you don't have to be careful about what delimiter you use.

Here's sample output:

>perl -e "print sort {(split '/', $a)[-1] <=> (split '/', $b)[-1]} <>" files.txt
/a/e/f/g/h/01-do-this-first
/a/b/c/10-foo
/a/b/c/20-bar
/a/d/30-bob
/a/b/c/50-baz
/a/e/f/g/h/99-local
Gabe
Not really a fan of perl, but +1 for an answer that does not munge the data. I'd give another +1 for the functional style too if I could.
camh
I take that back - it doesn't work for me (perl v5.10.1). I cut'n'pasted your line and my data in the question, so no typos.
camh
I'm using Perl v5.8.8, but I doubt it makes much difference. My program works on your sample, so maybe you should post your actual input and I can see what's wrong.
Gabe
My input is what is in the question as the sample input. I wonder why my perl behaves differently?
camh
Doh. On the command line, the $a and $b need the $ to be escaped. With that done, it sorts correctly - thanks.
camh
Oh, of course! I forgot that quoting and escaping depends on your shell!
Gabe
Wonderful! Thanks Gabe.
sarnold
I see that with the updated question it's probably not necessary to think about memory, though in the general case I'd strongly advise against pulling a whole stream onto the heap when it's strictly not necessary. (Ref. comments on my own answer attempt.)
integer