ansaurus

Question

How to sort by line length, then reverse alphabetically

Answer 1

A:

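This will sort a file by line length, longest lines first. Lines of equal length come out in reverse alphabetical order, because sort -r also reverses the whole-line comparison used to break ties:

while IFS= read -r LINE; do printf '%s\t%s\n' "${#LINE}" "$LINE"; done < file.txt | sort -rn | cut -f 2-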

This will replace term with _term_ but won't turn _term_ into __term__:

sed -r 's/(^|[^_])term([^_]|$)/\1_term_\2/g'
sed -r -e 's/(^|[^_])term/\1_term_/g' -e 's/term([^_]|$)/_term_\1/g'

The first works pretty well, except that it mistakenly leaves `_term` and `term_` (an underscore on only one side) alone. Use the second if that matters. Here's my silly test case:

# echo here is _term_ and then a term you terminator haha _terminator and then _term_inator term_inator | sed -re 's/(^|[^_])term([^_]|$)/\1_term_\2/g'
here is _term_ and then a _term_ you _term_inator haha _terminator and then _term_inator term_inator
# echo here is _term_ and then a term you terminator haha _terminator and then _term_inator term_inator | sed -r -e 's/(^|[^_])term/\1_term_/g' -e 's/term([^_]|$)/_term_\1/g'
here is _term_ and then a _term_ you _term_inator haha __term_inator and then _term_inator _term__inator

John Kugelman 2009-11-03 22:02:21
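
For comparison, here is a minimal Python sketch of the first sed command, assuming Python's `re` module is acceptable for the job. Lookbehind and lookahead check for a neighbouring underscore without consuming that character, so two occurrences separated by a single space are both replaced. The names `PATTERN` and `mark_term` are made up for illustration.

```python
import re

# "term" with no underscore immediately before or after it.
# The lookarounds test the neighbours without making them part of the match.
PATTERN = re.compile(r"(?<!_)term(?!_)")


def mark_term(text):
    """Wrap every bare occurrence of "term" in underscores."""
    return PATTERN.sub("_term_", text)


sample = (
    "here is _term_ and then a term you terminator haha "
    "_terminator and then _term_inator term_inator"
)
print(mark_term(sample))
```

On the test case above this prints the same line as the first sed command; like that command, it leaves `_term` and `term_` alone.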

perfect! I'll give it a go!

Dycey 2009-11-03 22:06:07

Answer 2

+1 A:

Just pipe your stream through this kind of script:

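#!/usr/bin/env python3
import sys

# Group the input lines by length.
by_length = {}
for line in sys.stdin:
    line = line.rstrip("\n")  # drop only the newline so the length is exact
    by_length.setdefault(len(line), []).append(line)

# Longest group first; within a group, reverse alphabetical order.
for length in sorted(by_length, reverse=True):
    print("\n".join(sorted(by_length[length], reverse=True)))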

And for the bonus question: again, do it in Python (unless there really is a reason not to, but I'd be pretty curious to know it).

Gyom 2009-11-03 22:08:15

Is that the shortest, or clearest way to do that sort, in Python?

Brad Gilbert 2009-11-03 22:27:59

Maybe not; this was my first thought.

Gyom 2009-11-03 22:30:04

Personally, this is quick-and-dirty enough that I'd rather use a Perl one-liner than write an entire Python script. Though if you insist on Python, it might be cleaner (if less efficient) to just slurp the file, sort it, then spit it back out.

Chris Lutz 2009-11-03 22:44:06
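
A rough sketch of that slurp-sort-print approach, assuming the whole input fits in memory. The function name `sort_lines` and the sample list are hypothetical.

```python
def sort_lines(lines):
    """Longest lines first; equal-length lines in reverse alphabetical order."""
    # Sorting on the (length, text) pair and reversing the result
    # handles both criteria in a single pass.
    return sorted(lines, key=lambda s: (len(s), s), reverse=True)


sample = ["ab", "abba", "aa", "aba", "aaba", "bab"]
for line in sort_lines(sample):
    print(line)
```

To use it as a filter, one option is to feed it `sys.stdin.read().splitlines()` instead of the sample list.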

Answer 3

+2 A:

You could compact it all into one regexp:

$ sed -e 's/\(aaba\|aa\|abba\)/_\1_/g'
testing words aa, aaba, abba.
testing words _aa_, _aaba_, _abba_.

If I understand your question correctly, this will solve all your problems: No "double replacement" and always matching the longest word.

Johannes Hoff 2009-11-03 22:08:55

Shouldn't you still sort the items by length? Or will there be some kind of greedy match going on that will always match the longest possible string?

mobrule 2009-11-03 22:40:59

... plus, that's a hell of a long line for 600 items ;-) but maybe I can split it into more lines...

Dycey 2009-11-03 23:06:37

No need for that: sed uses POSIX regular expressions, which always pick the longest match at a given position, whatever order the alternatives are listed in.

Johannes Hoff 2009-11-03 23:08:13
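
One caveat if the pattern is ported to Python: the `re` module tries alternatives left to right and takes the first one that matches, so there the terms do need sorting longest first. A minimal sketch, with `wrap_terms` and the word list as illustrative assumptions:

```python
import re


def wrap_terms(text, terms):
    """Wrap each listed term in underscores, preferring the longest term."""
    # Longest first (ties in reverse alphabetical order), so that a short
    # term can never win over a longer term starting at the same position.
    ordered = sorted(terms, key=lambda t: (len(t), t), reverse=True)
    pattern = re.compile("|".join(re.escape(t) for t in ordered))
    return pattern.sub(lambda m: "_" + m.group(0) + "_", text)


print(wrap_terms("testing words aa, aaba, abba.", ["aa", "aaba", "abba"]))

# With the alternatives left unsorted, "aa" is tried first and wins:
print(re.sub(r"aa|aaba|abba", lambda m: "_" + m.group(0) + "_", "aaba"))
```

The first call reproduces the sed output above; the second shows the short alternative cutting a longer word in two.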

@JH Good to know. Thanks.

mobrule 2009-11-04 02:05:19

@Dycey: Yeah, that would be quite long. You could put the script in a file in that case and do `sed -f regexpfile`.

Johannes Hoff 2009-11-04 08:22:48

Answer 4

A:

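This sorts by length first, then reverse alphabetically within each length:

for mask in $(tr -c '\n' '.' < "$FILE" | sort -ur)
do
    grep "^$mask\$" "$FILE" | sort -r
done

The tr call replaces every character in $FILE except newlines with a period, which matches any single character in grep. Each distinct mask therefore selects the lines of exactly one length, and sort -ur puts the longest mask first.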

martin clayton 2009-11-03 22:11:15

Answer 5

+3 A:

You can do this in a one-line Perl script:

perl -e 'print sort { length $b<=>length $a || $b cmp $a } <>' input

mobrule 2009-11-03 22:15:29

Should probably change `$a cmp $b` to be `$b cmp $a`, since he wanted it in reverse order.

Brad Gilbert 2009-11-03 22:28:41

Thanks Brad, fixed.

mobrule 2009-11-03 22:39:06

+1 Any task you might be using lots of shell scripting for can be done easier, shorter, and potentially clearer in Perl.

Chris Lutz 2009-11-03 22:40:50

shorter doesn't mean clearer.

ghostdog74 2009-11-04 00:03:29

I find this clearer than the Python solution http://stackoverflow.com/questions/1670397/_/1670454#1670454

Brad Gilbert 2009-11-04 21:36:21

I would probably write it: `perl -E'say for sort { length $b<=>length $a || $b cmp $a } grep chomp, <>' input`

Brad Gilbert 2009-11-04 21:42:46

Answer 6

+1 A:

$ awk '{print length($0), $0}' file | sort -rn
4 abba
4 aaba
3 bab
3 aba
2 ab
2 aa

I'll leave it to you to get rid of the first column yourself.

ghostdog74 2009-11-04 00:12:38
