ansaurus

Question

How can I randomize the lines in a file using standard tools on Red Hat Linux?

Answer 1

+2 A:

And a Perl one-liner you get!

perl -MList::Util -e 'print List::Util::shuffle <>'

It uses a module, but the module is part of the core Perl distribution. If that's not good enough, you may consider rolling your own.

I tried using this with the -i flag ("edit-in-place") to have it edit the file. The documentation suggests it should work, but it doesn't. It still displays the shuffled file to stdout, but this time it deletes the original. I suggest you don't use it.

Consider a shell script:

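#!/bin/bash
# Shuffle each file named on the command line, in place.
# bash is required because the script uses [[ ... ]].

if [[ $# -eq 0 ]]
then
  echo "Usage: $0 [file ...]"
  exit 1
fi

for i in "$@"
do
  perl -MList::Util -e 'print List::Util::shuffle <>' "$i" > "$i.new"
  # Feed wc from stdin so it prints only the byte count, not the file name.
  if [[ $(wc -c < "$i") -eq $(wc -c < "$i.new") ]]
  then
    mv "$i.new" "$i"
  else
    echo "Error for file $i!"
  fi
done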

Untested, but hopefully works.

Chris Lutz 2009-05-20 05:17:02

The perl one-liner worked great! Thanks!

Stuart Woodward 2009-05-20 07:53:50

To back up the original file, you can append an extension to the -i flag (for example, -i.bak) [http://perldoc.perl.org/perlrun.html]

Steve Schnepp 2009-05-25 08:11:48

Answer 2

+2 A:

cat yourfile.txt | while read f ; do printf "$RANDOM\t%s\n" "$f"; done | sort -n | cut -f2-

Read the file, prefix every line with a random number, sort on those random prefixes, then cut the prefixes off. It is a one-liner that should work in any shell that provides $RANDOM (bash, ksh, zsh).

ChristopheD 2009-05-20 05:36:18

This works, and is a creative solution, but will delete leading whitespace on lines.

Chris Lutz 2009-05-20 05:39:55

@Chris changing the last cut to |sed 's/^[^\t]*\t//' should fix that

bdonlan 2009-05-20 05:43:17

Kudos to the simplicity of the approach!

Shashikant Kore 2009-05-20 06:22:11

Nice try, but even changing to the sed command deletes whitespace.

Stuart Woodward 2009-05-20 08:00:58

Try this: `cat yourfile.txt | while read f ; do printf "%05d %s\n" "$(( $RANDOM % 100000 ))" "$f"; done | sort -n | cut -c7-`

jwhitlock 2010-10-06 15:19:44
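The leading whitespace is lost in the read f step, before sed or cut ever sees the line: with the default IFS, read trims leading and trailing blanks, and without -r it also consumes backslashes. Below is a sketch of the same decorate, sort, undecorate idea with that fixed. The name decorate_shuffle is hypothetical, and $RANDOM assumes bash, ksh or zsh.

```shell
#!/bin/bash
# decorate_shuffle is a hypothetical name used only for this example.
# IFS= keeps leading and trailing whitespace; -r keeps backslashes literal.
# Each line gets a random key and a tab in front, the lines are sorted
# numerically on that key, and cut then drops the key and the first tab.
decorate_shuffle() {
  while IFS= read -r line
  do
    printf '%d\t%s\n' "$RANDOM" "$line"
  done < "$1" | sort -n | cut -f2-
}
```

It would be called as decorate_shuffle yourfile.txt. Since $RANDOM only ranges from 0 to 32767, large files will have many tied keys, and sort breaks ties by comparing the rest of the line, so the result is less random for big inputs.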

Answer 3

+4 A:

Um, let's not forget

sort --random-sort

Jim T 2009-05-20 11:42:22

All these cool features that I don't have on OS X! Dammit!

Chris Lutz 2009-05-20 16:37:45

Could you tell me which version of sort has this option?

Stuart Woodward 2009-05-21 08:50:55

Well, I'm using GNU coreutils 7.1 (standard Gentoo install), whose sort has this option. I'm not sure when it appeared, or whether it's in other implementations.

Jim T 2009-05-21 11:46:48

The feature was committed on 10th December 2005, the release following that was 5.94, so I'm guessing it's been available since that version.

Jim T 2009-05-21 11:58:55
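Because --random-sort is a GNU extension that not every sort has, a script can probe for it before relying on it. This is a minimal sketch, assuming that falling back to the Perl one-liner from Answer 1 is acceptable; shuffle_file is a hypothetical name.

```shell
#!/bin/sh
# shuffle_file is a hypothetical name used only for this example.
shuffle_file() {
  # Probe: run sort -R on empty input. A sort without -R rejects the
  # option and exits non-zero, so we fall through to Perl.
  if sort -R < /dev/null > /dev/null 2>&1
  then
    sort -R "$@"
  else
    perl -MList::Util=shuffle -e 'print shuffle <>' "$@"
  fi
}
```

It would be called as shuffle_file yourfile.txt. The two branches are not quite equivalent: sort -R keeps identical lines next to each other, while the Perl shuffle does not.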

Answer 4

+1 A:

Related to Jim's answer:

My ~/.bashrc contains the following:

unsort ()
{
    LC_ALL=C sort -R "$@"
}

With GNU coreutils's sort, -R = --random-sort, which generates a random hash of each line and sorts by it. The randomized hash wouldn't actually be used in some locales in some older (buggy) versions, causing it to return normal sorted output, which is why I set LC_ALL=C.
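One consequence of sorting by a hash of each line is that identical lines get the same hash and therefore stay next to each other, so the result is not a uniform random permutation when the input has duplicates. Here is a small sketch that makes this visible, reusing the unsort function from above. The sample input and the name count_runs are invented for illustration.

```shell
#!/bin/sh
unsort() {
  LC_ALL=C sort -R "$@"
}

# count_runs is a hypothetical helper for this demonstration.
# Five input lines, three distinct values. Identical lines hash alike,
# so they come out adjacent, and uniq (which only merges adjacent
# duplicates) collapses them to one line per distinct value.
count_runs() {
  printf 'a\nb\na\nb\nc\n' | unsort | uniq | wc -l
}
```

With GNU sort, calling count_runs should report 3 every time, whereas a true shuffle would often leave the duplicates apart and report more.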

Related to Chris's answer:

perl -MList::Util=shuffle -e'print shuffle<>'

is a slightly shorter one-liner. (-Mmodule=a,b,c is shorthand for -e 'use module qw(a b c);'.)

A plain -i doesn't work for shuffling in place because Perl expects the print to happen inside the same loop that reads the file, whereas print shuffle <> produces no output until all input files have been read and closed.

As a shorter workaround,

perl -MList::Util=shuffle -i -ne'BEGIN{undef$/}print shuffle split/^/m'

will shuffle files in place. (-n means "wrap the code in a while (<>) {...} loop"; BEGIN{undef$/} makes Perl read a whole file at a time instead of a line at a time; and split/^/m is needed because the implicit $_=<> now holds an entire file rather than a single line.)

ephemient 2009-05-20 16:20:04

Reiterating that sort -R doesn't exist on OS X, but +1 for some great Perl answers, and a great answer in general.

Chris Lutz 2009-05-20 16:40:40

You could install GNU coreutils on OS X, but (as I've done in the past) you have to be careful not to break the built-in tools... That being said, OP is on Redhat Linux, which definitely has GNU coreutils standard.

ephemient 2009-05-20 16:49:56
