How can I randomize the lines in a file using standard tools on Red Hat Linux?
I don't have the "shuf" command, so I am looking for something like a perl or awk one liner that accomplishes the same task.
And a Perl one-liner you get!
perl -MList::Util -e 'print List::Util::shuffle <>'
It uses a module, but the module is part of the core Perl distribution. If that's not good enough, you may consider rolling your own.
I tried using this with the -i flag ("edit-in-place") to have it edit the file. The documentation suggests it should work, but it doesn't. It still displays the shuffled file to stdout, but this time it deletes the original. I suggest you don't use it.
Consider a shell script:
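#!/bin/bash
# [[ ... ]] is a bash feature, so this needs bash rather than plain sh.
if [[ $# -eq 0 ]]
then
    echo "Usage: $0 [file ...]" >&2
    exit 1
fi
for i in "$@"
do
    # Shuffle into a temporary file next to the original.
    perl -MList::Util -e 'print List::Util::shuffle <>' "$i" > "$i.new"
    # Replace the original only if the byte counts match.
    # Reading from stdin makes wc print the count without the file name.
    if [[ $(wc -c < "$i") -eq $(wc -c < "$i.new") ]]
    then
        mv "$i.new" "$i"
    else
        echo "Error for file $i!" >&2
    fi
done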
Untested, but hopefully works.
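If List::Util is not an option, rolling your own could look something like the sketch below. It is a Fisher-Yates shuffle written in awk and wrapped in a shell function. The name shuffle_lines is made up for illustration, and the whole input is held in memory. Because srand() seeds from the clock, two runs within the same second may produce the same order in many awk implementations.

```shell
# shuffle_lines is a hypothetical helper name, not a standard command.
shuffle_lines() {
    awk '
        BEGIN { srand() }             # seed from the current time
        { line[NR] = $0 }             # hold every input line in memory
        END {
            # Fisher-Yates: swap position i with a random position in 1..i
            for (i = NR; i > 1; i--) {
                j = int(rand() * i) + 1
                tmp = line[i]; line[i] = line[j]; line[j] = tmp
            }
            for (i = 1; i <= NR; i++) print line[i]
        }
    ' "$@"
}
```

Usage would be along the lines of shuffle_lines yourfile.txt > shuffled.txt, or as the last stage of a pipeline, since awk reads stdin when no file is given.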
cat yourfile.txt | while IFS= read -r f ; do printf '%s\t%s\n' "$RANDOM" "$f"; done | sort -n | cut -f2-
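Read the file, prepend a random number to every line, sort on those random prefixes, then cut the prefixes off. A one-liner that should work in any shell that provides $RANDOM (bash, ksh, zsh). Note that $RANDOM only ranges from 0 to 32767, so in large files many lines share a prefix and those ties are not ordered randomly.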
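The same prefix-sort-cut idea can be sketched with awk generating the random keys, which avoids the per-line shell loop and gives a much wider key range than $RANDOM. The function name shuffle_by_key is an assumption for illustration. LC_ALL=C is set so that sort -n parses the decimal point the same way in every locale.

```shell
# shuffle_by_key is a hypothetical name; it reads the named files, or stdin if none.
shuffle_by_key() {
    # 1. prefix each line with a random fraction and a tab
    # 2. sort numerically on that prefix
    # 3. drop the prefix, keeping everything after the first tab
    awk 'BEGIN { srand() } { printf "%.10f\t%s\n", rand(), $0 }' "$@" |
        LC_ALL=C sort -n |
        cut -f2-
}
```

Lines that already contain tabs or leading spaces pass through unchanged, because cut only removes the first field.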
Related to Jim's answer:
My ~/.bashrc contains the following:
unsort ()
{
LC_ALL=C sort -R "$@"
}
With GNU coreutils' sort, -R (--random-sort) generates a random hash of each line and sorts by it. In some older (buggy) versions the randomized hash wasn't actually used in some locales, so the output came back in normal sorted order, which is why I set LC_ALL=C.
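One consequence of sorting by a hash of each line is that identical lines get identical hashes and end up next to each other, so the result is not a true random permutation when the file has duplicates. The sketch below assumes GNU sort (it needs -R). The helper name unsort_keep_dupes is made up; it numbers the lines first so that every line is unique while it is being hashed.

```shell
# Identical lines hash identically, so they come out adjacent:
# uniq -c reports exactly two groups here, in a random order.
printf '%s\n' a b a b a | LC_ALL=C sort -R | uniq -c

# unsort_keep_dupes is a hypothetical helper: the line number makes each line
# unique before hashing, and cut strips the number again afterwards.
unsort_keep_dupes() {
    awk '{ print NR "\t" $0 }' "$@" | LC_ALL=C sort -R | cut -f2-
}
```

For files without repeated lines the plain unsort function behaves the same, so the extra numbering step only matters when duplicates are expected.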
Related to Chris's answer:
perl -MList::Util=shuffle -e'print shuffle<>'
is a slightly shorter one-liner. (-Mmodule=a,b,c is shorthand for -e 'use module qw(a b c);'.)
The reason a simple -i doesn't work for shuffling in-place is that Perl expects the print to happen in the same loop in which the file is being read, and print shuffle <> doesn't output anything until all input files have been read and closed.
As a shorter workaround,
perl -MList::Util=shuffle -i -ne'BEGIN{undef$/}print shuffle split/^/m'
will shuffle files in-place. (-n means "wrap the code in a while (<>) {...} loop"; BEGIN{undef$/} makes Perl operate on files-at-a-time instead of lines-at-a-time; and split/^/m is needed because $_=<> has been implicitly done with an entire file instead of a single line.)