Dear all,

I want to shuffle the lines (rows) of a file at random and write the result to five different files.

But I keep getting exactly the same line order in file1 through file5, so the random generation is not working properly. I would be grateful for any advice.

#!/bin/bash
for i in $(seq 1 5)
do
  cat shuffling.txt | awk 'BEGIN{srand();}  {print rand()"\t"$0}' | sort -k2 -k1 -n | cut -f2-  > file$i.txt
done

Input shuffling.txt

111 1032192
111 2323476
111 1698881
111 2451712
111 2013780
111  888105
112 2331004
112 1886376
112 1189765
112 1877267
112 1772972
112  574631

Cheers,

T

A: 

If you don't provide a seed to srand, it uses the current time of day, typically at one-second resolution. That means that if your processes run fast enough to start within the same second, they'll all use the same seed and generate the same sequence.

You can get around this by using a different seed, provided by the shell.

awk -v seed=$RANDOM 'BEGIN{srand(seed);}{print rand()" "$0}' ...

The number provided by $RANDOM changes in each iteration so each run of the awk program gets a different seed.

You can see this in action in the following transcript:

pax> for i in $(seq 1 5) ; do
...> echo | awk 'BEGIN{srand();}{print rand()}'
...> done
0.0435039
0.0435039
0.0435039
0.0435039
0.0435039

pax> for i in $(seq 1 5) ; do
...> echo | awk -v seed=$RANDOM 'BEGIN{srand(seed);}{print rand()}'
...> done
0.283898
0.0895895
0.841535
0.249817
0.398753
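
Applied to the loop from the question, the same idea might look like the sketch below. It is only an illustration under a few assumptions: shuffle_lines is a hypothetical helper name, the input is assumed to contain no tab characters of its own, and the sort here keys only on the random first column, which differs from the sort options in the question.

```shell
#!/bin/bash
# Sketch only: shuffle_lines is a hypothetical helper, not part of the original post.
# Usage: shuffle_lines INPUT_FILE SEED
shuffle_lines() {
    # Prefix each line with a fixed-point random number and a tab,
    # sort numerically on that first field only, then strip it again.
    awk -v seed="$2" 'BEGIN { srand(seed) } { printf "%.10f\t%s\n", rand(), $0 }' "$1" |
        LC_ALL=C sort -t $'\t' -k1,1n |
        cut -f2-
}

input=shuffling.txt   # file name taken from the question

# Only run the loop when the input file is actually present.
if [ -r "$input" ]; then
    for i in $(seq 1 5); do
        # $RANDOM is re-evaluated on every iteration, so each awk run gets its own seed.
        shuffle_lines "$input" "$RANDOM" > "file$i.txt"
    done
fi
```

Note that $RANDOM in bash only ranges from 0 to 32767, so two iterations can occasionally receive the same seed and produce the same order. For five files that is unlikely, but it is not impossible.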
paxdiablo
It works. Many thanks for your help.
Tony
@Tony, feel free at some point to upvote the answers that helped, and accept the best one, whichever that one may be. I would choose this one but I'm hardly unbiased :-). The arrows above the numbers on the left are used for upvoting and there should be a hollow green tick mark close by that you can click on to accept an answer. Cheers.
paxdiablo
A: 
#!/bin/bash
for i in {1..5}
do
    shuf -o "file$i.txt" shuffling.txt
done
Dennis Williamson
This is another way of doing it without setting the seed. Cheers - T
Tony