What's the best way to shuffle a certain percentage of the elements in a vector?

Say I want 10% or 90% of the vector shuffled. Not necessarily the first 10% but just 10% across the board.

TIA

+1  A: 

One way may be to use std::random_shuffle() and control the percentage by controlling the input range.
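A minimal sketch of that idea, assuming "controlling the input range" means shuffling only a contiguous prefix of the vector. It uses std::shuffle instead of std::random_shuffle, which was deprecated in C++14 and removed in C++17. The name shuffle_prefix is hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical helper: shuffle only the leading `fraction` of the vector.
template <class T, class Rng>
void shuffle_prefix(std::vector<T>& v, double fraction, Rng& rng) {
    // Number of leading elements to shuffle, rounded down and clamped to the size.
    const std::size_t count =
        std::min(v.size(), static_cast<std::size_t>(v.size() * fraction));
    std::shuffle(v.begin(), v.begin() + static_cast<std::ptrdiff_t>(count), rng);
}
```

Note that this only rearranges one contiguous block, which is not quite the "10% across the board" the question asks for.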

sat
+1  A: 

Why not perform N swaps of randomly selected positions, where N is determined by the percentage?

So if I have 100 elements, a 10% shuffle will perform 10 swaps. Each swap randomly picks two elements in the array and switches them.

Mikeb
This probably won't give uniform probability to all of the various kinds of shuffles that you want.
Ken Bloom
@kbloom: What do you mean by "all the various kinds of shuffles that you want"? We all know what a shuffle is. What is a 10% shuffle? A shuffle where 10% of the elements have potentially changed position? A shuffle where exactly 10% of the elements have changed position? A full shuffle over some 10% of the array, either predefined or randomly picked? There's lots of things like this: given a random chord in a circle, what's the probability it's longer than a leg of an inscribed equilateral triangle? It's one half, or one third, or one fourth.
David Thornley
+3  A: 

Modify a Fisher-Yates shuffle to do nothing on 10% of the indices in the array.

This is Java code that I'm posting (from Wikipedia) and modifying, but I think you can translate it to C++, because this is more of an algorithms problem than a language problem.

public static void shuffleNinetyPercent(int[] array) 
{
    Random rng = new Random();       // java.util.Random.
    int n = array.length;            // The number of items left to shuffle (loop invariant).
    while (n > 1) 
    {
        n--;                         // n is now the last pertinent index
        if (rng.nextDouble() < 0.1) continue; //<-- ADD THIS LINE
        int k = rng.nextInt(n + 1);  // 0 <= k <= n.
        // Simple swap of variables
        int tmp = array[k];
        array[k] = array[n];
        array[n] = tmp;
    }
}
Ken Bloom
That does not guarantee that 10% of array will be shuffled, because of randomness in `if (rng.nextDouble() < 0.1) continue`
Aleksei Potov
Right. That gets you approximately 90% of the array shuffled. Maybe then it's better to shuffle a list of all array indices, then call continue if the current index is in the first 10% of that shuffled array of indices.
Ken Bloom
I agree it would be better
fa.
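The refinement discussed in the comments above (shuffle a list of all indices, then skip the Fisher-Yates step for the indices that land in the first 10%) might look like this in C++. This is only a sketch: the name shuffle_skipping and the skip_fraction parameter are hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <random>
#include <utility>
#include <vector>

// Hypothetical helper: skip exactly floor(skip_fraction * size) Fisher-Yates steps.
template <class T, class Rng>
void shuffle_skipping(std::vector<T>& v, double skip_fraction, Rng& rng) {
    const std::size_t n = v.size();

    // Shuffle the list of all indices; the first `skipped` entries are skipped.
    std::vector<std::size_t> order(n);
    std::iota(order.begin(), order.end(), std::size_t{0});
    std::shuffle(order.begin(), order.end(), rng);
    const std::size_t skipped =
        std::min(n, static_cast<std::size_t>(n * skip_fraction));
    std::vector<bool> skip(n, false);
    for (std::size_t i = 0; i < skipped; ++i) skip[order[i]] = true;

    for (std::size_t i = n; i > 1;) {
        --i;                    // i is now the last pertinent index
        if (skip[i]) continue;  // do nothing for the chosen indices
        std::uniform_int_distribution<std::size_t> pick(0, i);
        std::swap(v[i], v[pick(rng)]);
    }
}
```

This fixes the number of skipped steps, but an element at a skipped index can still be moved when a later step picks it as the swap partner, so it does not pin exactly 10% of the elements in place.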
A: 

you can use the shuffle bag algorithm to select 10% of your array. Then use the normal shuffle on that selection.

fa.
+1  A: 

How about writing your own random iterator and using random_shuffle, something like this: (Completely untested, just to get an idea)

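#include <algorithm>  // std::random_shuffle (deprecated in C++14, removed in C++17)
#include <cstddef>    // size_t
#include <cstdlib>    // srand, rand
#include <ctime>      // time
#include <iterator>   // std::iterator
#include <vector>

template<class T>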
class myRandomIterator : public std::iterator<std::random_access_iterator_tag, T>
{
public:
    myRandomIterator(std::vector<T>& vec, size_t pos = 0): myVec(vec), myIndex(0), myPos(pos)
    {
     srand(time(NULL));
    }

    bool operator==(const myRandomIterator& rhs) const
    {
     return myPos == rhs.myPos;
    }

    bool operator!=(const myRandomIterator& rhs) const
    {
     return ! (myPos == rhs.myPos);
    }

    bool operator<(const myRandomIterator& rhs) const
    {
     return myPos < rhs.myPos;
    }

    myRandomIterator& operator++() 
    {
     ++myPos;
     return fill();
    }

    myRandomIterator& operator++(int) 
    {
     ++myPos;
     return fill();
    }

    myRandomIterator& operator--() 
    {
     --myPos;
     return fill();
    }

    myRandomIterator& operator--(int)
    {
     --myPos;
     return fill();
    }



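    // Return a new iterator n positions away instead of ignoring n and mutating *this.
    myRandomIterator operator+(size_t n) const
    {
     myRandomIterator tmp(*this);
     tmp.myPos += n;
     tmp.fill();
     return tmp;
    }

    myRandomIterator operator-(size_t n) const
    {
     myRandomIterator tmp(*this);
     tmp.myPos -= n;
     tmp.fill();
     return tmp;
    }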


    const T& operator*() const
    {
     return myVec[myIndex];
    }

    T& operator*()
    {
     return myVec[myIndex];
    }



private:
    myRandomIterator& fill()
    {
     myIndex = rand() % myVec.size();
     return *this;
    }

private:
    size_t myIndex;
    std::vector<T>& myVec;
    size_t myPos;

};

int main()
{
    std::vector<int> a;
    for(int i = 0; i < 100; ++i)
    {
     a.push_back(i);
    }

    myRandomIterator<int> begin(a);
    myRandomIterator<int> end(a, a.size() * 0.4);

    std::random_shuffle(begin, end);

    return 0;
}
Naveen
Most elegant solution I think... but I would definitely use some Boost.Iterators there to alleviate the need for all the boiler-plate code.
Matthieu M.
+2  A: 

You could try this:

Assign a random number to each element of the vector. Shuffle the elements whose random number is in the smallest 10% of the random numbers you assigned: You could even imagine replacing that 10% in the vector with placeholders, then sort your 10% according to their random number, and insert them back into the vector where your placeholders are.
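A sketch of the random-key idea, under the assumption that the caller passes the number of elements to shuffle rather than a percentage. The name shuffle_by_keys is hypothetical, and T must be default-constructible and copyable for the temporary buffer.

```cpp
#include <algorithm>
#include <cstddef>
#include <random>
#include <utility>
#include <vector>

// Hypothetical helper: shuffle the `count` elements that draw the smallest random keys.
template <class T, class Rng>
void shuffle_by_keys(std::vector<T>& v, std::size_t count, Rng& rng) {
    count = std::min(count, v.size());

    // Pair each index with a random key.
    std::uniform_real_distribution<double> key(0.0, 1.0);
    std::vector<std::pair<double, std::size_t>> keyed(v.size());
    for (std::size_t i = 0; i < v.size(); ++i) keyed[i] = {key(rng), i};

    // Move the `count` smallest keys to the front, sorted by key.
    std::partial_sort(keyed.begin(),
                      keyed.begin() + static_cast<std::ptrdiff_t>(count),
                      keyed.end());

    // The selected positions, in ascending order, act as the placeholders.
    std::vector<std::size_t> slots(count);
    for (std::size_t i = 0; i < count; ++i) slots[i] = keyed[i].second;
    std::sort(slots.begin(), slots.end());

    // Copy the selected values in key order, then write them into the slots.
    std::vector<T> values(count);
    for (std::size_t i = 0; i < count; ++i) values[i] = v[keyed[i].second];
    for (std::size_t i = 0; i < count; ++i) v[slots[i]] = values[i];
}
```

Elements outside the selected slots are never touched, so at most `count` elements change position.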

Jessica W.
A: 

If you have SGI's std::random_sample extension, you can do this. If not, it's easy to implement random_sample on top of a function which returns uniformly-distributed random integers in a specified range (Knuth, Volume 2, "Algorithm R").

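#include <algorithm>  // std::random_shuffle; std::random_sample is a nonstandard SGI extension
#include <cassert>    // assert
#include <vector>
using std::vector;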

void shuffle_fraction(vector<int> &data, double fraction) {
    assert(fraction >= 0.0 && fraction <= 1.0);

    // randomly choose the indices to be shuffled
    vector<int> bag(data.size());
    for(int i = 0; i < bag.size(); ++i) bag[i] = i;

    vector<int> selected(static_cast<int>(data.size() * fraction));
    std::random_sample(bag.begin(), bag.end(), selected.begin(), selected.end());

    // take a copy of the values being shuffled
    vector<int> old_value(selected.size());
    for (int i = 0; i < selected.size(); ++i) {
        old_value[i] = data[selected[i]];
    }

    // choose a new order for the selected indices
    vector<int> shuffled(selected);
    std::random_shuffle(shuffled.begin(), shuffled.end());

    // apply the shuffle to the data: each of the selected indices
    // is replaced by the value for the corresponding shuffled indices
    for (int i = 0; i < selected.size(); ++i) {
        data[selected[i]] = old_value[shuffled[i]];
    }
}

Not the most efficient, since it uses three "small" vectors, but avoids having to adapt the Fisher-Yates algorithm to operate on a subset of the vector. In practice you'd probably want this to be a function template operating on a pair of random-access iterators rather than a vector. I haven't done that because I think it would obfuscate the code a little, and you didn't ask for it. I'd also take a size instead of a proportion, leaving it up to the caller to decide how to round fractions.
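For what it's worth, here is a sketch of the iterator-based variant described above, taking a count instead of a proportion. It assumes C++17, where std::sample can stand in for the SGI random_sample extension and std::shuffle replaces std::random_shuffle. The name shuffle_count is hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <numeric>
#include <random>
#include <vector>

// Hypothetical helper: shuffle `count` randomly chosen elements of [first, last).
template <class RandomIt, class Rng>
void shuffle_count(RandomIt first, RandomIt last, std::size_t count, Rng& rng) {
    using Value = typename std::iterator_traits<RandomIt>::value_type;
    const std::size_t n = static_cast<std::size_t>(last - first);

    // Randomly choose the indices to be shuffled.
    std::vector<std::size_t> bag(n);
    std::iota(bag.begin(), bag.end(), std::size_t{0});
    std::vector<std::size_t> selected;
    std::sample(bag.begin(), bag.end(), std::back_inserter(selected), count, rng);

    // Take a copy of the values being shuffled.
    std::vector<Value> old_value;
    for (std::size_t i : selected) old_value.push_back(first[i]);

    // Choose a new order for the copied values.
    std::shuffle(old_value.begin(), old_value.end(), rng);

    // Write the reordered values back into the selected positions.
    for (std::size_t k = 0; k < selected.size(); ++k) {
        first[selected[k]] = old_value[k];
    }
}
```

A call such as shuffle_count(data.begin(), data.end(), data.size() / 10, rng) leaves the rounding decision with the caller.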

Steve Jessop