views:

117

answers:

2

Hi all,

As noticed in this question: http://stackoverflow.com/questions/273313/randomize-a-listt-in-c you can implement a shuffle method on a list; like one of the answers mentions:

using System.Security.Cryptography;
...
public static void Shuffle<T>(this IList<T> list)
{
    RNGCryptoServiceProvider provider = new RNGCryptoServiceProvider();
    int n = list.Count;
    while (n > 1)
    {
        byte[] box = new byte[1];
        do provider.GetBytes(box);
        while (!(box[0] < n * (Byte.MaxValue / n)));
        int k = (box[0] % n);
        n--;
        T value = list[k];
        list[k] = list[n];
        list[n] = value;
    }
}

Does anyone know if it's possible to "Parallel-ize" this using some of the new features in C# 4?

Just a curiosity.

Thanks all,

-R.

+1  A: 

Well, you could fairly easily parallelize the code which is generating the random numbers - which could be the bottleneck when using a cryptographically secure random number generator. You could then use that sequence of random numbers (which would need to have its order preserved) within a single thread performing the swaps.

One problem which has just occurred to me though, is that RNGCryptoServiceProvider isn't thread-safe (and neither is System.Random). You'd need as many random number generators as threads to make this work. Basically it becomes a bit ugly :(

Jon Skeet
Thanks for the warning. The question is indeed in regards to taking one list and splitting the work of shuffling it over many processors. My question is just more or less of a curiosity; since I'm trying to get some background on what C# 4's new parallel stuff provides; and seeing what could be done with it.
Randster
A: 

Your question is ambiguous, but you can use the Parallel Framework to help doing some operations in parallel, but it depends on whether you want to get the bytes, then shuffle them, so the getting of the bytes is in parallel, or, if you want to shuffle multiple lists at one time.

If it is the former, I would suggest that you first break your code into smaller functions, so you can do some analysis to see where the bottlenecks are, as, if the getting of the bytes is the bottleneck, then doing it in parallel may make a difference. By having tests in place you can test new functions and decide if it is worth the added complexity.

static RNGCryptoServiceProvider provider = new RNGCryptoServiceProvider();

private static byte[] GetBytes() {
        byte[] box = new byte[1];
        do provider.GetBytes(box);
        while (!(box[0] < n * (Byte.MaxValue / n)));
        return box;
}
private static IList<T> InnerLoop(int n, IList<T> list) {
        var box = GetBytes(n);
        int k = (box[0] % n);
        T value = list[k];
        list[k] = list[n];
        list[n] = value;
        return list;
}

public static void Shuffle<T>(this IList<T> list)
{
    int n = list.Count;
    while (n > 1)
    {
        list = InnerLoop(n, list);
        n--;
    }
}

This is a rough idea as to how to split your function, so you can replace the GetBytes function, but you may need to make some other changes to test it.

Getting some numbers is crucial to make certain that you are getting enough of a benefit to warrant adding complexity though.

You may want to move the lines in InnerLoop that deals with list into a separate function, so you can see if that is slow and perhaps swap out some algorithms to improve it, but, you need to have an idea how fast you need the entire shuffle operation to go.

But if you want to just do multiple lists then it will be easy, you may want to perhaps look at PLINQ for that.

UPDATE

The code above is meant to just show an example of how it can be broken into smaller functions, not to be a working solution. If it is necessary to move the Provider class that I put into a static variable into a function and then pass it as a parameter then that may need to be done. I didn't test the code, but my suggestion is based on getting profiling then look at how to improve it's performance, especially since I am not certain which way it was meant to be done in parallel. It may be necessary to just build up an array in order, in parallel, then shuffle them, but first see what the time needed for each operation is, then see if doing it in parallel will be warranted.

There may be a need to use concurrent data structures also, if using multiple threads, in order to not pay a penalty by having to synchronize yourself, but, again, that may not be needed, depending on where the bottleneck is.

UPDATE:

Based on the answer to my comment, you may want to look at the various functions in the parallel library, but this page may help. http://msdn.microsoft.com/en-us/library/dd537609.aspx.

You can create a Func version of your function, and pass that in as a parameter. There are multiple ways to work use this library to make this function in parallel, as you already don't have any global variables, just lose the static operator.

You will want to get numbers as you add more threads though, to see where you start to see a decrease in performance, as you won't see a 1:1 improvement, so, if you add 2 threads it won't go twice as fast, so just test and see where it becomes a problem having more threads. Since your function is CPU bound you may want to have only one thread per core, as a rough starting point.

James Black
The static provider won't work, and with that your GetBytes can't be threaded. The Innerloop accesses the range 0..n-1 so that too can't be run in parallel.
Henk Holterman
@Henk Holterman - OK, I updated my answer based on your comment. There will need to be modifications, but I am not trying to give a final solution, as there are too many unknowns, but to help provide some direction on how to find a better answer. Before doing things in parallel, make certain you know that you need that complexity, and in this case, getting timing information is useful, to see where the bottlenecks are, as well as to then know how you can improve on them, and see if the new code is real improvement.
James Black
@@Henk Holterman - I missed a reference to n, so I fixed it. Thank you.
James Black
@James I understand you're trying to help by splitting it up but I (still) think you're not addressing the real problems relating to threading.
Henk Holterman