views:

3396

answers:

4

Hey people

I've struggled with this all day: I am trying to get a random number generator for threads in my CUDA code. I have looked through all the forums and yes, this topic comes up a fair bit, but I've spent hours trying to unravel all sorts of code to no avail. If anyone knows of a simple method, probably a device function that can be called to return a random float between 0 and 1, or an integer that I can transform, I would be most grateful.

Again, I hope to use the random number in the kernel, just like rand() for instance.

Thanks in advance

+4  A: 

I'm not sure I understand why you need anything special. Any traditional PRNG should port more or less directly. A linear congruential generator (LCG) should work fine. Do you have some special properties you're trying to establish?

Charlie Martin
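
A minimal sketch of what porting an LCG could look like, assuming the Numerical Recipes constants (modulus 2^32) and one state per thread. The names LcgState, lcg_next, lcg_uniform and the RNG_HD macro are illustrative, not from any library, and the sketch says nothing about how to choose per-thread seeds safely, which is the issue raised in the comments below.

```cpp
#include <cstdint>

// Build with nvcc (host + device) or with a plain host C++ compiler.
#ifdef __CUDACC__
#define RNG_HD __host__ __device__
#else
#define RNG_HD
#endif

// Hypothetical per-thread generator state; each thread keeps its own copy.
struct LcgState {
    uint32_t x;
};

// One LCG step: x <- a*x + c (mod 2^32 via unsigned wraparound).
RNG_HD inline uint32_t lcg_next(LcgState &s)
{
    s.x = 1664525u * s.x + 1013904223u;
    return s.x;
}

// Uniform float in [0, 1): keep the top 24 bits, which a float represents exactly.
RNG_HD inline float lcg_uniform(LcgState &s)
{
    return (lcg_next(s) >> 8) * (1.0f / 16777216.0f);
}
```

Inside a kernel, a thread would initialise its own LcgState once and then call lcg_uniform(state) wherever it would otherwise call rand().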
I think he's looking for a library he could call, not to implement it himself. Still a good answer to point him to a solution.
lothar
Linear congruential is very simple to implement. You can do this with CUDA by having a separate PRNG with its own state in each thread.
Jay Conrod
That's what got me a little confused. Each thread would, say, be seeded from its thread ID, but wouldn't the streams soon enough start overlapping?
zenna
Those random algorithms calculate x_n+1 from x_n, so an attempt to use them for parallel random number creation will lead to "random" numbers with a very distinct pattern across threads. This is because x_n+1 is a function of x_n.
Danny Varod
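
One way to see the pattern described above, as a sketch assuming the same Numerical Recipes LCG constants (the function names are illustrative): if two threads are seeded with s and s + 1, their n-th values always differ by a^n mod 2^32, regardless of s, so the streams are rigidly locked together rather than independent.

```cpp
#include <cstdint>

// Build with nvcc (host + device) or with a plain host C++ compiler.
#ifdef __CUDACC__
#define RNG_HD __host__ __device__
#else
#define RNG_HD
#endif

// Value of the LCG x <- a*x + c (mod 2^32) after n steps from `seed`.
RNG_HD inline uint32_t lcg_after(uint32_t seed, int n)
{
    uint32_t x = seed;
    for (int i = 0; i < n; ++i)
        x = 1664525u * x + 1013904223u;
    return x;
}

// Gap between the n-th values of two streams seeded `seed` and `seed + 1`.
// Algebraically this is a^n mod 2^32, independent of `seed`.
RNG_HD inline uint32_t neighbour_gap(uint32_t seed, int n)
{
    return lcg_after(seed + 1u, n) - lcg_after(seed, n);
}
```

For example, neighbour_gap(s, 1) is 1664525 for every s, which is exactly the kind of structure that shows up when thread IDs are used directly as seeds.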
alifeofzen: linear dependency in the seeds is bad enough, indeed (cf. http://portal.acm.org/citation.cfm?doid=1276927.1276928); maybe you should find some other way of seeding them. Danny: The easiest approach (for that topic, as random numbers for parallel and distributed systems are very hard to get right) might be a series of lagged Fibonacci generators. I just can't find the paper that outlined this anymore.
Joey
+1  A: 

There's an MDGPU package (GPL) which includes an implementation of the GNU rand48() function for CUDA here.

I found it (quite easily, using Google, which I assume you tried :-) on the NVidia forums here.

paxdiablo
Yeah I found that too.. but struggled to get it to do what I want to.. I think I'm just having a stupid day.. I'll check it out again, thanks
zenna
According to the comments in the NVidia forum (including the author's) the implementation doesn't work well.
Danny Varod
+1  A: 

I haven't found a good parallel random number generator for CUDA; however, I did find a parallel random number generator based on academic research here: http://sprng.cs.fsu.edu/

Danny Varod
Anyone know of a CUDA version of this algorithm?
Danny Varod
What do you mean by "good"? Depending on your requirements, a simple MD5 hash (see cuDPP) may be enough. For some cases, multiple Mersenne Twisters may be best since they have a really long period and good independence between streams. NAG have L'Ecuyer's MRG32k3a, which works really well if you need a single stream across multiple threads/blocks.
Tom
A good start would be a repeatable pseudo-random number generator with low dependency between the cells, suitable for creating a set of random number arrays: the contents of each array would be filled by multiple threads, but the arrays would be created one after the other.
Danny Varod
A: 

Depending on your application, you should be wary of using LCGs without considering whether the streams (one stream per thread) will overlap. You could implement a leapfrog with an LCG, but since all the threads then draw from the same underlying sequence, you would need an LCG with a sufficiently long period to ensure that the sequence doesn't repeat.

An example leapfrog could be:

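// Fold `leap` steps of x -> a*x + c into a single step, in place.
// unsigned int arithmetic wraps mod 2^32, which is this LCG's modulus,
// so the geometric sum is accumulated step by step instead of dividing
// by (a - 1), which is not valid under wraparound.
__device__ void leapfrog(unsigned int &a, unsigned int &c, int leap)
{
    unsigned int an = 1;    // a^0
    unsigned int cn = 0;    // leap == 0 is the identity step
    for (int i = 0 ; i < leap ; i++)
    {
        cn = cn * a + c;    // c * (a^i + ... + a + 1)
        an *= a;            // a^(i+1)
    }
    a = an;
    c = cn;
}

// Advance the state by one (possibly leapfrogged) step and return it.
template <typename ValueType>
__device__ ValueType quickrand(unsigned int &seed, const unsigned int a, const unsigned int c)
{
    seed = seed * a + c;
    return static_cast<ValueType>(seed);
}

template <typename ValueType>
__global__ void mykernel(
    unsigned int *d_seeds)
{
    const int tid = threadIdx.x;    // thread index within the block
    const int bid = blockIdx.x;     // block index (one seed per block)

    // RNG parameters (Numerical Recipes LCG, modulus 2^32)
    unsigned int a = 1664525u;
    unsigned int c = 1013904223u;
    unsigned int ainit = a;
    unsigned int cinit = c;
    unsigned int seed;

    // Generate local seed: move this thread tid steps along the block's stream
    seed = d_seeds[bid];
    leapfrog(ainit, cinit, tid);
    quickrand<ValueType>(seed, ainit, cinit);
    // From here on, quickrand(seed, a, c) advances blockDim.x steps per call
    leapfrog(a, c, blockDim.x);

    ...
}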

But then the period of that generator is probably insufficient in most cases.
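
A host-side sanity check of the leapfrog idea, as a sketch under the same assumptions (Numerical Recipes constants, modulus 2^32). The helpers lcg_skip, leapfrog_streams and serial_stream are illustrative names; here thread t starts at element t of the serial stream (counting from 0, after the seed) and then strides by the thread count, which is a slight variant of the kernel above.

```cpp
#include <cstdint>
#include <vector>

// Build with nvcc (host + device) or with a plain host C++ compiler.
#ifdef __CUDACC__
#define RNG_HD __host__ __device__
#else
#define RNG_HD
#endif

// Fold `leap` steps of x <- a*x + c (mod 2^32) into a single step (a, c).
RNG_HD inline void lcg_skip(uint32_t &a, uint32_t &c, int leap)
{
    uint32_t an = 1u, cn = 0u;          // leap == 0 gives the identity map
    for (int i = 0; i < leap; ++i) {
        cn = cn * a + c;
        an *= a;
    }
    a = an;
    c = cn;
}

// Plain sequential stream: out[i] is the value after i + 1 steps from `seed`.
inline std::vector<uint32_t> serial_stream(uint32_t seed, int n)
{
    std::vector<uint32_t> out(static_cast<std::size_t>(n));
    uint32_t x = seed;
    for (int i = 0; i < n; ++i) {
        x = 1664525u * x + 1013904223u;
        out[i] = x;
    }
    return out;
}

// Emulate `nthreads` leapfrogging threads; thread t's k-th draw is stored
// at index k * nthreads + t, where the serial stream holds the same value.
inline std::vector<uint32_t> leapfrog_streams(uint32_t seed, int nthreads, int draws)
{
    std::vector<uint32_t> out(static_cast<std::size_t>(nthreads) * draws);
    uint32_t as = 1664525u, cs = 1013904223u;
    lcg_skip(as, cs, nthreads);         // stride: nthreads steps per draw
    for (int t = 0; t < nthreads; ++t) {
        uint32_t a0 = 1664525u, c0 = 1013904223u;
        lcg_skip(a0, c0, t + 1);        // offset: thread t starts t + 1 steps in
        uint32_t x = seed * a0 + c0;
        for (int k = 0; k < draws; ++k) {
            out[static_cast<std::size_t>(k) * nthreads + t] = x;
            x = x * as + cs;
        }
    }
    return out;
}
```

If the skip-ahead is right, leapfrog_streams(seed, T, K) and serial_stream(seed, T * K) hold the same values, which also makes the period concern concrete: T threads drawing K values each consume T * K elements of one 2^32-long sequence.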

To be honest, I'd look at using a third party library such as NAG. There are some batch generators in the SDK too, but that's probably not what you're looking for in this case.

Tom