ansaurus

Question

Answer 1

+4 A:

I'm not sure I understand why you need anything special. Any traditional PRNG should port more or less directly. A linear congruential generator should work fine. Do you have some special properties you're trying to establish?

Charlie Martin 2009-05-08 02:18:29

I think he's looking for a library he could call, not to implement it himself. Still a good answer to point him to a solution.

lothar 2009-05-08 02:20:27

Linear congruential is very simple to implement. You can do this with CUDA by having a separate PRNG with its own state in each thread.

Jay Conrod 2009-05-08 02:25:23

That's what got me a little confused. Each thread would, say, be seeded from its thread ID, but wouldn't they soon enough start overlapping?

zenna 2009-05-08 02:33:19

Those random algorithms calculate x_(n+1) from x_n, so an attempt to use them for parallel random number creation will lead to "random" numbers with a very distinct pattern. This is because x_(n+1) is a function of x_n.

Danny Varod 2009-06-15 19:56:30

alifeofzen: linear dependency in the seeds is bad enough, indeed (cf. http://portal.acm.org/citation.cfm?doid=1276927.1276928); maybe you should find some other way of seeding them. Danny: The easiest approach (for that topic, as random numbers for parallel and distributed systems are very hard to get right) might be a series of lagged Fibonacci generators. I just can't find the paper that outlined this anymore.

Joey 2009-07-08 09:40:47
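
The overlap concern raised in the comments above can be made concrete with a toy generator. The sketch below is plain host-side C++ (it also compiles under nvcc) and assumes a deliberately tiny full-period LCG modulo 16; the parameters and the names toy_lcg_next and stream_offset are made up for illustration and appear nowhere in the thread. Because a full-period LCG walks one single cycle, a stream seeded with one thread ID is the stream of any other thread ID shifted by a fixed number of steps.

```cpp
#include <cassert>
#include <cstdint>

// Toy full-period LCG modulo 16 (a = 5, c = 3 satisfy the Hull-Dobell
// conditions). Illustration only: real generators use a far larger modulus.
inline std::uint32_t toy_lcg_next(std::uint32_t x)
{
    return (5u * x + 3u) % 16u;
}

// Number of steps after which the stream seeded with `from` reaches the
// state `to`. It always terminates, because a full-period LCG visits every
// state on one single cycle.
inline int stream_offset(std::uint32_t from, std::uint32_t to)
{
    int steps = 0;
    std::uint32_t x = from % 16u;
    while (x != to % 16u)
    {
        x = toy_lcg_next(x);
        ++steps;
        assert(steps < 16);  // cannot exceed the period
    }
    return steps;
}
```

With a realistic modulus the offsets between thread-ID seeds are much larger, but nothing guarantees they exceed the number of values each thread consumes, which is why the seeding scheme matters.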

Answer 2

+1 A:

There's an MDGPU package (GPL) which includes an implementation of the GNU rand48() function for CUDA here.

I found it (quite easily, using Google, which I assume you tried :-) on the NVidia forums here.

paxdiablo 2009-05-08 02:25:38

Yeah, I found that too, but struggled to get it to do what I want. I think I'm just having a stupid day. I'll check it out again, thanks.

zenna 2009-05-08 02:29:27

According to the comments in the NVidia forum (including the author's) the implementation doesn't work well.

Danny Varod 2009-06-15 20:04:37

Answer 3

+1 A:

I haven't found a good parallel random number generator for CUDA; however, I did find a parallel random number generator based on academic research here: http://sprng.cs.fsu.edu/

Danny Varod 2009-06-15 20:06:01

Anyone know of a CUDA version of this algorithm?

Danny Varod 2009-09-20 10:38:49

What do you mean by "good"? Depending on your requirements a simple MD5 hash (see cuDPP) may be enough. For some cases, multiple Mersenne Twisters may be best since they have a really long period and good independence between streams. NAG have l'Ecuyer's MRG32k3a which works really well if you need a single stream across multiple threads/blocks.

Tom 2009-11-22 12:12:05
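
The hash-based option mentioned in the comment above amounts to a counter-based generator: each value is a hash of (thread index, counter), so no state is carried between calls. The following is a minimal sketch under stated assumptions, not the cuDPP implementation. It swaps MD5 for the much cheaper, non-cryptographic SplitMix64 mixing function, and the helper names mix64 and counter_rand are hypothetical. Whether the statistical quality is adequate depends on the application.

```cpp
#include <cstdint>

// Let the same source build as plain C++ (for host-side testing) and under nvcc.
#ifndef __CUDACC__
#define __host__
#define __device__
#endif

// 64-bit mixing function (the SplitMix64 output step), used here as a cheap
// stand-in for the MD5 hash mentioned above. It is NOT cryptographic.
__host__ __device__ inline std::uint64_t mix64(std::uint64_t z)
{
    z += 0x9E3779B97F4A7C15ULL;
    z = (z ^ (z >> 30)) * 0xBF58476D1CE4E5B9ULL;
    z = (z ^ (z >> 27)) * 0x94D049BB133111EBULL;
    return z ^ (z >> 31);
}

// Hypothetical helper: the n-th number of the stream owned by `thread_id`.
// No state is kept between calls, so threads never share generator state.
__host__ __device__ inline std::uint64_t counter_rand(std::uint32_t thread_id,
                                                      std::uint32_t n)
{
    return mix64((static_cast<std::uint64_t>(thread_id) << 32) | n);
}
```

Inside a kernel, a thread would call counter_rand with its global thread index and a per-thread loop counter. Every step of mix64 is invertible, so distinct (thread_id, n) pairs can never produce the same 64-bit output.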

A good start would be a repeatable pseudo-random number generator with low dependency between the cells, suitable for creating a set of random number arrays: the contents of each array are filled by multiple threads, but the arrays are created one after the other.

Danny Varod 2009-11-25 23:45:13

Answer 4

A:

Depending on your application, you should be wary of using LCGs without considering whether the streams (one stream per thread) will overlap. You could implement a leapfrog with an LCG, but then you would need an LCG with a sufficiently long period to ensure that the sequence doesn't repeat.

An example leapfrog could be:

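// Replace the one-step LCG constants (a, c) with those of the LCG that
// advances `leap` steps at once: x_(n+leap) = A * x_n + C, where
// A = a^leap and C = c * (1 + a + ... + a^(leap-1)).
// All arithmetic wraps modulo 2^(bits in unsigned long), so the sum is
// accumulated term by term; dividing (a^leap - 1) by (a - 1) is not valid
// under wraparound. leap == 0 yields the identity (A = 1, C = 0).
template <typename ValueType>
__device__ void leapfrog(unsigned long &a, unsigned long &c, int leap)
{
    unsigned long an = 1;
    unsigned long cn = 0;
    for (int i = 0 ; i < leap ; i++)
    {
        cn = cn * a + c;
        an *= a;
    }
    a = an;
    c = cn;
}

// Advance the LCG state by one step and return the new state.
template <typename ValueType>
__device__ ValueType quickrand(unsigned long &seed, const unsigned long a, const unsigned long c)
{
    seed = seed * a + c;
    return static_cast<ValueType>(seed);
}

template <typename ValueType>
__global__ void mykernel(
    unsigned long *d_seeds)
{
    const unsigned int bid = blockIdx.x;   // one seed per block
    const unsigned int tid = threadIdx.x;  // this thread's offset within the block

    // RNG parameters
    unsigned long a = 1664525L;
    unsigned long c = 1013904223L;
    unsigned long ainit = a;
    unsigned long cinit = c;
    unsigned long seed;

    // Generate local seed: move thread tid to position tid of the block's sequence
    seed = d_seeds[bid];
    leapfrog<ValueType>(ainit, cinit, tid);
    quickrand<ValueType>(seed, ainit, cinit);
    // From here on, each quickrand(seed, a, c) call jumps blockDim.x positions
    leapfrog<ValueType>(a, c, blockDim.x);

    ...
}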

But then the period of that generator is probably insufficient in most cases.

To be honest, I'd look at using a third-party library such as NAG. There are some batch generators in the SDK too, but that's probably not what you're looking for in this case.

Tom 2009-11-20 15:33:15
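
The leapfrog idea in the answer above rests on one identity: applying x -> a*x + c a total of n times is the same as a single step with multiplier a^n and increment c*(1 + a + ... + a^(n-1)). The host-side sketch below checks that identity against plain stepping. It assumes 32-bit wraparound arithmetic (modulus 2^32) together with the constants from the answer, and the names lcg_step, LeapConstants, leap_constants and leap_matches_stepping are hypothetical.

```cpp
#include <cstdint>

// One step of the base LCG; unsigned 32-bit arithmetic wraps modulo 2^32.
inline std::uint32_t lcg_step(std::uint32_t x, std::uint32_t a, std::uint32_t c)
{
    return x * a + c;
}

// Constants of the LCG that advances `leap` base steps at once.
struct LeapConstants
{
    std::uint32_t a;  // a^leap
    std::uint32_t c;  // c * (1 + a + ... + a^(leap-1))
};

inline LeapConstants leap_constants(std::uint32_t a, std::uint32_t c, int leap)
{
    LeapConstants k = {1u, 0u};  // identity step for leap == 0
    for (int i = 0; i < leap; ++i)
    {
        k.c = k.c * a + c;  // accumulate the geometric sum term by term
        k.a = k.a * a;
    }
    return k;
}

// True if one leapfrog step from `seed` equals `leap` single steps.
inline bool leap_matches_stepping(std::uint32_t seed, int leap)
{
    const std::uint32_t a = 1664525u;
    const std::uint32_t c = 1013904223u;
    std::uint32_t x = seed;
    for (int i = 0; i < leap; ++i)
        x = lcg_step(x, a, c);
    const LeapConstants k = leap_constants(a, c, leap);
    return lcg_step(seed, k.a, k.c) == x;
}
```

The increment is built up inside the loop rather than computed as (a^n - 1) / (a - 1), because that division does not give the right result once the products have wrapped around.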

Random Number Generator in CUDA
