views:

51

answers:

1

I am trying to learn how to use rand_r, and after reading this question I am still a little confused, can someone please take a look and point out what I'm missing? To my understanding, rand_r takes a pointer to some value (or a piece of memory with some initial value) and use it to generate new numbers every time it is called. Each thread that calls rand_r should supply it with a unique pointer (or piece of memory) to get "actual random" numbers between different threads. That's why this:

int globalSeed;

//thread 1
rand_r(&globalSeed);

//thread 2
rand_r(&globalSeed);

is the wrong way of using it. If I have

int seed1,seed2;

//thread 1
rand_r(&seed1);

//thread 2
rand_r(&seed2);

this would be the right way to generate "true random" numbers between threads?


EDIT: additional questions after reading answers to the above part:

  1. if in thread 1 I need a random number between 1 to n, should I do (rand_r(&seed1) % (n-1)) + 1 ? Or there is other common way of doing this?
  2. Is it right or normal if the memory for the seed is dynamically allocated?
+2  A: 

That's correct. What you're doing in the first case is bypassing the thread-safety nature of rand_r. With many non-thread-safe functions, persistent state is stored between calls to that function (such as the random seed here).

With the thread-safe variant, you actually provide a thread-specific piece of data (seed1 and seed2) to ensure the state is not shared between threads.

Keep in mind that this doesn't make the numbers truly random, it just makes the sequences independent of each other. If you start them with the same seed, you'll probably get the same sequence in both threads.

By way of example, let's say you get a random sequence 2, 3, 5, 7, 11, 13, 17 given an initial seed of 0. With a shared seed, alternating calls to rand_r from two different threads would cause this:

thread 1                thread 2
           <---  2
                 3 --->
           <---  5
                 7 --->
           <--- 11
                13 --->
           <--- 17

and that's the best case - you may actually find that the shared state gets corrupted since the updates on it may not be atomic.

With non-shared state (with a and b representing the two different sources of the random numbers):

thread 1                thread 2
           <---  2a
                 2b --->
           <---  3a
                 3b --->
           <---  5a
                 5b --->
                 ::

Some thread-safe calls require you to provide the thread-specific state like this, others can create thread-specific data under the covers (using a thread ID or similar information) so that you never need to worry about it, and you can use exactly the same source code in threaded and non-threaded environments. I prefer the latter myself, simply because it makes my life easier.


Additional stuff for edited question:

> If in thread 1, I need a random number between 1 to n, should I do '(rand_r(&seed1) % (n-1)) + 1', or there is other common way of doing this?

Assuming you want a value between 1 and n inclusive, use (rand_r(&seed1) % n) + 1. The first bit gives you a value from 0 to n-1 inclusive, then you add 1 to get the desired range.

> Is it right or normal if the memory for the seed is dynamically allocated?

The seed has to be persistent as long as you're using it. You could dynamically allocate it in the thread but you could also declare it in the thread's top-level function. In both those cases, you'll need to communicate the address down to the lower levels somehow (unless your thread is just that one function which is unlikely).

You could either pass it down through the function calls or set up a global array somehow where the lower levels can discover the correct seed address.

Alternatively, since you need a global array anyway, you can have a global array of seeds rather than seed addresses, which the lower levels could use to discover their seed.

You would probably (in both cases of using the global array) have a keyed structure containing the thread ID as a key and the seed to use. You would then have to write tour own rand() routine which located the correct seed and called rand_r() with that.

This is why I prefer library routines which do this under the covers with thread-specific data.

paxdiablo
thank you!! so the best way is always using different seeds with different initial values for different threads, right?
derrdji
I would say usually, yes. But there are some situations where you might _want_ shared state (first diagram in my answer), in which case you could avoid the corruption possibility by calling `rand_r` with the _same_ state but protected by a mutex. The one thing you don't want to do is to initialise both sequences with the same seed since the sequences would then be identical.
paxdiablo
yes, thank you!! I was just about to ask if I could share one and guard the rand_r call with a mutex lock.
derrdji