views:

235

answers:

9

An application I'm working on requires a matrix of random numbers. The matrix can grow in any direction at any time, and isn't always full. (I'll probably end up re-implementing it with a quad tree or something else, rather than a matrix with a lot of null objects.)

I need a way to generate the same matrix, given the same seed, no matter in which order I calculate the matrix.

LazyRandomMatrix rndMtx1 = new LazyRandomMatrix(1234) // Seed new object
float X = rndMtx1[0,0] // Lazily generate random numbers on demand
float Y = rndMtx1[3,16]
float Z = rndMtx1[23,-5]

Debug.Assert(X == rndMtx1[0,0])
Debug.Assert(Y == rndMtx1[3,16])
Debug.Assert(Z == rndMtx1[23,-5])

LazyRandomMatrix rndMtx2 = new LazyRandomMatrix(1234) // Seed second object
Debug.Assert(Y == rndMtx2[3,16])  // Lazily generate the same random numbers
Debug.Assert(Z == rndMtx2[23,-5]) // on demand in a different order
Debug.Assert(X == rndMtx2[0,0])

Yes, if I knew the dimensions of the array, the best way would be to generate the entire array, and just return values, but they need to be generated independently and on demand.

My first idea was to initialize a new random number generator for each call to a new coordinate, seeding it with some hash of the overall matrix's seed and the coordinates used in calling, but this seems like a terrible hack, as it would require creating a ton of new Random objects.

A: 

Why don't you have one random number generator object per seed and cache the matrix values based on the co-ordinates within that object?

So given a seed

  • Check if you alread created a random number generator for that seed

  • If not, create one.

  • Given that random number generator object, check if you created a random number for those co-ordinates.

  • If not, generate one and cache it.

  • Else return the cached value.

The code will be something like:

   class RandomMatrixGenerator
    {
        private Random _random;


    public RandomMatrixGenerator(int seed)
    {
        _random = new Random(seed);
    }

    public int this[int r, int c]
    {
        get
        {
            // If in cache based on r and c return cached value;
            // else
            int value = _random.Next();

            // cache value based on r,c.
            return value;
        }
    }
}
class LazyMatrixGenerator
{
    static Dictionary<int, RandomMatrixGenerator> s_RandomMatrixGen;
    private RandomMatrixGenerator _randomGenerator;

    static LazyMatrixGenerator()
    {
        s_RandomMatrixGen = new Dictionary<int, RandomMatrixGenerator>();
    }

    LazyMatrixGenerator(int seed)
    {
        if (!s_RandomMatrixGen.ContainsKey(seed))
        {
             _randomGenerator = new RandomMatrixGenerator(seed);
             s_RandomMatrixGen[seed] = _randomGenerator;
        }
        else
        {
            _randomGenerator = s_RandomMatrixGen[seed];
        }
    }

    public int this[int r, int c]
    {
        get
        {
            return _randomGenerator[r, c];
        }
    }
}
Moron
Because this will not work if I lazily request values at coordinates in a different order. It will fail the last three `Asserts` in my sample code. `rndMtx2[3,16]` would be the same as `rndMtx1[0,0]` because they were the first requested (equal to the first number generated by the random number generator.) Likewise, `rndMtx2[23,-5] == rndMtx1[3,16]` and `rndMtx2[0,0] == rndMtx1[23,-5]`
Daniel Rasmussen
I don't know .Net, but I guess you can re-use the same object several times, by reseting it to appropriate seed when value is requested. As for me - I don't like idea of using random generator for producing one value from specific seed. `(new Random(seed)).nextValue()` - isn't that weird?..
ony
No it won't. Given a seed, you will lookup a static table for a random number generator for that seed. If it is already there, you use that and that object is the one which does the caching based on co-ordinates.
Moron
@Moron - who are you refuting? If you're saying that I should just hold a static table of values within the `LazyRandomMatrix` class, and populate it as I go, using it for lookup, then this still requires that the numbers be accessed in the same order if I restart the application, unless I store them all to a file, which seems unnecessary.
Daniel Rasmussen
@Daniel: So you need to persist it across app shutdown/restart? Then you might need to write to a file, correct. And Maybe you should mention that in the question!
Moron
@Moron - the implication was that it works like `Random(seed)`, except where you can specify the number you want, given an index. `Random(seed)` will give you the same numbers every time, without writing them to a file. Sorry if this wasn't clear.
Daniel Rasmussen
@Daniel: So basically you want a pseudo random number generator, which is predictable, given three parameters (seed, row, column)? Well I suppose that can be done.
Moron
@Daniel, I guess @Moron refer to your idea with "initialize a new random number generator for each call" and suggests caching of random numbe generators creation for each cell (I don't think that you need to create it, that is bad if .Net missing reseting generator to some specific seed with object method).
ony
@Ony - I don't need to cache the random number generator for each cell if I can cache the value it returns, which I can - I just need to be able to recreate it again with nothing but the random seed. @Moron - "a pseudo random number generator, which is predictable, given three parameters (seed, row, column)" is exactly what I need.
Daniel Rasmussen
+2  A: 

Standart random generator usually is generator of sequence, where each next element is build from previous. So to generate rndMtx1[3,16] you have to generate all previous elements in a sequence.
Actually you need something different from random generator, because you need only one value, but not the sequence. You have to build your own "generator" which uses seed and indexes as input for formula to produce single random value. You can invent many ways to do so. One of the simplest way is to take random value asm hash(seed + index) (I guess idea of hashes used in passwords and signing is to produce some stable "random" value out of input data).

P.S. You can improve your approach with independent generators (Random(seed + index)) by making lazy blocks of matrix.

ony
How random is a `hash` of similar integers going to be? I was considering lazy blocks to improve the `Random(seed + index)` method, but I still think there has to be a better way.
Daniel Rasmussen
@Daniel, good hash functions should try to produce [uniform-distributed](http://en.wikipedia.org/wiki/Hash_function#Uniformity) hashes. So using them for non-random data should produce uniform distributed value. After that you can transform it to some other distributions.
ony
+1  A: 

A pseudo-random number generator is essentially a function that deterministically calculates a successor for a given value.

You could invent a simple algorithm that calculates a value from its neighbours. If a neighbour doesn't have a value yet, calculate its value from its respective neighbours first.

Something like this:

  • value(0,0) = seed
  • value(x+1,0) = successor(value(x,0))
  • value(x,y+1) = successor(value(x,y))

Example with successor(n) = n+1 to calculate value(2,4):

 \ x  0      1      2
y  +-------------------
 0 | 627    628    629
 1 |               630
 2 |               631
 3 |               632
 4 |               633

This example algorithm is obviously not very good, but you get the idea.

dtb
I was just going to suggest using a quasi-monte carlo, but changed my mind. But seeing your post, I think quasi-random is the way to go.
code4life
A good suggestion, but as I said, they should be generated "independently and on demand." So if I need to get value(100,100), I don't want to calculate 200 successors before I get to the one I want.
Daniel Rasmussen
@Daniel Rasmussen: If you deterministically want to generate the same matrix for the same seed value, and don't want the values to be purely based on the indices, I believe there is no other way than to calculate all intermediary values for a given index pair.
dtb
Yeah, that's great idea for moving character over random world (i.e. when you move one cell at a time). But you still have to generate sequence for random-accessed value.
ony
I'm assuming the draw here is that it will provide a much more equal distribution than using nothing but the index? In all fairness, this would probably work just fine, as I'm doing exactly what ony said - a moving character over random world, and most likely won't have to jump so far as to be a problem.
Daniel Rasmussen
+2  A: 

I think your first idea of instantiating a new Random object seeded by some deterministic hash of (x-coordinate, y-coordinate, LazyRandomMatrix seed) is probably reasonable for most situations. In general, creating lots of small objects on the managed heap is something the CLR is very good at handling efficiently. And I don't think Random.ctor() is terribly expensive. You can easily measure the performance if it's a concern.

A very similar solution which may be easier than creating a good deterministic hash is to use two Random objects each time. Something like:

public int this[int x, int y]
{
    get
    {
        Random r1 = new Random(_seed * x);
        Random r2 = new Random(y);
        return (r1.Next() ^ r2.Next());
    }
}
C. Dragon 76
Or even, new Random(_seed * x ^ y).
DonaldRay
Would this produce a better distribution than simply hashing `seed ^ x ^ y`? (`seed * x ^ y` seems like it could overflow if my coordinates were high enough...)
Daniel Rasmussen
@Daniel: As long as you're not using checked math (see checked keyword or the C# compiler's checked keyword), overflows should be okay since the result will just be the low-order bits that fit. The overflowing bits will be ignored.
C. Dragon 76
@DonaldRay: I used the 2 Random objects simply because it seemed easier to avoid uneven distributions or weird patterns. For example, (_seed * x ^ y) doesn't vary with _seed when x == 0. That is lazy[0, 1] would produce the same result regarless of _seed.
C. Dragon 76
Most RNGs produce similar outputs for similar seeds, so this is a bad idea.
Nick Johnson
A: 

As far as I see, there are 2 basic algorithms possible here:

  • Generate a new random number based on func(seed, coord) for each coord
  • Generate a single random number from seed, and then transform it for the coord (something like rotate(x) + translate(y) or whatever)

In the first case, you have the problem of always generating new random numbers, although this may not be as expensive as you fear.

In the second case, the problem is that you may lose randomness during your transformation operations. However, in either case the result is reproducible.

Java Drinker
+2  A: 

PRNGs can be built out of hash functions.
This is what e.g. MS Research did with parallelizing random number generation with MD5 or others with TEA on a GPU.
(In fact, PRNGs can be thought of as a hash function from (seed, state) => nextnumber.)
Generating massive amounts of random numbers on a GPU brings up similar problems.
(E.g., to make it parallel, there should not be a single shared state.)

Although it is more common in the crypto world, using a crypto hash, I have taken the liberty to use MurmurHash 2.0 for sake of speed and simplicity. It has very good statistical properties as a hash function. A related, but not identical test shows that it gives good results as a PRNG. (unless I have fsc#kd up something in the C# code, that is.:) Feel free to use any other suitable hash function; crypto ones (MD5, TEA, SHA) as well - though crypto hashes tend to be much slower.

public class LazyRandomMatrix
{
    private uint seed;

    public LazyRandomMatrix(int seed)
    {
        this.seed = (uint)seed;
    }

    public int this[int x, int y]
    {
        get
        {
            return MurmurHash2((uint)x, (uint)y, seed);
        }
    }

    static int MurmurHash2(uint key1, uint key2, uint seed)
    {
        // 'm' and 'r' are mixing constants generated offline.
        // They're not really 'magic', they just happen to work well.

        const uint m = 0x5bd1e995;
        const int r = 24;

        // Initialize the hash to a 'random' value

        uint h = seed ^ 8;

        // Mix 4 bytes at a time into the hash

        key1 *= m;
        key1 ^= key1 >> r;
        key1 *= m;

        h *= m;
        h ^= key1;

        key2 *= m;
        key2 ^= key2 >> r;
        key2 *= m;

        h *= m;
        h ^= key2;

        // Do a few final mixes of the hash to ensure the last few
        // bytes are well-incorporated.

        h ^= h >> 13;
        h *= m;
        h ^= h >> 15;

        return (int)h;
    }

}
andras
Will negative values mess up the MurmurHash2? What if I wanted it to take two floats?
Daniel Rasmussen
@Daniel: No, they won't.The link above to Murmur 2 shows that it takes a sequence of bytes. Since everything can be converted to a sequence of bytes, the original accepts everything. I just tried to simplify the original somewhat for your case. A C# implementation can be found on Murmur's wikipedia page.
andras
+1. Exactly what I was going to recommend.
Nick Johnson
+4  A: 

What you're talking about is typically called "Perlin Noise", here's a link for you: http://freespace.virgin.net/hugo.elias/models/m_perlin.htm

The most important thing in that article is the noise function in 2D:

  function Noise1(integer x, integer y)
    n = x + y * 57
    n = (n<<13) ^ n;
    return ( 1.0 - ( (n * (n * n * 15731 + 789221) + 1376312589) & 7fffffff) / 1073741824.0);    
  end function

It returns a number between -1.0 and +1.0 based on the x and y coordonates alone (and a hard coded seed that you can change randomly at the start of your app or just leave it as it is).

The rest of the article is about interpolating these numbers, but depending on how random you want these numbers, you can just leave them as it is. Note that these numbers will be utterly random. If you instead apply a Cosine Interpolator and use the generated noise every 5-6 indexes, interpolating inbetween, you get heightmap data (which is what I used it for). Skip it for totally random data.

Blindy
Yes, I'm trying to generate Perlin Noise. The examples I had seen from it all generated the Noise function in an array during construction. Are those all just magic numbers in a custom hash function?
Daniel Rasmussen
@Blindy - Your function isn't exactly what is typically called Perlin Noise. Perlin Noise involves iterating over it multiple times, with a different wavelength and amplitude. But it does exactly give the asker what he wants.
Justin L.
@Blindy - To build off what Justin said - Like you said "these numbers will be utterly random." Perlin Noise typically has filters applied to it, making it not completely random, so that it behaves like height maps or smoke distribution or something of the sort.
Daniel Rasmussen
Perlin noise shows a lot of correlation between nearby points, though. Is that what you want, Daniel, or do you want each point to be entirely independent of its neighbours?
Nick Johnson
@Justin - Sorry, I just realized you're the same person who answered my previous question. No need to point you there... *facepalm*
Daniel Rasmussen
Well yea if you go through the extra step of smoothing out the noise and adding extra amplitudes together, you get a smoother output. But the basis of Perlin Noise is the noise function, which is what i wrote above! And to answer your first comment Daniel, there is no need to hold the generated values in an array, the function will always return the same values for the same inputs.
Blindy
+2  A: 

Here is a solution based on a SHA1 hash. Basically this takes the bytes for the X, Y and Seed values and packs this into a byte array. Then a hash for the byte array and the first 4 bytes of the hash used to generate an int. This should be pretty random.

public class LazyRandomMatrix 
{
  private int _seed;
  private SHA1 _hashProvider = new SHA1CryptoServiceProvider();

  public LazyRandomMatrix(int seed)
  {
    _seed = seed;
  }

  public int this[int x, int y]
  {
    get
    {
      byte[] data = new byte[12];
      Buffer.BlockCopy(BitConverter.GetBytes(x), 0, data, 0, 4);
      Buffer.BlockCopy(BitConverter.GetBytes(y), 0, data, 4, 4);
      Buffer.BlockCopy(BitConverter.GetBytes(_seed), 0, data, 8, 4);

      byte[] hash = _hashProvider.ComputeHash(data);
      return BitConverter.ToInt32(hash, 0);
    }
  }     
}
Chris Taylor
More than 'pretty' random. :)
Nick Johnson
+1  A: 

You want a random number generator with random access to the elements, instead of sequential access. (Note that you can reduce your two coordinates into a single index i.e. by computing i = x + (y << 16).)

A cool example of such a generator is Blum Blum Shub, which is a cryptographically secure PRNG with easy random-access. Unfortunately, it is very slow.

A more practical example is the well-known linear congruential generator. You can easily modify one to allow random access. Consider the definition:

X(0) = S
X(n) = B + X(n-1)*A (mod M)

Evaluating this directly would take O(n) time (that's pseudo linear, not linear), but you can convert to a non-recursive form:

//Expand a few times to see the pattern:
X(n) = B + X(n-1)*A (mod M)
X(n) = B + (B + X(n-2)*A)*A (mod M)
X(n) = B + (B + (B + X(n-3)*A)*A)*A (mod M)
//Aha! I see it now, and I can reduce it to a closed form:
X(n) = B + B*A + B*A*A + ... + B*A^(N-1) + S*A^N (mod M)
X(n) = S*A^N + B*SUM[i:0..n-1](A^i) (mod M)
X(n) = S*A^N + B*(A^N-1)/(A-1) (mod M)

That last equation can be computed relatively quickly, although the second part of it is a bit tricky to get right (because division doesn't distribute over mod the same way addition and multiplication do).

Strilanc
+1 for the excellent explanation of linear generator and expanding it to a non-recursive form.
Daniel Rasmussen