views:

761

answers:

10

I'm looking for a method to generate a pseudorandom stream with a somewhat odd property - I want clumps of nearby numbers.

The tricky part is, I can only keep a limited amount of state no matter how large the range is. There are algorithms that give a sequence of results with minimal state (linear congruence?)

Clumping means that there's a higher probability that the next number will be close rather than far.

Example of a desirable sequence (mod 10): 1 3 9 8 2 7 5 6 4
I suspect this would be more obvious with a larger stream, but difficult to enter by hand.

Update:
I don't understand why it's impossible, but yes, I am looking for, as Welbog summarized:

  • Non-repeating
  • Non-Tracking
  • "Clumped"
A: 

By defining the "clumping features" in terms of the probability distribution of its size, and the probability distribution of its range, you can then use simple random generators with the underlying distribution and produce the sequences.

mjv
This sounds promising. can you point me to information on how to do that? If I use multiple generators, I was worried about overlap (I can't track what numbers I've already hit)
@OldcodeOrder: now I see the challenge. I thought the reference to 'state' in the original question was regarding the necessary state to respect the statistical shape of the clumps. Now... if you need only unique numbers, a single generator would be preferable, relying on its own math to ensure uniqueness...
mjv
A: 

Perhaps you could generate a random sequence, and then do some strategic element swapping to get the desired property.

For example, if you find 3 values a,b,c in the sequence such that a>b and a>c, then with some probability you could swap elements a and b or elements a and c.

EDIT in response to comment:

Yes, you could have a buffer on the stream that is whatever size you are comfortable with. Your swapping rules could be deterministic, or based on another known, reproducible psuedo-random sequence.

mobrule
This needs to be a stream, so I don't get to see the entire sequence. This almost seems like buffering a small number of values - I'm not sure if that'd produce the right behavior, or how it'd be repeatable
A: 
Mark Rushakoff
To amplify, over long periods most (okay, all, I'll leap!) normal generators are going to have repeats, since near their tails, they poorly approximate uniformity and have discontinuity. Making it non-repeating (globally) and non-tracking is impossible.
Gregg Lind
A: 

I don't know of an existing algorithm that would do this, but it doesn't seem difficult to roll your own (depending on how stringent the "limited amount of state" requirement is). For example:

RANGE = (1..1000)
CLUMP_ODDS = .5
CLUMP_DIST = 10

last = rand(RANGE)
while still_want_numbers
  if rand(CLUMP_ODDS)   # clump!
    next = last + rand(CLUMP_DIST) - (CLUMP_DIST / 2)  # do some boundary checking here
  else   # don't clump!
    next = rand(RANGE)
  end
  print next
  last = next
end

It's a little rudimentary, but would something like that suit your needs?

John Hyland
I'm looking for something a bit more complex, since this would require tracking to avoid repeats, which I don't have state for.
Ah! I missed the part where you didn't want to repeat any of the numbers. You're right, then, this won't work.
John Hyland
A: 

How about (psuedo code)

// clumpiness static in that value retained between calls
static float clumpiness = 0.0f; // from 0 to 1.0        
method getNextvalue(int lastValue)
   float r = rand();  // float from 0 to 1

   int change = MAXCHANGE * (r - 0.5) * (1 - clumpiness); 

   clumpiness += 0.1 * rand() ;
   if (clumpiness >= 1.0) clumpiness -= 1.0;
   // -----------------------------------------
   return Round(lastValue + change);
Charles Bretana
A: 

In the range [0, 10] the following should give a uniform distribution. random() yields a (pseudo) random number r with 0 <= r < 1.

x(n + 1) = (x(n) + 5 * (2 * random() - 1)) mod 10

You can get your desired behavior by delinearizing random() - for example random()^k will be skewed towards small numbers for k > 1. An possible function could be the following, but you will have to try some exponents to find your desired distribution. And keep the exponent odd, if you use the following function... ;)

x(n + 1) = (x(n) + 5 * (2 * random() - 1)^3) mod 10
Daniel Brückner
This looks promising, but when i threw something together, it ends up having repeats.
I know - when I started writing my answer you had not yet specified that the sequence must be non-repeating.
Daniel Brückner
+4  A: 

Cascade a few LFSRs with periods smaller than you need, combining them to get a result such than the fastest changing register controls the least significant values. So if you have L1 with period 3, L2 with period 15 and L3 with some larger period, N = L1(n) + 3 * L2(n/3) + 45 * L3(n/45). This will obviously generate 3 clumped values, then jump and general another 3 clumped values. Use something other than multiplication ( such as mixing some of the bits of the higher period registers ) or different periods to make the clump spread wider than the period of the first register. It won't be particularly smoothly random, but it will be clumpy and non-repeating.

Pete Kirkham
If I just concatenate the bits, what precludes the repeat?
If each LFSR is cascaded, so the next one only gets incremented when the previous one wraps round, then the repeat period for the combination will be the product of the periods of the other registers.
Pete Kirkham
A: 

Does a sequence like 0, 94, 5, 1, 3, 4, 14, 8, 10, 9, 11, 6, 12, 7, 16, 15, 17, 19, 22, 21, 20, 13, 18, 25, 24, 26, 29, 28, 31, 23, 36, 27, 42, 41, 30, 33, 34, 37, 35, 32, 39, 47, 44, 46, 40, 38, 50, 43, 45, 48, 52, 49, 55, 54, 57, 56, 64, 51, 60, 53, 59, 62, 61, 69, 68, 63, 58, 65, 71, 70, 66, 73, 67, 72, 79, 74, 81, 77, 76, 75, 78, 83, 82, 85, 80, 87, 84, 90, 89, 86, 96, 93, 98, 88, 92, 99, 95, 97, 2, 91 (mod 100) look good to you?

This is the output of a small ruby program (explanations below):

#!/usr/bin/env ruby

require 'digest/md5'

$seed = 'Kind of a password'
$n = 100 # size of sequence
$k = 10  # mixing factor (higher means less clumping)
def pseudo_random_bit(k, n)
  Digest::MD5.hexdigest($seed + "#{k}|#{n}")[-1] & 1
end

def sequence(x)
  h = $n/2
  $k.times do |k|
    # maybe exchange 1st with 2nd, 3rd with 4th, etc
    x ^= pseudo_random_bit(k, x >> 1) if x < 2*h
    # maybe exchange 1st with last
    if [0, $n-1].include? x
      x ^= ($n-1)*pseudo_random_bit(k, 2*h)
    end
    # move 1st to end
    x = (x - 1) % $n
    # maybe exchange 1st with 2nd, 3rd with 4th, etc
    # (corresponds to 2nd with 3rd, 4th with 5th, etc)
    x ^= pseudo_random_bit(k, h+(x >> 1)) if x < 2*(($n-1)/2)
    # move 1st to front
    x = (x + 1) % $n
  end
  x
end

puts (0..99).map {|x| sequence(x)}.join(', ')

The idea is basically to start with the sequence 0..n-1 and disturb the order by passing k times over the sequence (more passes means less clumping). In each pass one first looks at the pairs of numbers at positions 0 and 1, 2 and 3, 4 and 5 etc (general: 2i and 2i+1) and flips a coin for each pair. Heads (=1) means exchange the numbers in the pair, tails (=0) means don't exchange them. Then one does the same for the pairs at positions 1 and 2, 3 and 4, etc (general: 2i+1 and 2i+2). As you mentioned that your sequence is mod 10, I additionally exchanged positions 0 and n-1 if the coin for this pair dictates it.

A single number x can be mapped modulo n after k passes to any number of the interval [x-k, x+k] and is approximately binomial distributed around x. Pairs (x, x+1) of numbers are not independently modified.

As pseudo-random generator I used only the last of the 128 output bits of the hash function MD5, choose whatever function you want instead. Thanks to the clumping one won't get a "secure" (= unpredictable) random sequence.

jug
If the sequence looks too monotone increasing: it is also possible to exchange blocks of numbers to have the sequence jump around more.
jug
A: 

For the record, I'm in the "non-repeating, non-random, non-tracking is a lethal combination" camp, and I hope some simple though experiments will shed some light. This is not formal proof by any means. Perhaps someone will shore it up.

So, I can generate a sequence that has some randomness easily:

Given x_i, x_(i+1) ~ U(x_i, r), where r > x_i.

For example:

if x_i = 6, x_(i+1) is random choice from (6+epsilon, some_other_real>6). This guarantees non-repeating, but at the cost that the distribution is monotonically increasing.

Without some condition (like monotonicity), inherent to the sequence of generated numbers themselves, how else can you guarantee uniqueness without carrying state?

Edit: So after researching RBarryYoung's claim of "Linear Congruential Generators" (not differentiators... is this what RBY meant), and clearly, I was wrong! These sequences exist, and by necessity, any PRNG whose next number is dependent only on the current number and some global, non changing state can't have repeats within a cycle (after some initial burn it period).

Gregg Lind
Linear Congruential Differentiators fulfill all of the requirements, except clumping, with no problem.
RBarryYoung
Yeah, "LCG"s oops.
RBarryYoung
A: 

Maybe you can chain together 2 or more LCGs in a similar manner described for the LSFRs described here. Incement the least-significant LCG with its seed, on full-cycle, increment the next LCG. You only need to store a seed for each LCG. You could then weight each part and sum the parts together. To avoid repititions in the 'clumped' LstSig part you can randomly reseed the LCG on each full cycle.

andora