views:

261

answers:

2

I have below a function (from a previous question that went unanswered) that creates an array with n amount of values. The sum of the array is equal to $max.

function randomDistinctPartition($n, $max) {
  $partition= array();
  for ($i = 1; $i < $n; $i++) {
    $maxSingleNumber = $max - $n;
    $partition[] = $number = rand(1, $maxSingleNumber);
    $max -= $number;
  }
  $partition[] = $max;
  return $partition;
}

For example: If I set $n = 4 and $max = 30. Then I should get the following.

array(5, 7, 10, 8);

However, this function does not take into account duplicates and 0s. What I would like - and have been trying to accomplish - is to generate an array with unique numbers that add up to my predetermined variable $max. No Duplicate numbers and No 0 and/or negative integers.

+10  A: 

Ok, this problem actually revolves around linear sequences. With a minimum value of 1 consider the sequence:

f(n) = 1 + 2 + ... + n - 1 + n

The sum of such a sequence is equal to:

f(n) = n * (n + 1) / 2

so for n = 4, as an example, the sum is 10. That means if you're selecting 4 different numbers the minimum total with no zeroes and no negatives is 10. Now go in reverse: if you have a total of 10 and 4 numbers then there is only one combination of (1,2,3,4).

So first you need to check if your total is at least as high as this lower bound. If it is less there is no combination. If it is equal, there is precisely one combination. If it is higher it gets more complicated.

Now imagine your constraints are a total of 12 with 4 numbers. We've established that f(4) = 10. But what if the first (lowest) number is 2?

2 + 3 + 4 + 5 = 14

So the first number can't be higher than 1. You know your first number. Now you generate a sequence of 3 numbers with a total of 11 (being 12 - 1).

1 + 2 + 3 = 6
2 + 3 + 4 = 9
3 + 4 + 5 = 12

The second number has to be 2 because it can't be one. It can't be 3 because the minimum sum of three numbers starting with 3 is 12 and we have to add to 11.

Now we find two numbers that add up to 9 (12 - 1 - 2) with 3 being the lowest possible.

3 + 4 = 7
4 + 5 = 9

The third number can be 3 or 4. With the third number found the last is fixed. The two possible combinations are:

1, 2, 3, 6
1, 2, 4, 5

You can turn this into a general algorithm. Consider this recursive implementation:

$all = all_sequences(14, 4);
echo "\nAll sequences:\n\n";
foreach ($all as $arr) {
  echo implode(', ', $arr) . "\n";
}

function all_sequences($total, $num, $start = 1) {
  if ($num == 1) {
    return array($total);
  }
  $max = lowest_maximum($start, $num);
  $limit = (int)(($total - $max) / $num) + $start;
  $ret = array();
  if ($num == 2) {
    for ($i = $start; $i <= $limit; $i++) {
      $ret[] = array($i, $total - $i);
    }
  } else {
    for ($i = $start; $i <= $limit; $i++) {
      $sub = all_sequences($total - $i, $num - 1, $i + 1);
      foreach ($sub as $arr) {
        array_unshift($arr, $i);
        $ret[] = $arr;
      }
    }
  }
  return $ret;
}

function lowest_maximum($start, $num) {
  return sum_linear($num) + ($start - 1) * $num;
}

function sum_linear($num) {
  return ($num + 1) * $num / 2;
}

Output:

All sequences:

1, 2, 3, 8
1, 2, 4, 7
1, 2, 5, 6
1, 3, 4, 6
2, 3, 4, 5

One implementation of this would be to get all the sequences and select one at random. This has the advantage of equally weighting all possible combinations, which may or may not be useful or necessary to what you're doing.

That will become unwieldy with large totals or large numbers of elements, in which case the above algorithm can be modified to return a random element in the range from $start to $limit instead of every value.

cletus
@Russell check out the code sample.
cletus
Thank you cletus. Its a shame I can only upvote you once.
Russell Dias
If i read correctly, this generates a semirandomly distributed series without duplicates. I think if it is tested by plotting the values of many generated sequences, dips/lumps in their distribution could appear. Ive hacked out an answer which shouldnt do that, but admitedly may change values when none actualy need to change.
strainer
@strainer not sure I fully follow your comment but my idea/code here creates all possible sequences given the constraints from which you can randomly pick one. That is by definition equally weighted and thus fair. Replacing the inner loop with something that picks a random value at each step is not equally weighted. But equal weighting may not be a necessary property for the OP.
cletus
@cletus, sorry youre right that comment was silly. I understand your code now. I do think your distribution will differ from the solution i found, which sort of preserves the distribution tendencies of an input sequence. And your solution becomes rather hefty beyond small lengths and sizes, what sort of sizes/totals would require a problematic memory/load? Just the minimal example was given. When larger sizes or low load are required, how do you suggest replacing duplicate values? ..i am of course just moochin' ive hardly any points! :D
strainer
+2  A: 

I would use 'area under triangle' formula... like cletus(!?) Im really gonna have to start paying more attention to things...

Anyway, i think this solution is pretty elegant now, it applies the desired minimum spacing between all elements, evenly, scales the gaps (distribution) evenly to maintain the original sum and does the job non-recursively (except for the sort):

Given an array a() of random numbers of length n

Generate a sort index s()

and work on the sorted intervals a(s(0))-a(s(1)), a(s(1))-a(s(2)) etc

  1. increase each interval by the desired minimum separation size eg 1 (this necessarily warps their 'randomness')

  2. decrease each interval by a factor calculated to restore the series sum to what it is without the added spacing.

If we add 1 to each of a series we increase the series sum by 1 * len

1 added to each of series intervals increases sum by: len*(len+1)/2 //( ?pascal's triangle )

Draft code:

$series($length);        //the input sequence 
$seriesum=sum($series);  //its sum
$minsepa=1;              //minimum separation

$sorti=sort_index_of($series) //sorted index - php haz function?

$sepsum=$minsepa*($length*($length+1))/2; 
//sum of extra separation

$unsepfactor100=($seriesum*100)/($seriesum+sepsum); 
//scale factor for original separation to maintain size
//(*100~ for integer arithmetic)

$px=series($sorti(0)); //for loop needs the value of prev serie

for($x=1 ; $x < length; $x++)
{ $tx=$series($sorti($x));             //val of serie to
  $series($sorti($x))= ($minsepa*$x)   //adjust relative to prev
                     + $px 
                     + (($tx-$px)*$unsepfactor100)/100; 

  $px=$tx;                             //store for next iteration 
  }
  • all intervals are reduced by a constant (non-random-warping-factor)
  • separation can be set to values other than one
  • implementantions need to be carefuly tweaked (i usualy test&'calibrate')
    to accomodate rounding errors. Probably scale everything up by ~15 then back down after. Intervals should survive if done right.

After sort index is generated, shuffle the order of indexes to duplicate values to avoid runs in the sequence of collided series. ( or just shuffle final output if order never mattered )

Shuffle indexes of dupes:

for($x=1; $x<$len; $x++)                   
{ if ($series($srt($x))==$series($srt($x-1)))
  { if( random(0,1) ) 
    { $sw= $srt($x);
      $srt($x)= $srt($x-1);
      $srt($x-1)= $sw;
  } } } 

A kind of minimal disturbance can be done to a 'random sequence' by just parting dupes by the minimum required, rather than moving them more than minimum -some 'random' amount that was sought by the question.

The code here separates every element by the min separation, whether duplicate or not, that should be kindof evenhanded, but overdone maybe. The code could be modified to only separate the dupes by looking through the series(sorti(n0:n1..len)) for them and calculating sepsum as +=minsep*(len-n) for each dupe. Then the adjustment loop just has to test again for dupe before applying adjustment.

strainer
What an unusual code style.
Gumbo
Until i start scoring points, ill have to take that as hint that it needs normalisation :]
strainer
i added many $ to please php eyes.
strainer