views:

579

answers:

7

I have the following array which contains arrays of values:

$array = array(
    array('1', '2'),
    array('a', 'b', 'c'),
    array('x', 'y'),
);

There can be any number of arrays and an array can contain any number of values. I currently have a piece of code which will generate all combinations where one value is taken from each array. eg:

1ax, 1ay, 1bx, 1by, 1cx, 1cy, 2ax, 2ay, 2bx, 2by, 2cx, 2cy

However what I actually want is only the combinations where only one value sits in each column, ie. 1ax is no good because all three values 1, a and x sit in the first column, 1by is no good because b and y sit in the second column. So from the above example only these combinations would be valid:

1cy, 2cx

I originally planned to just generate all combinations and then filter out the ones with conflicts, but that doesn't scale as this is an oversimplified example, in the real application there's going to be situations where there are potentially millions of combinations (including conflicting ones).

Can anyone help with a better way to solve this? I'm working in PHP, but any code sample that clearly demonstrates the logic would be helpful.

Thanks in advance.


Update:

I've tested the solutions that work against a bigger dataset, to get some benchmarks, these are the results so far:

$array = array(
    array('1', '2', '3', '1', '2', '3', '1', '2', '3', '1', '2', '3', '1', '2', '3'),
    array('a', 'b', 'c', 'd', 'a', 'b', 'c', 'd', 'a', 'b', 'c', 'd', 'a', 'b', 'c', 'd', 'a', 'b', 'c', 'd'),
    array('x', 'y', 'z', 'x', 'y', 'z', 'x', 'y', 'z'),
    array('1', '2', '3', '1', '2', '3', '1', '2', '3'),
    array('a', 'b', 'c', 'd', 'a', 'b', 'c', 'd', 'a', 'b', 'c', 'd'),
    array('x', 'y', 'z'),
);

Josh Davis 2nd Solution:

Combinations:      249480
Time:              0.3180251121521 secs
Memory Usage:      22.012168884277 mb
Peak Memory Usage: 22.03059387207 mb

Josh Davis:

Combinations:      249480
Time:              1.1172790527344 secs
Memory Usage:      22.004837036133 mb
Peak Memory Usage: 22.017387390137 mb

Tom Haigh:

Combinations:      249480
Time:              5.7098741531372 secs
Memory Usage:      39.145843505859 mb
Peak Memory Usage: 39.145843505859 mb
+1  A: 

probably not the most elegant way, but does the trick (javascript)

var result = [];

for(i=0;i<arr1.length;i++)
{
  for(j=0;j<arr2.length;j++)
  {
    if(j==i)
      continue;
    else
    {
      for(k=0;k<arr3.length;k++)
      {
        if(k==i||k==j)
          continue;
        else
        {
          result.push(arr1[i]+arr2[j]+arr3[k]);
        }
      }
    }
  }
}
gabtub
That only works with a predefined number of arrays (3) but there can be any number of arrays.
Jack Sleight
+1  A: 

This can be refactored using recursion making it work with any arbitrary amount of arrays. If I find the time, I'll give it a try myself.

ps. I don't know php, the example is written in Delphi.

Edit: recursive solution with arbitrary # arrays

type
  TSingleArray = array of string;
  TMasterArray = array of TSingleArray;
var
  solutions: array of integer; // Q&D container to hold currently used indexes of SingleArrays


procedure WriteSolution(const masterArray: TMasterArray);
var
  I: Integer;
  indexes: string;
  solution: string;
begin
  for I := 0 to High(solutions) do
  begin
    indexes := indexes + IntToStr(solutions[I]) + ' ';
    solution := solution + masterArray[I][solutions[I]];
  end;
  Writeln('Solution: ' + solution + ' Using indexes: ' + indexes);
end;

procedure FindSolution(const masterArray: TMasterArray; const singleArrayIndex: Integer; var bits: Integer);
var
  I: Integer;
begin
  for I := 0 to High(masterArray[singleArrayIndex]) do
  begin
    //***** Use bit manipulation to check if current index is already in use
    if bits and (1 shl I)  = (1 shl I ) then continue;
    solutions[singleArrayIndex] := I;
    Inc(bits, 1 shl I);
    //***** If it is not the last array in our masterArray, continue by calling RecursArrays recursivly.
    if singleArrayIndex <> High(masterArray) then FindSolution(masterArray, Succ(singleArrayIndex), bits)
    else WriteSolution(masterArray);
    Dec(bits, 1 shl I);
  end;
end;

//***************
// Initialization
//***************
var
  I, J: Integer;
  bits: Integer;
  singleArrayString: string;
  masterArray: TMasterArray;
begin
  I := 10;
  SetLength(masterArray, I);
  for I := 0 to High(masterArray) do
  begin
    SetLength(masterArray[I], High(masterArray) + Succ(I));
    singleArrayString := EmptyStr;
    for J := 0 to High(masterArray[I]) do
    begin
      masterArray[I][J] := IntToStr(J);
      singleArrayString := singleArrayString + masterArray[I][J];
    end;
    WriteLn(singleArrayString)
  end;
  ReadLn;

  //****** Start solving the problem using recursion
  bits := 0;
  SetLength(solutions, Succ(High(masterArray)));
  FindSolution(masterArray, 0, bits);    
end.
Lieven
It would be great to see how it could be refactored, it's the variable number of arrays that I'm struggling most with.
Jack Sleight
Thanks, I'll try to give rewriting that in PHP a go, although I'm not at all familiar with Delphi, so might take a while :-) . Thanks again.
Jack Sleight
A: 

Your problem is similar to that of finding a determinant of a matrix. The best way imho would be to fill smaller arrays with some symbol, say '0', so they all have equal number of values, in your example

$array = array(
    array('1', '2', '0'),
    array('a', 'b', 'c'),
    array('x', 'y', '0'),
);

then loop through each of the first array values and for each increment the index of array by 1 and check the next array and the next column (on the first loop it will be '1' and index will be 0 incremented - 1, then get $array1 - 'b' and so on) if you reach '0', break, if you reach right border, reset first index to 0. Then do the same with decrementing and you'll have all the combinations. It is probably unclear, check the image I linked to

roddik
sorry that will work only with 3x3 array's :(
roddik
+2  A: 

I think this works. It is using recursion to go over the structure like a tree. For each branch it keeps track of which columns are already taken. It is probably slow because it is a brute force approach.

<?php 

$array = array(
    array('1', '2'),
    array('a', 'b', 'c'),
    array('x', 'y'),
);


function f($array, & $result, $colsToIgnore = array(), $currentPath = array()) {
    $row = array_shift($array);
    $length = count($row);
    $isMoreRows = !! count($array);

    for ($col = 0; $col < $length; $col++) {
        //check whether column has already been used
        if (in_array($col, $colsToIgnore)) {
            continue;   
        }

        $item = $row[$col];

        $tmpPath = $currentPath;
        $tmpPath[] = $item;

        if ($isMoreRows) {
            $tmpIgnoreCols = $colsToIgnore;
            $tmpIgnoreCols[] = $col;
            f($array, $result, $tmpIgnoreCols, $tmpPath);
        } else {
            $result[] = implode('', $tmpPath);
        }

    }
}


$result = array();
f($array, $result);
print_r($result);
Tom Haigh
+4  A: 

Interesting problem! This turned out to be more complex than I thought, but it seems to work.

Basic strategy is to first order the arrays smallest to largest (keeping track of what order they were in, so I can output the answers in the correct order).

I keep answers in the form of an array of indexes into this sorted array of input lists.

Now that the lists are sorted, I can store the first correct answer as array(0,1,2,...,n);

Then I recurse into a function for trying all values in the first slot there (the 0 above) by swapping it with other values in that answer array (all the ones that aren't too big for that slot). Since I've got it sorted by size, I can move any value to the right when I'm swapping, without worrying about it being to big for that right slot.

outputting each valid slot has some crazy indirection to undo all the sorting.

Sorry if this looks confusing. I didn't spend much time cleaning it up.

<?php
# $lists is an array of arrays
function noconfcombos($lists) {
    $lengths = array();
    foreach($lists as $list) {
        $lengths[] = count($list);
    }

    # find one solution (and make sure there is one)
    $answer = array();
    $sorted_lengths = $lengths;
    asort($sorted_lengths);
    $answer_order_lists = array();
    $answer_order_lengths = array();
    $output_order = array();
    $min = 1;
    $max_list_length = 0;
    foreach($sorted_lengths as $lists_key => $list_max) {
        if($list_max < $min) {
            # no possible combos
            return array();
        }
        $answer[] = $min - 1; # min-1 is lowest possible value (handing out colums starting with smallest rows)
        $output_order[$lists_key] = $min - 1; # min-1 is which slot in $answers corresponds to this list
        $answer_order_lists[] = $lists[$lists_key];
        $answer_order_lengths[] = $lengths[$lists_key];
        ++$min;
    }
    ksort($output_order);
    $number_of_lists = count($lists);
    $max_list_length = end($sorted_lengths);
    if($max_list_length > $number_of_lists) {
       for($i = $number_of_lists; $i < $max_list_length; ++$i) {
          $answer[] = $i;
       }
       $stop_at = $number_of_lists;
    } else {
       $stop_at = $number_of_lists - 1;
    }

    # now $answer is valid (it has the keys into the arrays in $list for the
    # answer), and we can find the others by swapping around the values in
    # $answer.

    $ret = array();
    $ret[] = noconfcombos_convert($answer, $answer_order_lists, $output_order);
    noconfcombos_recurse($ret, $max_list_length, $stop_at, $answer_order_lengths, $answer_order_lists, $output_order, $answer, 0);

    return $ret;
}

# try swapping in different indexes into position $index, from the positions
# higher, then recurse
function noconfcombos_recurse(&$ret, $max_list_length, $stop_at, &$lengths, &$lists, &$output_order, $answer, $index) {
    if($index < $stop_at) {
        noconfcombos_recurse($ret, $max_list_length, $stop_at, $lengths, $lists, $output_order, $answer, $index + 1);
    }
    for($other = $index + 1; $other < $max_list_length; ++$other) {
        if($answer[$other] < $lengths[$index]) { # && $answer[$index] < $lengths[$other]) {
            $tmp = $answer[$index];
            $answer[$index] = $answer[$other];
            $answer[$other] = $tmp;
            $ret[] = noconfcombos_convert($answer, $lists, $output_order);
            if($index < $stop_at) {
                noconfcombos_recurse($ret, $max_list_length, $stop_at, $lengths, $lists, $output_order, $answer, $index + 1);
            }
        }
    }
}


function noconfcombos_convert(&$indexes, &$lists, &$order) {
    $ret = '';
    foreach($order as $i) {
        $ret .= $lists[$i][$indexes[$i]];
    }
    return $ret;
}

function noconfcombos_test() {
    $a = array('1', '2', '3', '4');
    $b = array('a', 'b', 'c', 'd', 'e');
    $c = array('x', 'y', 'z');
    $all = array($a, $b, $c);
    print_r(noconfcombos($all));
}

noconfcombos_test();
JasonWoof
When I run this with this data [[1, 2, 3], [a, b, c, d], [x, y, z]] I only get 6 combinations (there should be 12) and two of those are duplicates: 1cy, 1bz, 3bx, 3ay, 1cy, 1bz
Jack Sleight
JasonWoof
ok, I'm going to edit my post with new code that fixes both bugs.
JasonWoof
ok, so to make it handle more colums than rows, I made the $anwsers variable hold more slots than are actually output. I also had to modify the recursion code slightly, so that if there were extra slots like this, it would recurse one further (because it needs to swap the last place with the extras, whereas before there was nothing left to swap that with.)
JasonWoof
+1  A: 

Look at it from a different angle: in order to compose a result row, you need to pick a value for each column. Each value should be picked from a different source row. The problem is know as "pick N out of M", or more mathematically, a Combination.

This means that a result row corresponds to an array of source-row indices.

You can build all possible pickings by starting to build an index-array like this (pseudo-code)

function combinations( $source ) {
  if( count( $source ) == 0 ) return $source;
  $result=array();
  // build one row
  foreach( $source as $index=>$value ) {
    $newsource = array_splice( $source, $index, 1 );

    $reduced_combinations=combinations( $newsource  );
    foreach( $reduced_combinations as $reduced_combi ) {
      $newrow=array_unshift( $reduced_combi, $value );
      $result[]=$newrow;
    }

  }
  return $result;
}

function retrieve_indices( $indices, $arrays ) {
   $result=array();
   foreach( $indices as $column=>$index ) {
     $result[]=$arrays[$index][$column];
   }
   return $result;
}

$source_arrays = array(
  array( "1", "2", "3" ),
  array( "a", "b", "c" ),
  array( "x", "y", "z" )
);

$index_combinations = combinations( range(0,2) );
$result=array();
foreach( $index_combinations as $combination ) {
  $result[]=retrieve_indices( $combination, $source_arrays );
}
xtofl
I'm getting "Fatal error: Call to undefined function remove() ... on line 8" with this?
Jack Sleight
You're right. I created the `$newsources` for that. Forgot to use it :)
xtofl
Thanks for your help, although now it seems to recurse infinitely... ?
Jack Sleight
Indeed. I forgot to check for the trivial case of the recursion.
xtofl
The code was intended as to be a conceptual solution, illustrating the idea. I admit that's why I didn't test it.
xtofl
Ah OK, yeah you did say "pseudo-code", I missed that, sorry. Thanks for the help, I'll see if I can get something like that working.
Jack Sleight
+3  A: 

This is one of those cases where self-generated code and brute force will beat most algorithms on simplicity and performance. In previous answers I've seen lots of recursion, array manipulation and computations, when in actuality what you'd want to do is:

foreach ($array[0] as $k0 => $v0)
{
    foreach ($array[1] as $k1 => $v1)
    {
        if ($k1 == $k0)
        {
            continue;
        }
        foreach ($array[2] as $k2 => $v2)
        {
            if ($k2 == $k1 || $k2 == $k0)
            {
                continue;
            }
            $result[] = $v0.$v1.$v2;
        }
    }
}

Of course, you cannot write this unless you know how many arrays are in $array. That's where generated code comes handy:

$array = array(
    array('1', '2'),
    array('a', 'b', 'c'),
    array('x', 'y')
);
$result = array();

$php = '';
foreach ($array as $i => $arr)
{
    $php .= 'foreach ($array[' . $i . '] as $k' . $i . ' => $v' . $i . '){';

    if ($i > 0)
    {
        $sep  = 'if (';
        $j    = $i - 1;
        do
        {
            $php .= $sep . '$k' . $i . ' == $k' . $j;
            $sep  = ' || ';
        }
        while (--$j >= 0);

        $php .= ') { continue; } ';
    }
}

$php .= '$result[] = $v' . implode('.$v', array_keys($array)) . ';' . str_repeat('}', count($array));

eval($php);
print_r($result);

Note that this routine assumes that $array is a zero-based numerically indexed array, as in your example. It will generate the code quoted above, adjusted for an arbitrary number of arrays.


Update

Here's an alternative algorithm. It's still self-generated, but less bruteforce. We still have nested loops, except that each loop works on a copy of the array where keys that are currently used by outer loops have been removed from this loop's array. For example, if the values should be (a,b,c) but the outer loops are using the indices 0 and 2, we remove "a" (index 0) and "c" (index 2) and all we have left is "b". It means that loops work only on possible combinations and we don't need a if condition anymore.

Furthermore, and this part can be applied to the previous algorithm as well, we process the arrays of values in order from the smallest to the largest so that we guarantee that used indices exist in current array. The downside is it doesn't generate the combinations in the same order. It generates the same combinations, just not in the same order. The code looks like this:

$a0 = $array[0];
foreach ($a0 as $k0 => $v0)
{
    $a2 = $array[2];
    unset($a2[$k0]);
    foreach ($a2 as $k2 => $v2)
    {
        $a1 = $array[1];
        unset($a1[$k0], $a1[$k2]);
        foreach ($a1 as $k1 => $v1)
        {
            $result[] = "$v0$v1$v2";
        }
    }
}

The above routine sets up a copy of the values at the beginning of every loop, then removes values that are used by outer loops. You can improve this process by setting up a copy of the values only once at the beginning, remove the keys as they are used (at the beginning of each loop) and put them back as they are freed (at the end of each loop). The routine then looks like this:

list($a0,$a1,$a2) = $array;
foreach ($a0 as $k0 => $v0)
{
    unset($a1[$k0], $a2[$k0]);
    foreach ($a2 as $k2 => $v2)
    {
        unset($a1[$k2]);
        foreach ($a1 as $k1 => $v1)
        {
            $result[] = "$v0$v1$v2";
        }
        $a1[$k2] = $array[1][$k2];
    }
    $a1[$k0] = $array[1][$k0];
    $a2[$k0] = $array[2][$k0];
}

The actual code that generates the source above is:

$keys = array_map('count', $array);
arsort($keys);

$inner_keys = array();
foreach ($keys as $k => $cnt)
{
    $keys[$k] = $inner_keys;
    $inner_keys[] = $k;
}

$result = array();

$php = 'list($a' . implode(',$a', array_keys($array)) . ')=$array;';
foreach (array_reverse($keys, true) as $i => $inner_keys)
{
    $php .= 'foreach ($a' . $i . ' as $k' . $i . ' => $v' . $i . '){';

    if ($inner_keys)
    {
        $php .= 'unset($a' . implode('[$k' . $i . '],$a', $inner_keys) . '[$k' . $i . ']);';
    }
}

$php .= '$result[] = "$v' . implode('$v', array_keys($array)) . '";';

foreach ($keys as $i => $inner_keys)
{
    foreach ($inner_keys as $j)
    {
        $php .= '$a' . $j . '[$k' . $i . ']=$array[' . $j . '][$k' . $i . "];\n";
    }
    $php .= '}';
}
eval($php);
Josh Davis
+1. That looks promising but is probably (also) to slow in practice for any reasonable amount of arrays(item)s. The *compiled* code I've used is currently evaluating 10 arrays with lengths from 10-19. There are already 234.783.155 solutions found without incrementing the index of the first array. *There has to be a better way than brute force going over all combinations.*
Lieven
In PHP, you'd have a hard time beating that kind of algorithm, as it doesn't use any functions and only relies on language constructs. If I'm not wrong, 10 arrays of length 10 to 19 would generate 10 billions results, so you'd need more than 100GB of RAM to hold them in memory.
Josh Davis
You are probably right about the performance, I don't know PHP. You are very much right about the memory consumption, I didn't thougt about that. I'm curious, how did you calculate the 10 billion results?
Lieven
If the first array has 10 elements and the last one has 19 elements, 10 arrays from 10 to 19 you have 10 keys * (11 keys - 1 used key) * (12 keys - 2 used keys) * etc... You end up with 10^10 combinations.
Josh Davis
Awesome, thanks for the alternative solution. At work right now so will give it a test later, looks like this is going to be the best solution.
Jack Sleight