views:

694

answers:

4

I have an unknown number of arrays, each containing an unknown number of words. I want to concatenate the values from each list so that all possible variations of the words are stored to a final array.

For example, if array 1 contains:

dog
cat

and array 2 contains:

food
tooth

and array 3 contains:

car
bike

I'd like the output to be:

dog food car
dog food bike
dog tooth car
dog tooth bike
cat food car
cat food bike
cat tooth car
cat tooth bike

There could be more than 3 lists, and each list will most likely have more than 2 words.

I'd like to do this in PHP.

I know how to do it if I know the number of lists, though it's probably not the most resource efficient method. But nested foreach loops works if you know the number of arrays. What if you don't? And what are some methods to solve this problem that will still work if, let's say, there are 100 arrays of 100 words each. Or 1000?

Thanks!

+5  A: 

You can put all word arrays into one array and use a recursive function like this:

function concat(array $array) {
    $current = array_shift($array);
    if(count($array) > 0) {
        $results = array();
        $temp = concat($array);
        foreach($current as $word) {
          foreach($temp as $value) {
            $results[] =  $word . ' ' . $value;
          }
        }
        return $results;           
    }
    else {
       return $current;
    }
}

$a = array(array('dog', 'cat'), array('food', 'tooth'), array('car', 'bike'));

print_r(concat($a));

Which returns:

Array
(
    [0] => dog food car
    [1] => dog food bike
    [2] => dog tooth car
    [3] => dog tooth bike
    [4] => cat food car
    [5] => cat food bike
    [6] => cat tooth car
    [7] => cat tooth bike
)

But I guess this behaves badly for large arrays as the output array will be very big.


To get around this, you can output the combinations directly, using a similar approach:

function concat(array $array, $concat = '') {
    $current = array_shift($array);

    $current_strings = array();

    foreach($current as $word) {
            $current_strings[] = $concat . ' ' . $word;
    }

    if(count($array) > 0) {
        foreach($current_strings as $string) {
            concat($array, $string);
        }       
    }
    else {
      foreach($current_strings as $string) {
          echo $string . PHP_EOL;
      }   
    }
}

concat(array(array('dog', 'cat'), array('food', 'tooth'), array('car', 'bike')));

Which gives:

dog food car
dog food bike
dog tooth car
dog tooth bike
cat food car
cat food bike
cat tooth car
cat tooth bike

With this approach it is also easy to get the "sub-concatinations". Just insert echo $string . PHP_EOL; before concat($array, $string); and the output is:

 dog
 dog food
 dog food car
 dog food bike
 dog tooth
 dog tooth car
 dog tooth bike
 cat
 cat food
 cat food car
 cat food bike
 cat tooth
 cat tooth car
 cat tooth bike
Felix Kling
Felix - This works great on small arrays. I just tried it on 5 arrays of length 100, and got this: `Fatal error: Allowed memory size of 134217728 bytes exhausted (tried to allocate 11 bytes) in /Users/qwerty/- on line 9` - I don't know that I'll have that many words, so your solution will probably work fine for what I'm doing. But definitely has some lag on larger arrays. Thanks for the idea!
hookedonwinter
@hookedonwinter: Have you tried the iterative approach too?
Felix Kling
@Felix not yet. Just saw it. Thanks!
hookedonwinter
That is very cool. Thanks Felix!
hookedonwinter
nit-picky: the second solution isn't "iterative". You just keep and reuse the intermediate results (like a cache). Nevertheless `foreach($current_strings as $string) { concat($array, $string);` is still recursion.
VolkerK
@VolkerK: Ok, I wasn't sure about that, thank you. But is there a special term for a recursion that don't uses the return values in the caller function?
Felix Kling
If there is then I don't know it ...and I've used up my "almost but not quite right" explanations for today/this week in my answer and the last comment ;-) Better ask someone who _really_ knows.
VolkerK
+2  A: 

I haven't tested this on huge word lists, but it's pretty fast on moderately sized lists and doesn't use recursion, which I think (please correct me if I'm wrong) is probably causing the memory limit problems:

$lines = array('');

foreach ($arrays as $array) {

  $old_lines = $lines;
  $lines = array();

  foreach ($array as $word) {

    foreach ($old_lines as $line) {

      $lines[] = trim($line .' '. $word);

    } // foreach

  } // foreach

} // foreach
Scott Reynen
I guess the memory limit was caused by the big result array what would be the same in your approach. But printing the line should be no problem. I mean 100^5 elements in array is a lot ;)
Felix Kling
Works awesome on smaller arrays, not so hot on the big ones. I like it though!
hookedonwinter
+1  A: 

My take

class Combinator
{
     protected $words;
     protected $combinator;

     public function __construct($words, $combinator = null)
     {
         $this->words = $words;
         $this->combinator = $combinator;
     }

     public function run($combo = '')
     {
         foreach($this->words as $word) {
             if($this->combinator !== null) {
                 $this->combinator->run("$combo $word"); 
             } else {
                 echo "$combo $word", PHP_EOL;
             }
         }
     }
}

$c = new Combinator(array('dog', 'cat'), 
                    new Combinator(array('food', 'tooth'),
                                   new Combinator(array('car', 'bike'))));

$c->run();
Gordon
+4  A: 

You can enumerate the elements of the result set, i.e. for each integer between 0....(number of elements)-1 you can tell which element to return (i.e. there is a natural order). For the given example:

0 => array1[0], array2[0], array3[0]
1 => array1[0], array2[0], array3[1]
2 => array1[0], array2[1], array3[0]
7 => array1[1], array2[1], array3[1]

All you need is a (integer) index n and a function that "translates" the index to the nth element of the (natural ordered) set. Since you only need an integer to store the current state the memory consumption doesn't "explode" when you have many/large arrays. As chris said in his comment, you trade speed (when using smaller sets) for low memory consumption. (Though I think -the way php is implemented- this is also a reasonable fast solution.)

$array1 = array('dog', 'cat');
$array2 = array('food', 'tooth');
$array3 = array('car', 'bike');

function foo( $key /* , ... */ ) {
  $params = func_get_args();
  $rv = array();

  $key = array_shift($params);
  $i=count($params);

  while( 0 < $i-- ) {
    array_unshift($rv, $params[$i][ $key % count($params[$i]) ]);
    $key = (int)($key / count($params[$i]));
  }
  return $rv;
}

for($i=0; $i<8; $i++) {
  $a = foo($i, $array1, $array2, $array3);
  echo join(', ', $a), "\n";
}

You can use this to implement e.g. an Iterator, a SeekableIterator or maybe even an ArrayAccess (and thereby inverting the control compared to the recursive solutions, almost like a yield in python or ruby)

<?php
$array1 = array('dog', 'cat', 'mouse', 'bird');
$array2 = array('food', 'tooth', 'brush', 'paste');
$array3 = array('car', 'bike', 'plane', 'shuttlecraft');
$f = new Foo($array1, $array2, $array3);
foreach($f as $e) {
  echo join(', ', $e), "\n";
}

class Foo implements Iterator {
  protected $data = null;
  protected $limit = null;
  protected $current = null;

  public function __construct(/* ... */ ) {  
    $params = func_get_args();
    // add parameter arrays in reverse order so we can use foreach() in current()
    // could use array_reverse(), but you might want to check is_array() for each element.
    $this->data = array();
    foreach($params as $p) {
      // <-- add: test is_array() for each $p  -->
      array_unshift($this->data, $p);
    }
    $this->current = 0;
    // there are |arr1|*|arr2|...*|arrN| elements in the result set
    $this->limit = array_product(array_map('count', $params));
  }

  public  function current() {
    /* this works like a baseX->baseY converter (e.g. dechex() )
       the only difference is that each "position" has its own number of elements/"digits"
    */
    // <-- add: test this->valid() -->
    $rv = array();
    $key = $this->current;
    foreach( $this->data as $e) {
      array_unshift( $rv, $e[$key % count($e)] );
      $key = (int)($key/count($e));
    }
    return $rv;
  }

  public function key() { return $this->current;  }
  public function next() { ++$this->current; }
  public function rewind () { $this->current = 0; }
  public function valid () { return $this->current < $this->limit; }
}

prints

dog, food, car
dog, food, bike
dog, food, plane
dog, food, shuttlecraft
dog, tooth, car
dog, tooth, bike
[...]
bird, paste, bike
bird, paste, plane
bird, paste, shuttlecraft

( the sequence seems to be ok ;-) )

VolkerK