views:

100

answers:

2

I'm looking for an example of how, in Ruby, a C like language, or pseudo code, to create the Cartesian product of a variable number of arrays of integers, each of differing length, and step through the results in a particular order:

So given, [1,2,3],[1,2,3],[1,2,3]:

[1, 1, 1]
[2, 1, 1]
[1, 2, 1]
[1, 1, 2]
[2, 2, 1]
[1, 2, 2]
[2, 1, 2]
[2, 2, 2]
[3, 1, 1]
[1, 3, 1]
etc.

Instead of the typical result I've seen (including the example I give below):

[1, 1, 1]
[2, 1, 1]
[3, 1, 1]
[1, 2, 1]
[2, 2, 1]
[3, 2, 1]
[1, 3, 1]
[2, 3, 1]
etc.

The problem with this example is that the third position isn't explored at all until all combinations of of the first two are tried. In the code that uses this, that means even though the right answer is generally (the much larger equivalent of) 1,1,2 it will examine a few million possibilities instead of just a few thousand before finding it.

I'm dealing with result sets of one million to hundreds of millions, so generating them and then sorting isn't doable here and would defeat the reason for ordering them in the first example, which is to find the correct answer sooner and so break out of the cartesian product generation earlier.

Just in case it helps clarify any of the above, here's how I do this now (this has correct results and right performance, but not the order I want, i.e., it creates results as in the second listing above):

def cartesian(a_of_a)
  a_of_a_len = a_of_a.size
  result = Array.new(a_of_a_len)
  j, k, a2, a2_len = nil, nil, nil, nil
  i = 0
  while 1 do
    j, k = i, 0
    while k < a_of_a_len
      a2 = a_of_a[k]
      a2_len = a2.size
      result[k] = a2[j % a2_len]
      j /= a2_len
      k += 1
    end

    return if j > 0
    yield result

    i += 1
  end

end

UPDATE: I didn't make it very clear that I'm after a solution where all the combinations of 1,2 are examined before 3 is added in, then all 3 and 1, then all 3, 2 and 1, then all 3,2. In other words, explore all earlier combinations "horizontally" before "vertically." The precise order in which those possibilities are explored, i.e., 1,1,2 or 2,1,1, doesn't matter, just that all 2 and 1 are explored before mixing in 3 and so on.

+1  A: 

It's not clear where the element [1, 1, 3] goes in your desired output. If my guess is right, the following works (although it could probably be optimized)

# yields the possible cartesian products of [first, *rest], where the total
# of the indices that are "distributed" is exactly +nb+.
def distribute(nb, first, *rest)
  if rest.empty?                    # single array remaining?
    yield first.fetch(nb) {return}  # yield the right element (if there is one)
  else
    first.each_with_index do |obj, i|
      break if i > nb
      distribute(nb - i, *rest) do |comb|
        yield [obj, *comb]
      end
    end
  end
end

def strange_cartesian(*arrays, &block)
  return to_enum __method__, *arrays unless block_given?
  max = arrays.map(&:length).inject(:+)
  max.times do |nb|
    distribute(nb, *arrays, &block)
  end
end

p strange_cartesian([1, 2, 3], [1, 2, 3], [1, 2, 3]).to_a
#  => [[1, 1, 1], [1, 1, 2], [1, 2, 1], [2, 1, 1], [1, 1, 3], [1, 2, 2], [1, 3, 1], [2, 1, 2], [2, 2, 1], [3, 1, 1], [1, 2, 3], [1, 3, 2], [2, 1, 3], [2, 2, 2], [2, 3, 1], [3, 1, 2], [3, 2, 1], [1, 3, 3], [2, 2, 3], [2, 3, 2], [3, 1, 3], [3, 2, 2], [3, 3, 1], [2, 3, 3], [3, 2, 3], [3, 3, 2], [3, 3, 3]]

Note: If you are still running Ruby 1.8.6, upgrade to at least 1.8.7 (or require 'backports')

Marc-André Lafortune
Nice usage of `__method__`. I had totally forgotten about that. I actually remember wanting a way to refer to the method from inside the method. Gotta dig out that code and refactor it right now ...
Jörg W Mittag
Nice, and it gives an order much closer to what I'm after than mine. I tried to clarify my question a bit to address the remaining difference. But it's probably good enough if I can rejigger to not need gc, which I'll try now.
Yuri
+1  A: 

After the precision in the question, here's a revised version. I'm keeping the previous answer since it can be useful too and uses a less complex order.

# yields the possible cartesian products of [first, *rest], where the total
# of the indices that are "distributed" is exactly +nb+ and each index doesn't
# go beyong +depth+, but at least one of them is exactly +depth+
def distribute(nb, depth, reached, first, *rest)
  from  = [nb - rest.size * depth, 0].max
  to    = [first.size-1, depth, nb].min
  from.upto(to) do |i|
    obj = first[i]
    reached ||= i == depth
    if rest.empty?
      yield [obj] if reached
    else
      distribute(nb - i, depth, reached, *rest) do |comb|
        yield [obj, *comb]
      end
    end
  end
end

def depth_first_cartesian(*arrays)
  return to_enum __method__, *arrays unless block_given?
  lengths = arrays.map(&:length)
  total = lengths.inject(:+)
  lengths.max.times do |depth|
    depth.upto(arrays.size * depth) do |nb|
      distribute(nb, depth, false, *arrays) {|c| yield c}
    end
  end
end

p depth_first_cartesian([1, 2, 3], [1, 2, 3, 4], [1, 2, 3]).to_a
# => [[1, 1, 1], [1, 1, 2], [1, 2, 1], [2, 1, 1], [1, 2, 2], [2, 1, 2], [2, 2, 1], [2, 2, 2],
#     [1, 1, 3], [1, 3, 1], [3, 1, 1], [1, 2, 3], [1, 3, 2], [2, 1, 3], [2, 3, 1], [3, 1, 2],
#     [3, 2, 1], [1, 3, 3], [2, 2, 3], [2, 3, 2], [3, 1, 3], [3, 2, 2], [3, 3, 1], [2, 3, 3],
#     [3, 2, 3], [3, 3, 2], [3, 3, 3], [1, 4, 1], [1, 4, 2], [2, 4, 1], [1, 4, 3], [2, 4, 2],
#     [3, 4, 1], [2, 4, 3], [3, 4, 2], [3, 4, 3]]
Marc-André Lafortune
Thanks Marc-André, this does exactly what I need. I have to figure out a way to get the temporary array creation out (GC kills when the results get into the tens of millions,) and if you've a thought on that, I'd love to know. But, that's really outside the scope of the question, so thanks again for both answers.
Yuri
@Yuri: Shouldn't be too difficult, depending on your Ruby skills. But basically, modify your "answer" array and yield it instead of yielding new arrays like I do. Also, it probably would be best to avoid passing the array of arrays around and spliting it like I did (first, *rest) because that also creates intermediary arrays.Are we allowed to offer services on SO? ;-)
Marc-André Lafortune