views:

298

answers:

7

I have N Lists I'd like to find unique combinations of. I've written it out on my whiteboard and it all seems to have a pattern, I just haven't found it yet. I feel I can express a brute-force method and that will certainly be something I pursue. Is there an alternative? Would a different data structure (binary tree?) make a job like this more appropriate?

Given:

#    1  2
a = [1, 2]
b = [a, b]

The result would be:

c = [1a, 1b, 2a, 2b] # (4 unique combinations)

Given:

v = [1, a]
w = [1, b]
x = [1, c]
y = [1, d]
z = [1, e]

The result would be:

r = [11111, 1bcde, 11cde, 111de, 1111e, a1111, ab111, abc11, abcd1, abcde, 1b1d1, 1bc1e, 11c11, 11c1e, ... ]
+2  A: 

Hi

I don't think the question asks for the powerset of the inputs, I think it asks for (part of) the Cartesian product of the input sets. I expect someone will correct me if I'm wrong.

And, as for an algorithm, well now that you know what it is you are looking for, Google will be your friend.

In your second example, you exclude entries such as 1b1de from your result set. Is this deliberate ? If it is deliberate, what is the rule by which the output is constructed ?

Regards

Mark

High Performance Mark
It was not deliberate. I just hadn't looked at it long enough. Thanks for you and Mark Byers for catching it.
Nick Stinemates
A: 

You're looking for the Cartesian product. In Python, if you want tuples:

c = [(x, y) for x in a for y in b]
r = [(vv, ww, xx, yy, zz)
     for vv in v  for ww in w  for xx in x  for yy in y  for zz in z]
Jason Orendorff
This works, but only if you know how many lists you will get in advance. It's not the nicest way to solve the problem IMHO, especially if the number of lists is large.
Mark Byers
itertools.product is definitely better in the general case.
Jason Orendorff
+8  A: 

Perhaps you are looking for itertools.product:

#!/usr/bin/env python
import itertools
a=[1,2]
b=['a','b']
c=[str(s)+str(t) for s,t in itertools.product(a,b)]
print(c)
['1a', '1b', '2a', '2b']

v=[1,'a']
w=[1,'b']
x=[1,'c']
y=[1,'d']
z=[1,'e']

r=[''.join([str(elt) for elt in p]) for p in itertools.product(v,w,x,y,z)]
print(r)
# ['11111', '1111e', '111d1', '111de', '11c11', '11c1e', '11cd1', '11cde', '1b111', '1b11e', '1b1d1', '1b1de', '1bc11', '1bc1e', '1bcd1', '1bcde', 'a1111', 'a111e', 'a11d1', 'a11de', 'a1c11', 'a1c1e', 'a1cd1', 'a1cde', 'ab111', 'ab11e', 'ab1d1', 'ab1de', 'abc11', 'abc1e', 'abcd1', 'abcde']

Note that product yields 2**5 elements. Is this what you want?

itertools.product is in Python 2.6. For previous versions, you can use this:

def product(*args, **kwds):
        '''
        Source: http://docs.python.org/library/itertools.html#itertools.product
        '''
        # product('ABCD', 'xy') --> Ax Ay Bx By Cx Cy Dx Dy
        # product(range(2), repeat=3) --> 000 001 010 011 100 101 110 111
        pools = map(tuple, args) * kwds.get('repeat', 1)
        result = [[]]
        for pool in pools:
            result = [x+[y] for x in result for y in pool]
        for prod in result:
            yield tuple(prod)

Edit: As jellybean points out, the original question asks for unique sets. The above code will not produce unique sets if a,b,v,w,x,y, or z contain repeated elements. If this is an issue for you, then you can convert each list to a set before sending it to itertools.product:

r=[''.join([str(elt) for elt in p]) for p in itertools.product(*(set(elt) for elt in (v,w,x,y,z)))]
unutbu
This is definitely what I want. Thank you!
Nick Stinemates
Where's the uniqueness?
jellybean
If the starting lists do not contain any duplicates, the combinations won't either.
Jason Orendorff
@jorendorff: That's obviously correct, but the original task started with "N lists", and not, for instance, "N sets".
jellybean
@jellybean: There's nothing stopping him from calling itertools on set([a]) instead of calling it on [a].
Brian
+1  A: 

I'm assuming that you want the Cartesian product - all possible lists created by choosing exactly one element from each list. You can implement it recursively, like this:

def cartesian_product(l):
    if l:
     for b in cartesian_product(l[1:]):
      for a in l[0]:
       yield [a] + b
    else:
     yield []  

l = [
 [ 'a', 'b' ],
 [ 'c', 'd', 'e' ],
 [ 'f', 'g' ],
]

for x in cartesian_product(l):
    print x

Update: ~unutbu's suggestion of itertools.product is better, but I'll leave this here anyway.

Mark Byers
I will be translating this to a different language so this is definitely helpful. Thank you.
Nick Stinemates
+1  A: 

Since you need a cartesian product, use that of itertools !

>>> import itertools
>>> v = [1, 'a']
>>> w = [1, 'b']
>>> x = [1, 'c']
>>> y = [1, 'd']
>>> z = [1, 'e']

>>> p = [''.join(str(x) for x in c) for c in itertools.product(v,w,x,y,z)]
>>> p
['11111', '1111e', '111d1', '111de', '11c11', '11c1e', '11cd1', '11cde', '1b111'
, '1b11e', '1b1d1', '1b1de', '1bc11', '1bc1e', '1bcd1', '1bcde', 'a1111', 'a111e
', 'a11d1', 'a11de', 'a1c11', 'a1c1e', 'a1cd1', 'a1cde', 'ab111', 'ab11e', 'ab1d
1', 'ab1de', 'abc11', 'abc1e', 'abcd1', 'abcde']
>>>
mjv
+1  A: 

Might that do the trick?

def getAllCombinations(listOfLists):
    if len(listOfLists) == 1:
        return [str(x) for x in listOfLists[0]]

    result = set()
    head, tail = listOfLists[0], listOfLists[1:]

    tailCombs = getAllCombinations(tail)
    for elem in head:
        for tc in tailCombs:
            result.add(str(elem) + tc)
    return result

v = [1, 'a']
w = [1, 'b']
x = [1, 'c']
y = [1, 'd']
z = [1, 'e']

>>> print getAllCombinations([v, w, x, y, z])
set(['111de', 'abc11', 'a1c1e', 'a111e', '11c11', 'ab11e', '1bc11', 'ab1d1', 'a1cd1', '1b1de', 'a11d1', '11111', '1b111', '11cd1', 'abcd1', '1bcde', 'ab111', '1bc1e', 'abc1e', '111d1', 'a1111', '11c1e', 'a1c11', '11cde', '1b11e', '1bcd1', 'abcde', 'a1cde', '1b1d1', 'a11de', 'ab1de', '1111e'])
jellybean
That's gorgeous. Very easy to read.
Nick Stinemates
+2  A: 

I think another answer is in order, to respond to this:

I've written it out on my whiteboard and it all seems to have a pattern, I just haven't found it yet.

There is a pattern.

Suppose you have just two lists to combine. You can find all the combinations by making a grid.

       black        blue
     +------------+------------+
coat | black coat | blue coat  |
     +------------+------------+
hat  | black hat  | blue hat   |
     +------------+------------+

As you can see, there are 2*2 combinations. If there were 30 colors and 14 kinds of clothing, you would have 30 * 14 = 420 combinations.

The pattern continues as you add more lists. Instead of a 2-dimensional rectangle, you get an 3-dimensional array of boxes, or ultimately an n-dimensional hyperrectangle. Regardless, the total number of combinations is always the product of the lengths of all the lists.

If you know how many lists you have, nested loops are a natural way to make all combinations.

for color in colors:
    for kind in kinds:
        print color, kind  # "black coat", "black hat", etc.

If the lists are in dictionary order to start with, and there are no duplicates, the output will also be in dictionary order.

Jason Orendorff
Color me impressed. Thanks for helping.
Nick Stinemates