views:

266

answers:

5

Suppose I have a set of column definitions:

Col1: value11 value12 value13
Col2: value21 value22 
Col3: value31 value32 value33
...

Given some subset of columns -- 2 or more -- I want to find all possible values for those columns. Suppose I choose columns 1 and 2, above:

(value11 value21)
(value11 value22)
(value12 value21)
(value12 value22)
(value13 value21)
(value13 value22)

If I'd chosen 2:3:

(value21 value31)
(value21 value32)
(value21 value33)
(value22 value31)
(value22 value32)
(value22 value33)

If I'd chosen all three:

(value11 value21 value31)
(value11 value21 value32)
...

I'm implementing this in python, and I'd like a fast algorithm to do this. My input is a list of tuples: (columnName, columnValueList)

Any suggestions?

A: 

Looks like itertools.combination could help you.

>>> from itertools import combinations
>>> list(combinations(xrange(5), 2))
[(0, 1), (0, 2), (0, 3), (0, 4), (1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4)]
>>> list(combinations(xrange(5), 3))
[(0, 1, 2), (0, 1, 3), (0, 1, 4), (0, 2, 3), (0, 2, 4), (0, 3, 4), (1, 2, 3), (1, 2, 4),   (1, 3, 4), (2, 3, 4)]
fabrizioM
itertools.combinations will give subsequences from a single iterable - the asker was looking for the possible pairs from two iterables
mdirolf
+7  A: 

The best way to get this is going to be using itertools.product(). For example:

import itertools

group1 = ['a', 'b']
group2 = ['c', 'd']

print list(itertools.product(group1, group2))

#==> [('a', 'c'), ('a', 'd'), ('b', 'c'), ('b', 'd')]

This function accepts multiple arguments (i.e. multiple columns).

For more help on iterools.product() see this.

Jeremy Cantrell
+1  A: 

In addition to using itertools.product() as jeremy suggests, you might want to consider converting your list of tuples to a dict to make lookups for columnName fast:

dict(tupleList)

mdirolf
+1  A: 

more general solutions that Jeremy's would be:

import itertools
full = [[value11, value12, value13],
        [value21, value22],
        [value31, value32, value33]]
ids = 0, 2

or if it is a dict (as it should be):

full = {'col1': [value11, value12, value13],
        'col2': [value21, value22],
        'col3': [value31, value32, value33]}
ids = 'col1', 'col3'
selected = (full[i] for i in ids)
list(itertools.product(*selected))
SilentGhost
+1  A: 

I'll bet itertools-based solutions are going to be faster, but if they need to be avoided (e.g. stuck on Python 2.5 without itertools.product, etc), it can of course be entirely coded in "base Python" when one must.

Trying to "pull out all the stops" for speed, maybe something like:

def odo(*names_and_valuelists):
    aux = [[vl, 0] for n, vl in names_and_valuelists]
    if any(len(vl)==0 for vl, _ in aux):
        return
    while True:
        yield tuple(vl[i] for vl, i in aux)
        for vlandi in reversed(aux):
          if vlandi[1] == len(vlandi[0])-1:
            vlandi[1] = 0
          else:
            vlandi[1] += 1
            break
        else:
          return

though minor tweaks might still accelerate it (needs thorough profiling with realistic sample data!).

Here's your example of use:

def main():
    data = [
        ('Col1', 'value11 value12 value13'.split()),
        ('Col2', 'value21 value22'.split()),
        ('Col3', 'value31 value32 value33'.split()),
    ]
    for tup in odo(data[0], data[1]): print tup
    print
    for tup in odo(data[1], data[2]): print tup
    print
    for i, tup in enumerate(odo(*data)):
        print tup
        if i>5: break

if __name__ == '__main__':
    main()

which emits the results:

('value11', 'value21')
('value11', 'value22')
('value12', 'value21')
('value12', 'value22')
('value13', 'value21')
('value13', 'value22')

('value21', 'value31')
('value21', 'value32')
('value21', 'value33')
('value22', 'value31')
('value22', 'value32')
('value22', 'value33')

('value11', 'value21', 'value31')
('value11', 'value21', 'value32')
('value11', 'value21', 'value33')
('value11', 'value22', 'value31')
('value11', 'value22', 'value32')
('value11', 'value22', 'value33')
('value12', 'value21', 'value31')
Alex Martelli