Hello,

In my work I use a lot of Venn diagrams, and so far I've been relying on the web-based tool "Venny". It offers the useful option of exporting the individual intersections (i.e., the elements belonging only to that specific intersection), and it handles diagrams of up to 4 lists.

The problem is that doing this with large lists (4K+ elements) and more than 3 sets is a chore (copy, paste, save...). So I have decided to generate the lists myself and use Venny only for plotting.

This lengthy introduction leads to the crux of the matter: given 3 or 4 lists that share some of their elements, how can I process them in Python to obtain the various sets (unique to one list, common to all 4, common to just the first and second, etc.) as shown on the Venn diagram (3 list graphical example, 4 list graphical example)? It doesn't look too hard for 3 lists, but for 4 it gets somewhat complex.

+3  A: 

Assuming you have Python 2.6 or later in the 2.x series (the session uses the print statement and the built-in reduce):

>>> from itertools import combinations
>>>
>>> data = dict(
...   list1 = set(list("alphabet")),
...   list2 = set(list("fiddlesticks")),
...   list3 = set(list("geography")),
...   list4 = set(list("bovinespongiformencephalopathy")),
... )
>>>
>>> variations = {}
>>> for i in range(len(data)):
...   for v in combinations(data.keys(),i+1):
...     vsets = [ data[x] for x in v ]
...     variations[tuple(sorted(v))] = reduce(lambda x,y: x.intersection(y), vsets)
...
>>> for k,v in sorted(variations.items(),key=lambda x: (len(x[0]),x[0])):
...   print "%r\n\t%r" % (k,v)
...
('list1',)
        set(['a', 'b', 'e', 'h', 'l', 'p', 't'])
('list2',)
        set(['c', 'e', 'd', 'f', 'i', 'k', 'l', 's', 't'])
('list3',)
        set(['a', 'e', 'g', 'h', 'o', 'p', 'r', 'y'])
('list4',)
        set(['a', 'c', 'b', 'e', 'g', 'f', 'i', 'h', 'm', 'l', 'o', 'n', 'p', 's', 'r', 't', 'v', 'y'])
('list1', 'list2')
        set(['e', 'l', 't'])
('list1', 'list3')
        set(['a', 'h', 'e', 'p'])
('list1', 'list4')
        set(['a', 'b', 'e', 'h', 'l', 'p', 't'])
('list2', 'list3')
        set(['e'])
('list2', 'list4')
        set(['c', 'e', 'f', 'i', 'l', 's', 't'])
('list3', 'list4')
        set(['a', 'e', 'g', 'h', 'o', 'p', 'r', 'y'])
('list1', 'list2', 'list3')
        set(['e'])
('list1', 'list2', 'list4')
        set(['e', 'l', 't'])
('list1', 'list3', 'list4')
        set(['a', 'h', 'e', 'p'])
('list2', 'list3', 'list4')
        set(['e'])
('list1', 'list2', 'list3', 'list4')
        set(['e'])
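
The session above lists plain intersections, so an element shared by all four lists also shows up under every smaller combination. If what is wanted is the Venn regions themselves (the elements belonging only to one specific combination, as in the Venny export described in the question), one possible approach is to group each element by the exact set of lists that contain it. The following is a minimal sketch under that assumption, written for Python 3; the function name `venn_regions` is hypothetical and not part of any library.

```python
from collections import defaultdict


def venn_regions(named_sets):
    """Map each combination of set names to the elements found in
    exactly those sets and in no others (hypothetical helper)."""
    # For every element, record the names of all sets that contain it.
    membership = defaultdict(set)
    for name, items in named_sets.items():
        for item in items:
            membership[item].add(name)

    # Group elements that share the same membership signature.
    regions = defaultdict(set)
    for item, names in membership.items():
        regions[tuple(sorted(names))].add(item)
    return dict(regions)


# Same sample data as in the session above.
data = {
    "list1": set("alphabet"),
    "list2": set("fiddlesticks"),
    "list3": set("geography"),
    "list4": set("bovinespongiformencephalopathy"),
}

regions = venn_regions(data)
for key in sorted(regions, key=lambda k: (len(k), k)):
    print(key, sorted(regions[key]))
```

Combinations with no exclusive elements are simply absent from the result. With the sample data, for instance, there is no `('list1',)` entry because every letter of "alphabet" also occurs in at least one other word, while `('list2',)` holds only `d` and `k`.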
MattH
Thank you, this does the trick.
Einar