tags:

views:

265

answers:

8

I have three sets:

s0 = [set([16,9,2,10]), set([16,14,22,15]), set([14,7])]   # true, 16 and 14
s1 = [set([16,9,2,10]), set([16,14,22,15]), set([7,8])]    # false

I want a function that will return True if every set in the list intersects with at least one other set in the list. Is there a built-in for this or a simple list comprehension?

+2  A: 

It's a little verbose but I think it's a pretty efficient solution. It takes advantage of the fact that when two sets intersect, we can mark them both as connected. It does this by keeping a list of flags as long as the list of sets. when set i and set j intersect, it sets the flag for both of them. It then loops over the list of sets and only tries to find a intersection for sets that haven't already been intersected. After reading the comments, I think this is what @Victor was talking about.

s0 = [set([16,9,2,10]), set([16,14,22,15]), set([14,7])]   # true, 16 and 14
s1 = [set([16,9,2,10]), set([16,14,22,15]), set([7,8])]    # false


def connected(sets):
    L = len(sets)

    if not L: return True
    if L == 1: return False

    passed = [False] * L
    i = 0
    while True:
        while passed[i]: 
            i += 1
            if i == L: 
                return True

        for j, s in enumerate(sets):
            if j == i: continue
            if sets[i] & s: 
                passed[i] = passed[j] = True
                break
        else:
            return False


print connected(s0)
print connected(s1)

I decided that an empty list of sets is connected (If you produce an element of the list, I can produce an element that it intersects ;). A list with only one element is dis-connected trivially. It's one line to change in either case if you disagree.

aaronasterling
lots of good answers, I'm using 2.5.2 right now so I ended up going with this solution
spoon16
+11  A: 
all(any(a & b for a in s if a is not b) for b in s)
jchl
Elegant, but I would prefer 2 straight loops and a counter to avoid compare each two items 2 times, it would speed things up at least by a factor of two
Victor Ionescu
@jchl: This is really cool and adds to my python knowledge
pyfunc
The `bool()` call is redundant.
KennyTM
@Victor Ionescu, why not throw up an answer with your preferred method. It'd be worth at least ten rep :)
spoon16
@spoon16 : I tried to remove unnecessary comparisons. Of course, with all the additional code, I am not sure if this yields an optimized version over a quite elegant solution by jchl. I will probably try to time both over a large set sequence and see what it results in.
pyfunc
@KennyTM: Thanks, I didn't realize `any()` performed automatic boolean conversion. I removed the `bool()` call.
jchl
@spoon16: My solution won't work, I will need to rework it. At the moment, I will delete it.
pyfunc
I think this might have some issues in cases where there are two of the same set. e.g. `set(()) is not set(())`
intuited
@intuited: If the list contains two identical sets, then this solution will (correctly, I believe) allow those two sets to intersect each other. However, you do make me realize that if the list contained the same set object multiple times, then I will not intersect it with itself. I'm not sure if that's desirable or not.
jchl
@jchl: ahh, you're right, I had it backwards.
intuited
A: 

This strategy isn't likely to be as efficient as @Victor's suggestion, but might be more efficient than jchl's answer due to increased use of set arithmetic (union).

s0 = [set([16,9,2,10]), set([16,14,22,15]), set([14,7])]
s1 = [set([16,9,2,10]), set([16,14,22,15]), set([7,8])]

def freeze(list_of_sets):
    """Transform a list of sets into a frozenset of frozensets."""
    return frozenset(frozenset(set_) for set_ in list_of_sets)

def all_sets_have_relatives(set_of_sets):
    """Check if all sets have another set that they intersect with.

    >>> all_sets_have_relatives(s0)   # true, 16 and 14
    True
    >>> all_sets_have_relatives(s1)   # false
    False
    """
    set_of_sets = freeze(set_of_sets)
    def has_relative(set_):
        return set_ & frozenset.union(*(set_of_sets - set((set_,))))
    return all(has_relative(set) for set in set_of_sets)
intuited
Why all the frozensets?
jchl
@jchl: sets aren't hashable, so they can't be elements in a set. I'm not sure if there's much point in making the top-level set a frozen one, but it seemed like it might be faster than a regular set, and it suits the logic.
intuited
@jchl: Ahh, I see what you're talking about now. Looks like I had some sort of mental indigestion issue when refactoring and then posting that code. "The doctest passed, though!" Anyway, it's fixed now.
intuited
my profiling indicates that they are slightly faster. only barely though
aaronasterling
Bad news. I just profiled this one and it takes around 2.5 seconds on the same type of large random input that @jchl's takes around 4.7 milli-seconds on. The bad news for me is that with all the effort I put into it, I only beat him by .2 milliseconds.
aaronasterling
"O! I am slain!" Interesting.. was that with large sets, or a lot of sets, or both?
intuited
A: 

This may give better performance depending on the distribution of the sets.

def all_intersect(s):
   count = 0
   for x, a in enumerate(s):
      for y, b in enumerate(s):
         if a & b and x!=y:
            count += 1
            break
   return count == len(s)
dheerosaur
John Machin
Hello, hello, ... your code is BROKEN. `count` becomes the number of non-empty pair-wise set intersections, less 1. This is nothing like what the OP wants. By ACCIDENT it gives the correct answers for the OP's two test cases. Try this one: s2 = [set([16, 9, 2, 10]), set([16, 22, 14, 15]), set([14, 2])] ... it should return True but your count is 6-1 == 5 and so your code returns False.
John Machin
dheerosaur
aaronasterling
+2  A: 

Here's a more efficient (if much more complicated) solution, that performs a linear number of intersections and a number of unions of order O( n*log(n) ), where n is the length of s:

def f(s):
    import math
    j = int(math.log(len(s) - 1, 2)) + 1
    unions = [set()] * (j + 1)
    for i, a in enumerate(s):
        unions[:j] = [set.union(set(), *s[i+2**k:i+2**(k+1)]) for k in range(j)]
        if not (a & set.union(*unions)):
            return False
        j = int(math.log(i ^ (i + 1), 2))
        unions[j] = set.union(a, *unions[:j])
    return True

Note that this solution only works on Python >= 2.6.

jchl
This is significantly slower performing that your previous answer. whereas your previous answer averages 4.7 msecs, (on large random input) this one takes 250. Unions are what's killing you (with this implementation) and intuited. Theoretically appealing but expensive.
aaronasterling
Interesting. I think it's the large constant factor due to all the Python code (compared to the previous solution that can be evaluated almost all in C). Also, my first solution will work well if the `any()` call can shortcut often, whereas this solution works well in all cases. This solution performs better (9 vs 24 seconds) than the first on this input: `[set((i,)) for i in range(N)] + [set(range(N))]` with `N = 10000`, and I suspect would perform better still with even larger `N`.
jchl
I just realized that this solution is far more complicated than necessary, and still not optimal. My third solution in much better for both reasons.
jchl
+3  A: 

Here's a very simple solution that's very efficient for large inputs:

def g(s):
    import collections
    count = collections.defaultdict(int)
    for a in s:
        for x in a:
            count[x] += 1
    return all(any(count[x] > 1 for x in a) for a in s)
jchl
you can just use `count.setdefault(x, 0) += 1` instead of the `if/else` statement.
aaronasterling
No you can't. `count.setdefault(x, 0)` isn't an lvalue.
jchl
+1 because it circumvents the “obvious” set intersection stuff and simplifies the problem. Unfortunately, I don't think you will get enough upvotes. BTW, either `collections.Counter`, or `count= collections.defaultdict(int)` and `count[x]+=1`
ΤΖΩΤΖΙΟΥ
Thanks for the hit about `defaultdict`. I've simplified my answer accordingly.
jchl
+1  A: 

As usual I'd like to give the inevitable itertools solution ;-)

from itertools import combinations, groupby
from operator import itemgetter


def any_intersects( sets ):
    # we are doing stuff with combinations of sets
    combined = combinations(sets,2) 
    # group these combinations by their first set
    grouped = (g for k,g in groupby( combined, key=itemgetter(0)))
    # are any intersections in each group
    intersected = (any((a&b) for a,b in group) for group in grouped)
    return all( intersected )


s0 = [set([16,9,2,10]), set([16,14,22,15]), set([14,7])]
s1 = [set([16,9,2,10]), set([16,14,22,15]), set([7,8])] 
print any_intersects( s0 ) # True
print any_intersects( s1 ) # False

This is really lazy and will only do the intersections that are required. It can also be a very confusing and unreadable oneliner ;-)

THC4k
combinated? How about "combined"? ;-)
martineau
Heh, some online dictionaries list "combinated", but "combined" does sound much better ;-)
THC4k
On further reflection, "combos" would probably be a more descriptive -- and therefore even better -- variable name.
martineau
+1  A: 

To answer your question, no, there isn't a built-in or simple list comprehension that does what you want. Here's another itertools based solution that is very efficient -- surprisingly about twice as fast as @THC4k's itertools answer using groupby() in timing tests using your sample input. It could probably be optimized a bit further, but is very readable as presented. Like @AaronMcSmooth, I arbitrarily decided what to return when there are no or is only one set in the input list.

from itertools import combinations

def all_intersect(sets):
    N = len(sets)
    if not N: return True
    if N == 1: return False

    intersected = [False] * N
    for i,j in combinations(xrange(N), 2):
        if not intersected[i] or not intersected[j]:
            if sets[i] & sets[j]:
                intersected[i] = intersected[j] = True
    return all(intersected)
martineau