views:

132

answers:

6

Suppose I have a list of lists of elements which are all the same (i'll use ints in this example)

[range(100)[::4], range(100)[::3], range(100)[::2], range(100)[::1]]

What would be a nice and/or efficient way to take the intersection of these lists (so you would get every element that is in each of the lists)? For the example that would be:

[0, 12, 24, 36, 48, 60, 72, 84, 96]
+4  A: 

I think the built-in set module should do the trick.

>>> elements = [range(100)[::4], range(100)[::3], range(100)[::2], range(100)[::1]]
>>> sets = map(set, elements)
>>> result = list(reduce(lambda x, y: x & y, sets))
>>> print result
[0, 96, 36, 72, 12, 48, 84, 24, 60]
Dan Loewenherz
You beat me to the punch. I'll leave my answer up as it applies `reduce` slightly differently, but I'm glad to see that other people think functionally too. :-)
Samir Talwar
Too many schools skip the (imho mandatory) introductory functional programming course by skipping straight to Java. Come on guys, SCIP is about the best introductory CS book ever...
Paul McMillan
Note that `set` is a type, not a module. (The set type used to be in a module called `sets`, but it is long deprecated.)
Mike Graham
although it's very elegant and works just fine, this seems to be twice as slow as the solutions not using reduce. Anyone have any ideas why?
thepandaatemyface
It could be due to having to build more intermediary sets.
Mike Graham
+3  A: 

Convert them to sets and use the set.intersection method, reducing over the list of sets:

xs = [range(100)[::4], range(100)[::3], range(100)[::2], range(100)[::1]]
reduce(set.intersection, [set(x) for x in xs])

reduce is a functional programming device that iterates through any iterable and applies the function provided to the first two elements, then to the result and the next, and then the result of that and the next, and so on.

Samir Talwar
`set.intersection` takes an arbitrary number of iterables as arguments (in recent Pythons). If I'm not mistaken, this can be implemented with better algorithmic complexity than the `reduce` method provides.
Mike Graham
@Mike: That's brilliant. I had no idea.
Samir Talwar
A: 

You can treat them as sets and use set.intersection():

lists = [range(100)[::4], range(100)[::3], range(100)[::2], range(100)[::1]]
sets = [set(l) for l in lists]

isect = reduce(lambda x,y: x.intersection(y), sets)
Andrew Jaffe
This would raise `AttributeError`?
Mike Graham
Oops, `intersect` -> `intersection` (fixed).
Andrew Jaffe
A: 
l = [range(100)[::4], range(100)[::3], range(100)[::2], range(100)[::1]]
l = [set(i) for i in l]
intersect = l[0].intersection(l[1])
for i in l[2:]:
    intersect = intersect.intersection(i)
inspectorG4dget
This would raise `NameError` ?
Mike Graham
I don't think so, @MikeGraham. Perhaps you are referring to the code that was here before I edited. I ran the old code and got an error, but this code has been tested and woks fine
inspectorG4dget
@inspectG4dget, I was referring to the code before it was edited (my comment appears at least as old as the edit?) This code will not exibit that error, though I must confess I find your design a bit odd.
Mike Graham
@MikeGraham: I saw the timing of the edit and the comment, which is why I suggested that your comment may have been posted before the edit. But I am curious as to why and how you would change the design of this.
inspectorG4dget
@inspectorG4dget, I do not understand why you specialcase the first two items. I would do `l = [range(100)[::4], range(100)[::3], range(100)[::2], range(100)[::1]]` `l = [set(i) for i in l]` `intersect = l[0]` `for i in l[1:]: intersect = intersect.intersection(i)` or `l = [range(100)[::4], range(100)[::3], range(100)[::2], range(100)[::1]]` `intersect = set(l[0])` `for i in l[1:]: intersect = intersect.intersection(i)` if I was following this basic pattern.
Mike Graham
@MikeGraham: Very true. I had thought of doing this, but I wanted to be more transparent with my code - especially since I did not document it at all. I don't know how proficient @thepandaatemyface is and therefore wanted to keep this as simple as possible.
inspectorG4dget
I don't see how my first example is any more complex than your code, but I suppose that's what makes the world go 'round. `:)`
Mike Graham
+1  A: 

I'm going to answer my own question:

lists =  [range(100)[::4],range(100)[::3],range(100)[::2],range(100)[::1]]

out = set(lists[0])
for l in lists[1:]:
    out = set(l).intersection(out)

print out

or

print list(out)
thepandaatemyface
Note that in recent Pythons, `set.intersection` takes an arbitrary number of iterables as its argument, so `for l in lists[1:]:` `out = set(l).intersection(out)` could be written `out.intersection(*lists[1:])`.
Mike Graham
+4  A: 

Use sets, which have an intersection method.

>>> s = set()
>>> s.add(4)
>>> s.add(5)
>>> s
set([4, 5])
>>> t = set([2, 4, 9])
>>> s.intersection(t)
set([4])

For your example, something like

>>> data = [range(100)[::4], range(100)[::3], range(100)[::2], range(100)[::1]]
>>> sets = map(set, data)
>>> print set.intersection(*sets)
set([0, 96, 36, 72, 12, 48, 84, 24, 60])
Mike Graham
I'll put this as the best answer, because it's a little faster than my own (which is in turn twice as fast as the ones using reduce) and because the neat thing with the multiple sets at once. Thanks!
thepandaatemyface
Alternatively, `set.intersection(set(x) for x in data)`
David Zaslavsky
@thepandaatemyface, I'm always glad to hear my code performs well, but always a bit suspicious as well. I'm sure it depends on the input a lot and you don't have time to have ran it on truly huge input if size of input was the issue. If I was trying to optimize for speed in an inner loop on large data, I would consider trying `set(datas[0]).intersection(*datas[1:])` out and timing it, which has a nice ring of performance to me.
Mike Graham
are you sure that works, david?
thepandaatemyface
@Mike Graham, what I meant to say is: of all the elegant solutions posted here, yours was the fastest. I quickly tested it with a [[randint(0, 100000) for i in range(1000)] for i in range(100)] as my data. It's not very scientific, but it seems to keep giving yours as the fastest.
thepandaatemyface
@David, you're missing a `*(...)` to apply the generator's items as args. Other than that, that's certainly a fine approach. The main reason I didn't use it was to emphasize that if you're doing operations like intersection, you should probably *already* have sets.
Mike Graham
@thepandaatemyface, Awesome. =)
Mike Graham
@Mike: oops, forgot that.
David Zaslavsky