Someone asked "How to do Python’s zip in C#?"...

...which leads me to ask, what good is zip? In what scenarios do I need this? Is it really so foundational that I need this in the base class library?

+4  A: 

It allows you to process sequences in parallel instead of sequentially or nested. There are... so many uses for it that they currently escape me.

Ignacio Vazquez-Abrams
I still hope that you remember some of them ...
tanascius
@tanascius: The nice thing about Python is that I don't *have* to remember them; making them up as I go along is a perfectly viable strategy.
Ignacio Vazquez-Abrams
+3  A: 

Here's a common use case for zip:

x = [1, 2, 3, 4, 5]
y = [6, 7, 8, 9, 0]

for a, b in zip(x, y):
    print(a + b)
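A small extension of the loop above, as a sketch: collect the pairwise sums into a list with a comprehension instead of printing them. Note that zip stops at the shortest input, so mismatched lengths are silently truncated; the standard library's itertools.zip_longest pads the shorter input instead.

```python
from itertools import zip_longest

x = [1, 2, 3, 4, 5]
y = [6, 7, 8, 9, 0]

# Element-wise sums: one (a, b) pair per position
sums = [a + b for a, b in zip(x, y)]
print(sums)  # [7, 9, 11, 13, 5]

# zip stops at the shortest input...
print(list(zip([1, 2, 3], [10, 20])))  # [(1, 10), (2, 20)]

# ...while zip_longest pads the shorter one with fillvalue
print(list(zip_longest([1, 2, 3], [10, 20], fillvalue=0)))  # [(1, 10), (2, 20), (3, 0)]
```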
sharth
+2  A: 

A use case:

>>> fields = ["id", "name", "location"]
>>> values = ["13", "bill", "redmond"]
>>> dict(zip(fields, values))
{'location': 'redmond', 'id': '13', 'name': 'bill'}

Try doing this without zip...
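The same idiom scales to a whole table by pairing one header row with each data row. A sketch; the second row below is made up for illustration (for actual CSV files, the standard library's csv.DictReader does this for you).

```python
header = ["id", "name", "location"]
rows = [
    ["13", "bill", "redmond"],
    ["14", "ann", "seattle"],  # hypothetical second row
]

# One dict per row, keyed by the column names
records = [dict(zip(header, row)) for row in rows]
print(records[1]["location"])  # seattle
```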

ChristopheD
`dict([(fields[i], values[i]) for i in range(min(len(fields), len(values)))])`
tixxit
@tixxit: I did not mean to say that it was impossible without zip, I just meant less readable/elegant and definitely more verbose ;-)
ChristopheD
And, of course, if we generalize the above list comprehension, we just get the zip function again: `def myzip(*args): return [tuple(args[j][i] for j in range(len(args))) for i in range(min(len(args[j]) for j in range(len(args))))]`
tixxit
@ChristopheD: I know. I'm just joking around :)
tixxit
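tixxit's generalized list comprehension needs len() and indexing, so it won't work on generators or other one-shot iterables. A sketch of a lazy version that accepts arbitrary iterables; my_zip is a hypothetical name, and this only approximates the built-in's behaviour rather than reproducing its actual implementation.

```python
def my_zip(*iterables):
    """Lazily yield tuples of corresponding items, stopping at the shortest input."""
    iterators = [iter(it) for it in iterables]
    if not iterators:
        return
    sentinel = object()  # unique marker meaning "this iterator is exhausted"
    while True:
        items = []
        for it in iterators:
            item = next(it, sentinel)
            if item is sentinel:
                return  # shortest input ran out: stop, like the built-in zip
            items.append(item)
        yield tuple(items)


print(list(my_zip("abc", range(10))))  # [('a', 0), ('b', 1), ('c', 2)]
```

Because it pulls one item at a time from each iterator, it also works on generators and infinite inputs such as range(10**12) without building any intermediate lists.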
+5  A: 

zip is useful if you'd like to iterate over multiple iterables simultaneously, which is a reasonably common scenario in Python.

One real-world scenario where zip has come in handy for me is if you have an M by N array, and you want to look at columns instead of rows. For example:

>>> five_by_two = ((0, 1), (1, 2), (2, 3), (3, 4), (4, 5))
>>> two_by_five = tuple(zip(*five_by_two))
>>> two_by_five
((0, 1, 2, 3, 4), (1, 2, 3, 4, 5))
Jeffrey Harris
cool, didn't know you could zip more than 2 groups. Is that new in Python 3 or does that work in 2.x?
Davy8
Yup, I've used this often. @Davy8: Yes, it definitely works in 2.x. I forget when the `*` operator was added, but you probably could have used `apply()` to get the same effect even in Python 1.x.
Daniel Pryden
@Davy8 it works in 2.x. When zip is used with * it is commonly referred to as unzip.
Justin Peel
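As Justin Peel says, zip(*...) doubles as "unzip". A sketch that splits a list of pairs back into parallel sequences (the pairs data is illustrative). Note that the two-name unpacking raises ValueError if pairs is empty, since zip then yields nothing to unpack.

```python
pairs = [("alice", 31), ("bob", 27), ("carol", 45)]

# The * passes each pair as a separate argument to zip, which then groups
# all the first elements together and all the second elements together
names, ages = zip(*pairs)
print(names)  # ('alice', 'bob', 'carol')
print(ages)   # (31, 27, 45)

# Zipping the two halves again restores the original pairs
print(list(zip(names, ages)) == pairs)  # True
```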
+1  A: 

Here's a case where I used zip() to useful effect, in a Python class for comparing version numbers:

class Version(object):

    # ... snip ...

    def get_tuple(self):
        return (self.major, self.minor, self.revision)

    def compare(self, other):
        def comp(a, b):
            if a == '*' or b == '*':
                return 0
            elif a == b:
                return 0
            elif a < b:
                return -1
            else:
                return 1
        return tuple(comp(a, b) for a, b in zip(self.get_tuple(), Version(other).get_tuple()))

    def is_compatible(self, other):
        tup = self.compare(other)
        return (tup[0] == 0 and tup[1] == 0)

    def __eq__(self, other):
        return all(x == 0 for x in self.compare(other))

    def __ne__(self, other):
        return any(x != 0 for x in self.compare(other))

    def __lt__(self, other):
        for x in self.compare(other):
            if x < 0:
                return True
            elif x > 0:
                return False
        return False

    def __gt__(self, other):
        for x in self.compare(other):
            if x > 0:
                return True
            elif x < 0:
                return False
        return False

I think zip(), coupled with all() and any(), makes the comparison operator implementations particularly clear and elegant. Sure, it could have been done without zip(), but then the same could be said about practically any language feature.

Daniel Pryden
@Daniel Pryden: suggestion: `if x >0: return True | elif x > 0: return False | return False` could be simply `return x > 0`
ChristopheD
@ChristopheD: No, because it's in a `for` loop over the dotted parts. If the current dotted part is greater, then we know the whole thing is greater; if the current dotted part is less, then we know the whole thing is less; but if the current dotted part is equal then we need to look at the next dotted part. That said, there is a formatting issue with that last line's indentation... give me a sec and I'll fix that.
Daniel Pryden
@Daniel Pryden: Ok, that makes sense now ;-)
ChristopheD
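Since the class above is snipped (its constructor and parsing aren't shown), here is a minimal self-contained sketch of the same zip-based comparison using plain functions. The names parse, compare_parts, compare and is_less are hypothetical, and the dotted "major.minor.revision" string format is an assumption. Because zip stops at the shortest input, "1.2" and "1.2.5" compare as equal in this sketch; itertools.zip_longest with a fill value would be one way to change that.

```python
def compare_parts(a, b):
    """Compare one pair of version components; '*' acts as a wildcard."""
    if a == "*" or b == "*" or a == b:
        return 0
    return -1 if a < b else 1


def parse(version):
    """Hypothetical parser: '1.2.*' -> (1, 2, '*')."""
    return tuple(p if p == "*" else int(p) for p in version.split("."))


def compare(v1, v2):
    # zip pairs major with major, minor with minor, and so on
    return tuple(compare_parts(a, b) for a, b in zip(parse(v1), parse(v2)))


def is_less(v1, v2):
    # As in Version.__lt__: the first component that differs decides
    for c in compare(v1, v2):
        if c != 0:
            return c < 0
    return False


print(compare("1.2.3", "1.3.0"))  # (0, -1, 1)
print(is_less("1.2.3", "1.3.0"))  # True
print(compare("1.2.*", "1.2.7"))  # (0, 0, 0)
```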
+3  A: 

It's handy in different places. My favorite, from http://norvig.com/python-iaq.html, is transposing a matrix:

>>> x = [[1, 2, 3, 4, 5], [6, 7, 8, 9, 10], [11, 12, 13, 14, 15]]
>>> list(zip(*x))
[(1, 6, 11), (2, 7, 12), (3, 8, 13), (4, 9, 14), (5, 10, 15)]
Charles Merriam
http://stackoverflow.com/questions/2429692/what-is-the-purpose-of-a-zip-function-as-in-python-or-c-4-0/2429737#2429737
Wallacoloo
Sorry, cross editing.
Charles Merriam
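Building on the transpose, a sketch of rotating a matrix 90 degrees clockwise: reverse the order of the rows, then transpose with zip. The helper name rotate_clockwise is made up for this example.

```python
def rotate_clockwise(matrix):
    # Reversing the rows and then transposing yields a clockwise rotation;
    # zip produces tuples, so convert each one back to a list
    return [list(row) for row in zip(*matrix[::-1])]


m = [[1, 2, 3],
     [4, 5, 6]]
print(rotate_clockwise(m))  # [[4, 1], [5, 2], [6, 3]]
```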
+3  A: 

Someone actually asked a question here fairly recently that I answered with the Zip extension method, so it's obviously important for some people. ;)

Actually, it is a fairly important operation for mathematical algorithms - matrices, curve fitting, interpolation, pattern recognition, that sort of thing. Also very important in engineering applications like digital signal processing where much of what you do is combine multiple signals or apply linear transforms to them - both are based on the sample index, hence, zip it. Zipping two sequences is far, far faster than sorting and joining them based on some key, especially when you know in advance that the sequences have the same number of elements and are in the same order.
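To make the sample-index point concrete, a sketch of a few such operations in Python: a dot product, a matrix product built from it (transposing the second matrix with zip to get its columns), and a sample-by-sample sum of several signals. The function names are illustrative. All of them assume the inputs are already aligned and equally long; plain zip would silently drop trailing elements otherwise (Python 3.10+ accepts zip(..., strict=True) to raise an error instead).

```python
def dot(u, v):
    # Pair up corresponding elements and sum their products
    return sum(a * b for a, b in zip(u, v))


def matmul(a, b):
    # zip(*b) turns the rows of b into its columns
    cols = list(zip(*b))
    return [[dot(row, col) for col in cols] for row in a]


def add_signals(*signals):
    # zip(*signals) yields one tuple per sample index holding every signal's value
    return [sum(samples) for samples in zip(*signals)]


print(dot([1, 2, 3], [4, 5, 6]))                                # 32
print(matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))               # [[19, 22], [43, 50]]
print(add_signals([1, 2, 3], [10, 20, 30], [100, 200, 300]))    # [111, 222, 333]
```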

I can't get into tight specifics here on account of my current employment, but speaking generally, this is also valuable for telemetry data - industrial, scientific, that sort of thing. Often you'll have time sequences of data coming from hundreds or thousands of points - parallel sources - and you need to aggregate, but horizontally, over devices, not over time. At the end, you want another time sequence, but with the sum or average or some other aggregate of all the individual points.

It may sound like a simple sort/group/join in SQL Server (for example) but it's actually really hard to do efficiently this way. For one thing, the timestamps may not match exactly, but you don't care about differences of a few milliseconds, so you end up having to generate a surrogate key/row number and group on that - and of course, the surrogate row number is nothing more than the time index which you already had. Zipping is simple, fast, and trivially parallelizable.
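A sketch of the telemetry case in Python, assuming each device's series is already ordered by time and sampled on the same schedule, so list position stands in for the slightly jittery timestamps. The device names and readings are invented.

```python
from statistics import mean

# Invented readings: one time-ordered series per device, same sampling schedule
readings = {
    "sensor_a": [20.0, 21.0, 22.0],
    "sensor_b": [22.0, 23.0, 24.0],
    "sensor_c": [21.0, 22.0, 23.0],
}


def aggregate_across_devices(series, agg=mean):
    # zip(*series) regroups the data by time index: each tuple holds every
    # device's sample for one instant, so no timestamp join is needed
    return [agg(samples) for samples in zip(*series)]


print(aggregate_across_devices(readings.values()))       # [21.0, 22.0, 23.0]
print(aggregate_across_devices(readings.values(), max))  # [22.0, 23.0, 24.0]
```

The result is again a time sequence, one aggregate per time index, and any function that takes an iterable of samples (sum, max, min, mean) can be passed as agg.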

I don't know if I'd call it foundational, but it is important. I don't use the Reverse method very often either, but by the same token I'm glad I don't have to keep writing it myself on those rare occasions when I do find a need for it.

One of the reasons it might not seem that useful to you now is that .NET 3.5 (C# 3.0) has no built-in tuple type. .NET 4 (C# 4) adds System.Tuple, and when you're working with tuples, zipping really is a fundamental operation because ordering is strictly enforced.

Aaronaught
Great, now I can't get "Song A" out of my head...
Ignacio Vazquez-Abrams
Hahaha, never mind, I just got it. :P
Aaronaught
It's not that I think it doesn't seem useful. I just want to know *how* or *why* it is useful.
Cheeso
@Cheeso: I thought I answered that - you asked about scenarios, I listed a bunch of use case examples and explained why zip joins are simpler and more efficient than any other kind. What information have I left out?
Aaronaught
you didn't leave anything out, but you wrote "one of the reasons it might not seem useful to you..." and I never said that or intended to imply it.
Cheeso
Oh, alright. I guess what I meant by that is "once you start using tuples to solve various problems, you will probably start finding many more specific uses for zip."
Aaronaught