views:

312

answers:

5

I've been writing little Python programs at home to learn more about the language. The most recent feature I've tried to understand are List Comprehensions. I created a little script that estimates when my car needs its next oil change based on how frequently I've gotten the oil changed in the past. In the code snippet below, oil_changes is a list of the mileages at which I got the oil changed.

# Compute a list of the mileage differences between each oil change.
diffs = [j - i for i, j in zip(oil_changes[:-1], oil_changes[1:])]

# Use the average difference between oil changes to estimate the next change.
next_oil = oil_changes[-1] + sum(diffs) / len(diffs)

The code produces the right answer (did the math by hand to check) but it doesn't feel quite Pythonic yet. Am I doing a lot of needless copying of the original list in the first line? I feel like there's a much better way to do this but I don't know what it is.

+2  A: 

It seems fine, really. Not everything is simple (you've got several steps in an otherwise simple calculation, no matter how you frame it). There are options to reduce the copies, like using itertools.islice and itertools.izip, but (aside from izip) the extra steps in the code would just complicate it further. Not everything needs to be a list comprehension, but it is a judgement call sometimes. What looks cleaner to you? What will the next guy that reads it understand best? What will you understand when you come back to fix that bug in three months?

ironfroggy
+2  A: 

The itertools package provides additional generator-style functions. For instance, you can use izip in place of zip to save on some memory.

You could also perhaps write an average function so you can turn diffs into a generator instead of a list comprehension:

from itertools import izip

def average(items):
    sum, count = 0, 0

    for item in items:
        sum   += item
        count += 1

    return sum / count

diffs = (j - i for i, j in izip(oil_changes[:-1], oil_changes[1:])
next_oil = oil_changes[-1] + average(diffs)

Alternatively, you could change your definition of diffs to:

diffs = [oil_changes[i] - oil_changes[i-1] for i in xrange(1, len(oil_changes))]

I dunno, it's not really a huge improvement. Your code is pretty good as is.

John Kugelman
Interestingly, your alternate definition of diffs results in the fastest runtime among the answers here (excepting John Machin's answer of course).
Kristo
+9  A: 

Try this:

assert len(oil_changes) >= 2
sum_of_diffs = oil_changes[-1] - oil_changes[0]
number_of_diffs = len(oil_changes) - 1
average_diff = sum_of_diffs / float(number_of_diffs)
John Machin
This is clearly the best way to get my answer, but then I wouldn't be learning anything about List Comprehensions. +1 anyway. :-)
Kristo
Learning about technique X should include not using technique X when it's not warranted -- see Alex's Italian proverb. Note that the fact that only the first and last distances are used in the answer indicates that the predictive ability of the arithmetic average of the differences might not be so great. Here's a better example to try your techniques on: calculate an exponential moving average (more recent results have a greater weight than earlier ones) -- it can't be optimised into a one-liner.
John Machin
+1  A: 

Am I doing a lot of needless copying of the original list in the first line?

Technically, yes. Realistically, no. Unless you've changed your oil literally millions of times, the speed penalty is unlikely to be significant. You could change zip to izip, but it hardly seems worth it (and in python 3.0, zip effectively is izip).

Insert that old quote by Knuth here.

(you could also replace oil_changes[:-1] with just oil_changes, since zip() truncates to the length of the shortest input sequence anyway)

John Fouhy
+7  A: 

As other answers pointed out, you don't really need to worry unless your oil_changes list is extremely long. However, as a fan of "stream-based" computing, I think it's interesting to point out that itertools offers all the tools you need to compute your next_oil value in O(1) space (and O(N) time of course!-) no matter how big N, that is, len(next_oil), gets.

izip per se is insufficient, because it only reduces a bit the multiplicative constant but leaves your space demands as O(N). The key idea to bring those demands down to O(1) is to pair izip with tee -- and avoiding the list comprehension, which would be O(N) in space anyway, in favor of a good simple old-fashioned loop!-). Here comes:

  it = iter(oil_changes)
  a, b = itertools.tee(it)
  b.next()
  thesum = 0
  for thelen, (i, j) in enumerate(itertools.izip(a, b)):
    thesum += j - i
  last_one = j
  next_oil = last_one + thesum / (thelen + 1)

Instead of taking slices from the list, we take an iterator on it, tee it (making two independently advanceable clones thereof), and advance, once, one of the clones, b. tee takes space O(x) where x is the maximum absolute difference between the advancement of the various clones; here, the two clones' advancement only differs by 1 at most, so the space requirement is clearly O(1).

izip makes a one-at-a-time "zipping" of the two slightly-askew clone iterators, and we dress it up in enumerate so we can track how many times we go through the loop, i.e. the length of the iterable we're iterating on (we need the +1 in the final expression, because enumerate starts from 0!-). We compute the sum with a simple +=, which is fine for numbers (sum is even better, but it wouldn't track the length!-).

It's tempting after the loop to use last_one = a.next(), but that would not work because a is actually exhausted -- izip advances its argument iterables left to right, so it has advanced a one last time before it realizes b is over!-). That's OK, because Python loop variables are NOT limited in scope to the loop itself -- after the loop, j still has the value that was last extracted by advancing b before izip gave up (just like thelen still has the last count value returned by enumerate). I'm still naming the value last_one rather than using j directly in the final expression, because I think it's clearer and more readable.

So there it is -- I hope it was instructive!-) -- although for the solution of the specific problem that you posed this time, it's almost certain to be overkill. We Italians have an ancient proverb -- "Impara l'Arte, e mettila da parte!"... "Learn the Art, and then set it aside" -- which I think is quite applicable here: it's a good thing to learn advanced and sophisticated ways to solve very hard problems, in case you ever meet them, but for all that you need to go for simplicity and directness in the vastly more common case of simple, ordinary problems -- not apply advanced solutions that most likely won't be needed!-)

Alex Martelli
There's always a tradeoff it seems. Your code is the slowest among the answers here according to timeit.
Kristo
For short-ish lists, it may be; try the various approaches with a few million items in oil_changes, say...;-)
Alex Martelli
It may be worth noting that the construction tee+next+izip is usually abstracted as pairwise(), see itertools docs for details. On the other hand, using the for-variable outside the loop (though accepted both by the interpreter and the Python community) is IMHO pretty ugly. That said, a more functional (in the FP sense) solution would be harder for a novice to understand and probably not regarded as "Pythonic" (as it would use the so-hated reduce/foldl).
tokland