For data structures this large, numpy works well. For this example it's over 200x faster (see below) and a bit easier to code; it's basically just

numpy.add.accumulate(diff)
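
To make the round trip concrete, here's a minimal sketch (the array names are just illustrative): take the diffs of a series, then rebuild it from the start value plus the running sum of the diffs.

import numpy as np

orig = np.array([1005, 1007, 1010, 1010, 1014])
diff = np.diff(orig)                    # [2, 3, 0, 4]

# rebuild: the start value followed by start + running sum of the diffs
rebuilt = np.concatenate(([orig[0]], orig[0] + np.add.accumulate(diff)))
assert (rebuilt == orig).all()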
Comparison between numpy and direct list manipulation:
import numpy as nx
import timeit

N = 10000
diff_nx = nx.zeros(N, dtype=int)
diff_py = list(diff_nx)
start = 1005

def f0():
    # pure-Python loop: rebuild the running total one element at a time
    orig = [start]
    for x in diff_py:
        orig.append(orig[-1] + x)

def f1():
    # numpy: drop the start value into slot 0, then accumulate
    diff_nx[0] = start
    nx.add.accumulate(diff_nx)

t = timeit.Timer("f0()", "from __main__ import f0, f1, diff_nx, diff_py, nx, start")
print(t.timeit(number=1000))
t = timeit.Timer("f1()", "from __main__ import f0, f1, diff_nx, diff_py, nx, start")
print(t.timeit(number=1000))
gives
13.4044158459 # for list looping
0.0474112033844 # for numpy accumulate
Really, though, it seems better to reuse an established compression algorithm, as can easily be done with PyTables, rather than rolling your own as you seem to be doing here.
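
As an illustration of what I mean (a minimal sketch, assuming PyTables is installed; the file name and node name are just examples), storing the raw values in a compressed CArray lets the library handle the compression for you:

import numpy as np
import tables

data = np.arange(1005, 1005 + 10000)    # example data to store

# zlib compression handled by PyTables/HDF5, no hand-rolled diff scheme needed
filters = tables.Filters(complevel=5, complib='zlib')
with tables.open_file('data.h5', mode='w') as h5:
    h5.create_carray(h5.root, 'data', obj=data, filters=filters)

# reading it back
with tables.open_file('data.h5', mode='r') as h5:
    restored = h5.root.data[:]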
Also, note that here I'm suggesting you read the data in with room already reserved for the prepended start term, rather than rebuilding the array with the term prepended, so you don't have to pay for an extra copy.
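
Something along these lines, for example (a sketch; the file name, dtype, and on-disk format are assumptions): leave index 0 free for the start value and read the diffs straight into the rest of the array.

import numpy as np

N = 10000
start = 1005

full = np.empty(N + 1, dtype=np.int64)
full[0] = start
# hypothetical source of the stored diffs -- read directly into the
# reserved slice instead of building a list and prepending to it
full[1:] = np.fromfile('diffs.bin', dtype=np.int64, count=N)
orig = np.add.accumulate(full)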