Okay, so I've got a piece of Python code which really needs optimizing.

I need to iterate over every single pixel of a small (80x60) image and extract the RGB values from it. The code in the loop itself isn't too slow, but I'm doing it using nested for loops, which I assume add quite a bit of overhead...

xr = xrange(80)
yr = xrange(60)
get_at = surface.get_at
set_at = surface.set_at

for x in xr:
    for y in yr:
        pixelR = get_at((x,y))[0]
        pixelG = get_at((x,y))[1]
        pixelB = get_at((x,y))[2]
        # ... more complex stuff here which changes R,G,B values independently of each other
        set_at((x,y),(pixelR,pixelG,pixelB))

It really doesn't seem like the right solution at all... I'd rather swap those for loops out for the faster, C-implemented map() function, but if I do that I can't figure out how to get the x,y values, nor how to reach the local variables defined outside the scope of the functions I'd need to define.

Would using map() be any faster than this current set of for loops? How could I use it and still get x,y?

P.S. I'm using pygame surfaces, and I've tried the surfarray/pixelarray modules, but since I'm changing/getting every pixel, they're a lot slower than get_at() and set_at().

Also, slightly irrelevant... do you think this could be made quicker if Python wasn't traversing a list of numbers but just incrementing a number, like in other languages? Why doesn't Python include a normal for() as well as its foreach()?

I suppose I should've posted the whole function; here it is:

# xr, yr = xrange(80), xrange(60)
def live(surface,xr,yr):
    randint = random.randint
    set_at = surface.set_at
    get_at = surface.get_at
    perfect = perfectNeighbours #
    minN = minNeighbours        # All global variables that're defined in a config file.
    maxN = maxNeighbours        #
    pos = actual                # actual = (80,60)
    n = []
    append = n.append
    NEIGHBOURS = 0

    for y in yr: # going height-first for aesthetic reasons.
        decay = randint(1,maxDecay)
        growth = randint(1,maxGrowth)

        for x in xr:
            r, g, b, a = get_at((x,y))

            del n[:]
            NEIGHBOURS = 0

            if x>0 and y>0 and x<pos[0]-1 and y<pos[1]-1:
                append(get_at((x-1,y-1))[1])
                append(get_at((x+1,y-1))[1])
                append(get_at((x,y-1))[1])
                append(get_at((x-1,y))[1])
                append(get_at((x+1,y))[1])
                append(get_at((x-1,y+1))[1])
                append(get_at((x+1,y+1))[1])
                append(get_at((x,y+1))[1])
                for value in n:
                    if value > 63:
                        NEIGHBOURS += 1

            if NEIGHBOURS == 0 and (r,g,b) == (0,0,0): pass
            else:

                if NEIGHBOURS < minN or NEIGHBOURS > maxN:
                    g = 0
                    b = 0
                elif NEIGHBOURS==perfect:
                    g += growth
                    if g > 255:
                        g = 255
                        b += growth
                        if b > growth: b = growth
                else:
                    if g > 10: r = g-10
                    if g > 200: b = g-100
                    if r > growth: g = r
                    g -= decay
                    if g < 0:
                        g = 0
                        b = 0
                r -= 1
                if r < 0:
                    r = 0
                set_at((x,y),(r,g,b))

The number of conditionals there probably makes things slower too, right? The slowest part is checking for neighbours (where it builds the list n)... I replaced that whole bit with slice access on a 2D array, but it doesn't work properly.

+3  A: 

What's making your code slow is probably not the loops; they are incredibly fast.

What slows down your code is the number of function calls. For example,

pixelR = get_at((x,y))[0]
pixelG = get_at((x,y))[1]
pixelB = get_at((x,y))[2]

is a lot slower than (about 3 times slower, I'd guess)

r, g, b, a = get_at((x,y))

Every get_at/set_at call locks the surface, so it's faster to access the pixels directly using the available methods. The one that seems most reasonable is Surface.get_buffer().

Using map doesn't work in your example, because you need the indexes. With as few as 80 and 60 numbers it might even be faster to use range() instead of xrange().
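
If you do stick with get_at/set_at, one cheap thing to try (a small, untested sketch, reusing the xr/yr/get_at/set_at names from the question) is to lock the surface once around the whole pass instead of letting every single call lock and unlock it:

surface.lock()                      # hold the lock for the whole pass
try:
    for x in xr:
        for y in yr:
            r, g, b, a = get_at((x, y))
            # ... change r, g, b here ...
            set_at((x, y), (r, g, b))
finally:
    surface.unlock()                # release the lock again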

Georg
Thanks. I think you're going to freak when you see what else I'm doing in the inner loop ;P I'm making a list of the 8 pixels neighbouring the current pixel using get_at 8 times, and then iterating over that list, checking whether any of those pixels' green value is higher than 64. I should probably replace this. I had major problems with using pixelarrays and surfarrays in pygame though; I couldn't get the slicing right :\ I looked at Surface.get_buffer() btw, thanks. There's very little documentation on how to use it, though; it says to just use pixelarray/surfarray instead.
Owain Jones
You might be able to use `help()` in interactive mode to get more information about the PixelBuffer object returned by get_buffer().
Georg
+1  A: 
map(do_stuff, ((x, y) for x in xrange(80) for y in xrange(60)))

where do_stuff would presumably be defined like so:

def do_stuff(coords):
    r, g, b, a = get_at(coords)
    # ... whatever you need to do with those ...
    set_at(coords, (r, g, b))

You could alternatively use a list comprehension instead of a generator expression as the second argument to map (replace ((x, y) ...) with [(x, y) ...]) and use range instead of xrange. I'd say that it's not very likely to have a significant effect on performance, though.
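
For example, the list-comprehension variant would look like this:

map(do_stuff, [(x, y) for x in range(80) for y in range(60)])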

Edit: Note that gs is certainly right about the for loops not being the main thing in need of optimisation in your code... Cutting down on superfluous calls to get_at is more important. In fact, I'm not sure if replacing the loops with map will actually improve performance here at all... Having said that, I find the map version more readable (perhaps because of my FP background...), so here you go anyway. ;-)

Michał Marczyk
Thanks for the loop solution... I didn't know you could use a list comprehension inside a map. My mind is blown. And yeah, again, thanks for showing that it's probably my get_at() and set_at() calls instead.
Owain Jones
I'd even say that using map like this is going to degrade performance. I haven't done any testing, but you've got one more function call in there.
Georg
@Owain: Glad you like it. :-) @gs: The cost of the extra call to map is virtually unobservable and the looping inside map is super fast. The map version requires all (x, y) pairs to be wrapped in tuples, which would be terrible from a performance standpoint if you unpacked the tuple first thing in do_stuff (FP people would call this unnecessary consing), but here it is the tuple that's needed for get_at / set_at anyway, so it's ok. There is a gotcha here, though: map will actually build up a list of results, which is not memory efficient. If this matters, itertools.imap is better.
Michał Marczyk
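
For reference, a sketch of that itertools.imap variant (assuming the same do_stuff as above; imap is lazy, so something still has to drive it):

import itertools

# Applies do_stuff lazily, without building a list of results (Python 2) --
# but the iterator still has to be consumed by something, e.g. a for loop:
for _ in itertools.imap(do_stuff,
                        ((x, y) for x in xrange(80) for y in xrange(60))):
    pass
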
Of course, as soon as you're using itertools.imap, you're going right back to using for loops, so... :-)
Michał Marczyk
+1  A: 

A list comprehension is the idiomatic way to do map. And as other posters have said, the looping is not the first thing you should be looking at. But when in doubt, timeit.

import timeit

setup_statement = "def foo(coords): return"
for statement in [
    '[foo((x, y)) for x in xrange(80) for y in xrange(60)]',
    '(foo((x, y)) for x in xrange(80) for y in xrange(60))',
    'map(foo, [(x, y) for x in xrange(80) for y in xrange(60)])',
    'map(foo, ((x, y) for x in xrange(80) for y in xrange(60)))',
    'for x in xrange(80):\n for y in xrange(60):\n  foo((x,y))',
    ]:
    print timeit.Timer(statement, setup_statement).timeit(10000)

Running this once:

[karl@node trunk]$ python /tmp/foo.py
16.8661289215
0.00898885726929
20.3232069016
20.5091159344
14.3258991241

On this run, a generator is faster than a for loop, a for loop is faster than a list comprehension, and there's no reason to map().

Karl Anderson
Note that your time for the second test case is suspiciously short. That is the time to create a generator expression object, but not to actually do any iterations with it. If you wrap that in `list()` you would force it to iterate over the generator values, but you would also force it to build a list; probably the fastest way to force it to iterate would be to wrap it in `for x in genexp: pass`.
steveha
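
A self-contained way to time that (an untested sketch, reusing Karl's foo setup and iteration count):

import timeit

setup_statement = "def foo(coords): return"
statement = 'for _ in (foo((x, y)) for x in xrange(80) for y in xrange(60)): pass'
print timeit.Timer(statement, setup_statement).timeit(10000)
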
The fastest of the runs that actually does the loop looks like the plain old `for` statement, as I would expect. The list comprehension has to build a list. The map has to build a list too. Here we just want all the calls to `foo()`, so the `for` loop is best.
steveha
For the OP I would expect the explicit for loop to win by a significantly wider margin, because it means you don't have to put all the inside-the-loop code in a function. For the map examples (and probably the list comprehension as well, since it is limited to a single expression) you end up with x*y function calls that you just don't need in the for-loop version. Python function calls are a significant source of overhead.
teepark
+1  A: 

Since you are reading and rewriting every pixel, I think you can get the best speed improvement by not using a Surface.

I suggest first taking your 80x60 image and converting it to a plain bitmap file with 32-bit pixels. Then read the pixel data into a Python array object. Now you can walk over the array object, reading values, calculating new values, and poking the new values into place with maximum speed. When done, save your new bitmap image, and then convert it to a Surface.

You could also use 24-bit pixels, but that should be slower. 32-bit pixels means one pixel is one 32-bit integer value, which makes the array of pixels much easier to index. 24-bit packed pixels means each pixel is 3 bytes, which is much more annoying to index into.
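
A rough sketch of the loading step (the filename is hypothetical, and I'm assuming a 32-bit .bmp; check the real header and row layout before relying on it):

import array
import struct

# Hypothetical 80x60, 32-bit .bmp written out from the surface beforehand.
with open("image.bmp", "rb") as f:
    data = f.read()

# The offset of the pixel data is stored as a little-endian uint32 at byte 10
# of the BMP file header.
offset = struct.unpack_from("<I", data, 10)[0]
pixels = array.array("B", data[offset:])   # one unsigned byte per channel

# With 32-bit pixels, pixel (x, y) starts at byte (y*80 + x) * 4 -- though note
# that .bmp files normally store their rows bottom-up.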

I believe you will gain much more speed out of this approach than by trying to avoid the use of for. If you try this, please post something here to let us know how well it worked or didn't. Good luck.

EDIT: I thought that an array has only a single index. I'm not sure how you managed to get two indexes to work. I was expecting you to do something like this:

def __i(x, y):
    # Byte offset of pixel (x, y): 4 bytes per 32-bit pixel, 80 pixels per row.
    assert(0 <= x < 80)
    assert(0 <= y < 60)
    i = (y*80 + x) * 4
    return i
def red(x, y):
    return __a[__i(x, y)]
def green(x, y):
    return __a[__i(x, y) + 1]
def blue(x, y):
    return __a[__i(x, y) + 2]
def rgb(x, y):
    i = __i(x, y)
    return __a[i], __a[i + 1], __a[i + 2]
def set_rgb(x, y, r, g, b):
    i = __i(x, y)
    __a[i] = r
    __a[i + 1] = g
    __a[i + 2] = b

# example:
r, g, b = rgb(23, 33)

Since a Python array can only hold a single type, you will want to set the type code to "B" (unsigned byte) and then index like I showed.

Where of course __a is the actual array variable.

If none of this is helpful, try converting your bitmap into a list, or perhaps three lists. You can use nested lists to get 2D addressing.
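
For example, a nested-list grid for the 80x60 image might look like this (sizes hardcoded just for illustration):

# 60 rows of 80 (r, g, b) tuples, addressed as grid[y][x]:
grid = [[(0, 0, 0) for x in range(80)] for y in range(60)]

r, g, b = grid[33][23]        # read the pixel at x=23, y=33
grid[33][23] = (r, 255, b)    # write a new value back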

I hope this helps. If it is not helpful, then I am not understanding what you are doing; if you explain more I'll try to improve the answer.

steveha
In theory using arrays worked, and the code's running faster, but obviously accessing a 2D array like array[x,y] = Color(a,r,g,b) isn't the same as set_at((x,y),Color(...)). It only seems to work in a diagonal line (which is weird, because the exact same code works with set_at, so it's not the x,y values). Trying to get slices of the array doesn't seem to work either - array[x-1:x+1,y-1:y+1] returns a 2D array that's only got 4 elements in total, when it should be 9. So, in theory what you said about copying the image into an array does indeed speed it up a lot, but pygame != cooperating.
Owain Jones
How did you manage to get 2D indexing to work with an `array`? I thought that `array` is like `list`, and you just get a single index. Hmm, I want to write code, so please see the answer.
steveha
Wait a minute... did you use the basic array from Python, or the advanced scientific array from NumPy/SciPy? Because the scientific one can have more than one index. I was suggesting the basic array type, which should be very fast, faster than a plain Python list. See here: http://docs.python.org/library/array.html
steveha
Ah. PyGame's surfarray = a numpy multidimensional array. I was using the surfarray module to create arrays from the surface because it seemed quicker than converting the surface to a bitmap-then-array by hand, but I see how 1D arrays would be faster.
Owain Jones
If you want maximum speed of reading/writing the data, I think a Python `array` is your best bet. It will take time to set up a NumPy array, and time to convert the values from the NumPy array back into something usable as a bitmap; whereas the Python `array` should be able to give you a direct view into the guts of a bitmap .BMP file, and you can walk over it and read pixels and do your math, and write to it and you are directly updating the bitmap pixels.
steveha