tags:

views: 130

answers: 4

I've been running Python scripts that make several calls to some functions, say F1(x) and F2(x), that look a bit like this:

x = LoadData()

z = [None] * N   # z preallocated with room for N results

for j in range(N):
    y = F1(x[j])
    z[j] = F2(y)

    del y

SaveData(z)

Performance is a lot faster if I keep the "del y" line. But I don't understand why this is true. If I don't use "del y", then I quickly run out of RAM and have to resort to virtual memory, and everything slows to a crawl. But if I use "del y", then I am repeatedly freeing and re-allocating the memory for y. What I would like to do is have y sit in static memory, and reuse that memory on every F1(x) call. But from what I can tell, that isn't what's happening.

Also, not sure if it's relevant, but my data consists of numpy arrays.
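In case it's useful, a rough way to watch the memory growth while the loop runs (this assumes Linux, where ru_maxrss is reported in kilobytes) would be something like:

import resource

for j in range(N):
    y = F1(x[j])
    z[j] = F2(y)
    # peak resident set size so far (kB on Linux)
    print j, resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    del y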

A: 

For very large values of N, use xrange instead of range to save memory. Also, you can nest the function calls, though I don't know if this will help you:

x = LoadData()

z = [None] * N

for j in xrange(N):
    z[j] = F2(F1(x[j]))

SaveData(z)

Maybe F1 and F2 are making unnecessary copies of objects; the best approach would be to work in place, something like:

x = LoadData()
for item in x:
    item.F1()
    item.F2()
SaveData(x)
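
Since you mention numpy arrays, a rough sketch of the in-place idea (np.sqrt and *= are only stand-ins here, since the real F1/F2 aren't shown):

import numpy as np

x = LoadData()               # assumed to be a list of numpy arrays
for item in x:
    item *= 2.0              # stand-in for F1, modifies item in place
    np.sqrt(item, out=item)  # stand-in for F2, also writes in place
SaveData(x)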

Sorry if my answer is not helpful.

razpeitia
I was worried that my example would suggest nesting; it isn't as practical an option in the actual script. I think gnibbler's answer correctly describes the situation, but thanks for your input. I was not familiar with xrange until you pointed it out.
Marshall Ward
+4  A: 

Without the del y you might need twice as much memory. This is because on each pass through the loop, y stays bound to the previous result of F1 while the next one is calculated.

Once F1 returns, y is rebound to that new value and the old F1 result can be released.

This would mean that the object returned by F1 occupies quite a lot of memory.

Unrolling the loop for the first couple of iterations would look like this:

y = F1(x[0])   # F1(x[0]) is calculated, then y is bound to it
z[0] = F2(y)
y = F1(x[1])   # y is still bound to F1(x[0]) while F1(x[1]) is computed
               # The memory for F1(x[0]) is finally freed when y is rebound
z[1] = F2(y)

Using del y is a good solution if this is what is happening in your case.
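
If the del feels like a kludge, an alternative sketch (assuming F1 and F2 behave as in the question) is to compute each result inside a helper function, so the intermediate goes out of scope on every return:

def process(xj):
    y = F1(xj)    # y exists only for the duration of this call
    return F2(y)  # the F1 result is freed when the call returns

for j in range(N):
    z[j] = process(x[j])

This way only one F1 result is alive at a time, with no explicit del.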

gnibbler
Hear, hear -- this totally explains why you get better performance with `del y`.
Igor
Thanks, this must be exactly what is happening: without the del, my memory usage doubles, spills a bit into virtual memory (since my RAM happens to lie between one and two instances of y), and the script slogs on at low performance. I'll stick with the del solution for now; thank you again for explaining when/how the instances of y are created.
Marshall Ward
+1  A: 

What you actually want is something that's awkward to do in Python -- you want to allocate a region of memory for y and pass a pointer to that region to F1() so it can use that region to build up the next value of y. This avoids having F1() do its own allocation for the new value of y, the reference to which is then written into your variable y (which is actually not the value of whatever F1() calculated but a reference to it).

There's already an SO question about passing by reference in python: http://stackoverflow.com/questions/986006/python-how-do-i-pass-a-variable-by-reference
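
Since the data consists of numpy arrays, there is a practical version of this idea: many numpy operations accept an out= argument that writes into an already-allocated array. A sketch, with np.multiply standing in for whatever F1 actually computes:

import numpy as np

y_buf = np.empty_like(x[0])            # allocated once, reused every pass

for j in range(N):
    np.multiply(x[j], 2.0, out=y_buf)  # "F1" fills the existing buffer
    z[j] = F2(y_buf)

The buffer is allocated a single time, so every iteration reuses the same region of memory instead of allocating a fresh result.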

Igor
A: 

Which Python implementation do you use? I have Python 2.6.5 (CPython) and everything works fine:

>>> class Y:
...     def __del__(self):
...             print 'del y'
... 
>>> for i in xrange(5):
...     print 'create y'
...     y = Y()
... 
create y
create y
del y
create y
del y
create y
del y
create y
del y

>>> for i in xrange(5):
...     print 'create y'
...     y = Y()
...     del y
... 
create y
del y
create y
del y
create y
del y
create y
del y
create y
del y

The only difference is that the first version keeps one extra object alive at a time (which shouldn't be a problem).

Tomasz Wysocki
My variables are many GB in size, so in this case the duplicate is actually quite a big problem! Thanks for looking into it though.
Marshall Ward