I have been working on some code. My usual approach is to first solve each piece of the problem, writing the loops and other code I need as I go. Then, if I expect to reuse the code, I go back through it and group the related parts into functions.

I have just noticed that creating functions and calling them seems to use memory much more efficiently than writing lines of code and deleting containers as I finish with them.

For example:

def someFunction(aList):
    do things to aList
    that create a dictionary
    return aDict

seems to release more memory at the end than

>>do things to alist
>>that create a dictionary
>>del(aList)

Is this expected behavior?

EDIT: added example code

When this function finishes running, the PF Usage shows an increase of about 100 MB. The filingList has about 8 million lines.

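from collections import defaultdict

def getAllCIKS(filingList):
    # count how many times each CIK (the text before the first '^') appears
    cikDICT=defaultdict(int)
    for filing in filingList:
        if filing.startswith('.'):
            del(filing)
            continue
        cik=filing.split('^')[0].strip()
        cikDICT[cik]+=1
        del(filing)
    # Python 2: keys() returns a list, which is then sorted in place
    ciklist=cikDICT.keys()
    ciklist.sort()
    return ciklist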

allCIKS=getAllCIKS(open(r'c:\filinglist.txt').readlines())

If I run this instead, I see an increase of almost 400 MB:

cikDICT=defaultdict(int)
for filing in open(r'c:\filinglist.txt').readlines():
    if filing.startswith('.'):
        del(filing)
        continue
    cik=filing.split('^')[0].strip()
    cikDICT[cik]+=1
    del(filing)

ciklist=cikDICT.keys()
ciklist.sort()
del(cikDICT)

EDIT: I have been playing around with this some more today. My observation and question should be refined a bit, since my focus has been on the PF Usage. Unfortunately I can only poke at this between my other tasks. However, I am starting to wonder about references versus copies. If I create a dictionary from a list, does the dictionary hold copies of the values that came from the list, or does it hold references to the values in the list? My bet is that the values are copied instead of referenced.

Another thing I noticed is that items in the GC list were items from containers that had been deleted. Does that make sense? So I have a list, and suppose each of the items in the list was [(aTuple), anInteger, [another list]]. When I started learning how to manipulate and inspect the gc objects, I found those objects in the gc even though the list had been forcefully deleted, and even though I passed the values 0, 1 and 2 to the method (whose name I don't remember) to try to delete them.

I appreciate the insights people have been sharing. Unfortunately I am always interested in figuring out how things work under the hood.

A: 

Some extra memory is freed when you return from a function, but that's exactly as much extra memory as was allocated to call the function in the first place. In any case, if you are seeing a large difference, that's likely an artifact of the state of the runtime, and is not something you should really be worrying about. If you are running low on memory, the way to solve the problem is to keep more data on disk using things like B-trees (or just use a database), or to use algorithms that need less memory. Also, keep an eye out for unnecessary copies of large data structures.

The real memory savings in creating functions is in your short-term memory. By moving something into a function, you reduce the amount of detail you need to remember by encapsulating part of the minutiae away.
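As one illustration of "use less memory", here is a minimal sketch that reads lines one at a time instead of building the full list with readlines() first. The function name count_ciks and the sample data are hypothetical; the parsing rule (skip lines starting with '.', take the text before the first '^') is copied from the question.

```python
import io
from collections import defaultdict

def count_ciks(lines):
    """Count CIKs from any iterable of lines, holding one line at a time."""
    counts = defaultdict(int)
    for line in lines:
        if line.startswith('.'):
            continue  # same skip rule as the code in the question
        cik = line.split('^')[0].strip()
        counts[cik] += 1
    return counts

# Hypothetical sample data standing in for the real filing list.
sample = io.StringIO(u"0001^10-K\n.skip me\n0002^8-K\n0001^10-Q\n")
ciks = sorted(count_ciks(sample))
print(ciks)
```

With a real file, passing the open file object itself (rather than the result of readlines()) should give the same line-by-line behavior, since file objects are iterable.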

Eclipse
What you are saying is that the memory freed is simply the memory required to handle the lines of code in the function? That doesn't seem right. I think gc is handled differently with functions.
PyNEwbie
I'm saying it's not worth worrying about. Let the gc do its job - the exact timing of freeing memory isn't really that important. If you run low on memory and you have freeable memory, it'll get freed.
Eclipse
+1  A: 

You can use the Python garbage collector interface provided to more closely examine what (if anything) is being left around in the second case. Specifically, you may want to check out gc.get_objects() to see what is left uncollected, or gc.garbage to see if you have any reference cycles.
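A rough sketch of that kind of inspection follows. The Node class and make_cycle function are hypothetical, made up only to produce a reference cycle; gc.get_objects(), gc.collect() and gc.garbage are the real module interface. Automatic collection is switched off briefly so the cycle is still there to be counted.

```python
import gc

class Node(object):
    """Hypothetical class used only to build a reference cycle."""
    def __init__(self):
        self.other = None

def make_cycle():
    a, b = Node(), Node()
    a.other, b.other = b, a  # a and b now keep each other alive

def count_nodes():
    # Count the Node instances the collector is currently tracking.
    return sum(1 for o in gc.get_objects() if isinstance(o, Node))

gc.collect()   # start from a clean state
gc.disable()   # keep automatic collection from running mid-demo
make_cycle()
live_before = count_nodes()   # the cycle outlives the function call
collected = gc.collect()      # explicit collection breaks the cycle
live_after = count_nodes()
gc.enable()

print(live_before, live_after, len(gc.garbage))
```

Objects that sit in a cycle are not released by reference counting alone, so they stay visible in gc.get_objects() until a collection runs. That may be part of why deleted containers still seem to show up.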

Joe
Yes, I have played with those. It's a little complicated to explain, but my poking at it suggests that gc works better when you are using a function than when submitting lines of code.
PyNEwbie
+3  A: 

Maybe you used some local variables in your function, which are implicitly released by reference counting at the end of the function, while they are not released at the end of your code segment?
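A small sketch of that difference, assuming CPython's reference counting. The Big class is a hypothetical stand-in for a large container; weakref.ref is used only to observe whether the object is still alive without keeping it alive.

```python
import weakref

class Big(object):
    """Hypothetical stand-in for a large container."""

def in_function():
    local = Big()
    # A weak reference lets us check on the object later
    # without preventing it from being freed.
    return weakref.ref(local)

# The local name disappears when the function returns,
# so the object is released right away.
ref_from_function = in_function()
freed_on_return = ref_from_function() is None

# At module level the name stays bound until it is deleted or rebound.
module_level = Big()
ref_from_module = weakref.ref(module_level)
alive_before_del = ref_from_module() is not None
del module_level
alive_after_del = ref_from_module() is not None

print(freed_on_return, alive_before_del, alive_after_del)
```

Every temporary created at module level (intermediate lists, loop variables, and so on) needs its own del to get the same effect, which is easy to miss.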

J S
This would be my guess as well, without seeing any actual code.
John Y
A: 

Maybe you should re-engineer your code to get rid of unnecessary variables (that may not be freed instantly)... how about the following snippet?

myfile = open(r"c:\filinglist.txt")
ciklist = sorted(set(x.split("^")[0].strip() for x in myfile if not x.startswith(".")))

EDIT: I don't know why this answer was downvoted... Maybe because it's short? Or maybe because the dude who voted was unable to understand how this one-liner does the same thing as the code in the question without creating unnecessary temporary containers?

Sigh...

fortran
A: 

I asked another question about copying lists, and the answers, particularly the one directing me to look at deepcopy, caused me to think about some dictionary behavior. The problem I was experiencing had to do with the fact that the original list is never garbage collected because the dictionary maintains references to the list. I need to use the information about weakref in the Python docs.

Objects referenced by dictionaries seem to stay alive. I think (but am not sure) that the process of pushing the dictionary out of the function forces the copy process and kills the object. This is not complete; I need to do some more research.
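To make the reference-versus-copy point concrete, here is a minimal sketch, again assuming CPython. The Record class is hypothetical (plain ints and strings cannot be weakly referenced, so a small class is used instead). It suggests that a dictionary built from a list stores references to the same objects, and that weakref.WeakValueDictionary is one way to index objects without keeping them alive.

```python
import weakref

class Record(object):
    """Hypothetical item type used only for this demonstration."""
    def __init__(self, name):
        self.name = name

items = [Record("a"), Record("b")]
by_name = dict((r.name, r) for r in items)

# The dictionary value is the very same object, not a copy.
same_object = by_name["a"] is items[0]

probe = weakref.ref(items[0])
del items
still_alive = probe() is not None   # the dict still references it
del by_name
freed = probe() is None             # last strong reference is gone

# A WeakValueDictionary drops an entry once its value is freed.
weak_index = weakref.WeakValueDictionary()
keep = Record("c")
weak_index["c"] = keep
del keep
entry_gone = "c" not in weak_index

print(same_object, still_alive, freed, entry_gone)
```

No copying happens when the dictionary is returned from a function either; the caller simply receives another reference to the same dictionary.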

PyNEwbie