I am using CherryPy in a RESTful web service, and the server returns XML as a result (lxml is used to create the XML). Some of those XML documents are quite large. I have noticed that memory is not released after a request that returns a large XML document has been processed.

So, I have isolated the problem in this very short dummy example:

import cherrypy
from lxml import etree

class Server:
    @cherrypy.expose
    def index(self):
        foo = etree.Element('foo')
        for i in range(200000):
            bar = etree.SubElement(foo, 'bar')
            bar1 = etree.SubElement(bar, 'bar1')
            bar1.text = "this is bar1 text ({0})".format(i)
            bar2 = etree.SubElement(bar, 'bar2')
            bar2.text = "this is bar2 text ({0})".format(i)
            bar3 = etree.SubElement(bar, 'bar3')
            bar3.text = "this is bar3 text ({0})".format(i)
            bar4 = etree.SubElement(bar, 'bar4')
            bar4.text = "this is bar4 text ({0})".format(i)
            bar5 = etree.SubElement(bar, 'bar5')
            bar5.text = "this is bar5 text ({0})".format(i)

        return etree.tostring(foo, pretty_print=True)

if __name__ == '__main__':
    cherrypy.quickstart(Server())

After a request is made to http://localhost:8080/index, memory consumption goes from 830MB to 1.2GB. Then, after the request has been processed, it goes down to 1.1GB and stays there until the server is shut down. After the server is shut down, memory consumption goes back down to 830MB.

In my project, the data (of course) comes from the database, and parameters specify what data should be retrieved. If the same request (with the same parameters) is made again, memory stays at 1.1GB, i.e. no additional memory is used. But if different parameters are passed, the server keeps consuming more and more memory. The only way to free the memory is to restart the server.

Do you have any ideas on why this is happening and how to solve it? Thanks.

+1  A: 

This is a generic Python problem, not really a CherryPy one per se. effbot has a great answer to this question at http://effbot.org/pyfaq/why-doesnt-python-release-the-memory-when-i-delete-a-large-object.htm

And there's a similar SO question with a great answer at http://stackoverflow.com/questions/1316767/how-can-i-explicitly-free-memory-in-python
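
One way to tell allocator retention apart from a real leak is to look at what Python itself still considers allocated once the response has been dropped. The following is a minimal sketch, not the code from the question: it uses the standard library's xml.etree.ElementTree instead of lxml so that it runs without extra packages, and the names build_xml and measure are made up for illustration. With lxml the tree lives mostly in libxml2's own C heap, which tracemalloc may not see, so treat the numbers as indicative only.

```python
import gc
import tracemalloc
import xml.etree.ElementTree as ET


def build_xml(n):
    """Build a document shaped like the one in the question and serialize it."""
    root = ET.Element('foo')
    for i in range(n):
        bar = ET.SubElement(root, 'bar')
        for j in range(1, 6):
            child = ET.SubElement(bar, 'bar{0}'.format(j))
            child.text = 'this is bar{0} text ({1})'.format(j, i)
    return ET.tostring(root)


def measure(n=20000):
    """Return (baseline, current, peak, size) in bytes.

    baseline: traced memory before the document is built
    current:  traced memory after the document has been deleted
    peak:     highest traced memory seen while building it
    size:     length of the serialized document
    """
    tracemalloc.start()
    baseline, _ = tracemalloc.get_traced_memory()
    data = build_xml(n)
    size = len(data)
    del data          # drop the only reference, as happens after a request
    gc.collect()      # make sure nothing is waiting on the cycle collector
    current, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return baseline, current, peak, size


if __name__ == '__main__':
    baseline, current, peak, size = measure()
    print('document size :', size)
    print('peak traced   :', peak - baseline)
    print('still traced  :', current - baseline)
```

If "still traced" falls back close to zero while the process size reported by the OS stays high, the objects were freed and the memory is simply being held for reuse, which is the situation the two links describe.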

fumanchu
I see. Thanks a lot for the answer. What happens when memory consumption comes close to the physical memory limit and a new request comes in? Will Python know how to reuse memory that the process is holding but not using? What will happen to other processes that need memory? Should I consider using processes instead of threads here? Over time this server will occupy more and more memory (as different requests come in). Is there a point at which Python will start reusing this occupied (but unused) memory, or will it just run out of memory and start using swap?
kevin
Yes, Python will reuse memory. Other processes that demand memory will compete for it, usually moderated by the OS. Look out for the "OOM killer" on Linux. Threads are a device to share and therefore *reduce* memory consumption compared to processes. And finally, yes, swap is used all the time. Read your OS docs.
fumanchu
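
On the processes-versus-threads question: if the retained memory ever becomes a real problem, one workaround along the lines of the linked SO question is to build the large document in a short-lived child process, so that whatever the child allocated goes back to the OS when it exits. This is only a sketch under assumptions, not something the answer above recommends. It uses xml.etree.ElementTree rather than lxml to stay dependency-free, and CHILD_CODE and render_in_child are hypothetical names. The parent still holds the serialized bytes, and starting an interpreter per request has its own cost.

```python
import subprocess
import sys

# Script run by the child interpreter. The whole tree is built there,
# so the parent never allocates the element objects.
CHILD_CODE = r'''
import sys
import xml.etree.ElementTree as ET

n = int(sys.argv[1])
root = ET.Element("foo")
for i in range(n):
    bar = ET.SubElement(root, "bar")
    for j in range(1, 6):
        child = ET.SubElement(bar, "bar{0}".format(j))
        child.text = "this is bar{0} text ({1})".format(j, i)
sys.stdout.buffer.write(ET.tostring(root))
'''


def render_in_child(n):
    """Build the XML in a separate Python process and return it as bytes."""
    result = subprocess.run(
        [sys.executable, '-c', CHILD_CODE, str(n)],
        stdout=subprocess.PIPE,
        check=True,  # raise CalledProcessError if the child fails
    )
    return result.stdout


if __name__ == '__main__':
    xml_bytes = render_in_child(200000)
    print(len(xml_bytes), 'bytes received from the child process')
```

In a CherryPy handler the return value of render_in_child could be returned directly, since it is already a byte string.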