ansaurus

Question

The memory usage reported by guppy differ from ps command

Answer 1

+2 A:

possibly due to swapping/memory reservation, based on ps's definition:

RSS: resident set size, the non-swapped physical memory
     that a task has used (in kiloBytes).

VSZ: virtual memory usage of entire process.
     vm_lib + vm_exe + vm_data + vm_stack

it can be a bit confusing, 4 different size metrics can be seen with:

# ps -eo pid,vsz,rss,sz,size,cmd|egrep python

PID    VSZ   RSS   SZ    SZ    CMD
23801  4920  2896  1230  1100  python

the virtual size includes memory that was reserved by the process and not used, the size of all shared libraries that were loaded, pages that are swapped out, and blocks that were already freed by your process, so it could be much larger than the size of all live objects in python.

some additional tools to investigate memory performance:

Heapy (part of Guppy, which you are using): http://guppy-pe.sourceforge.net/
Python Memory Validator http://www.softwareverify.com/python/memory/index.html
PySizer http://pysizer.8325.org/

good guide on tracking down memory leaks in python using pdb and objgraph:

http://www.lshift.net/blog/2008/11/14/tracing-python-memory-leaks

jspcal 2010-01-14 18:07:43

But even I compare 69 MB with 10MB, it is just too far. What could be the problem? And what's more, the memory usage grows with time. At first, the RSS is about 2xMB, and it comes to 6x now.

Victor Lin 2010-01-14 18:10:17

some of the objects reported by guppy might be swapped out so python might report those, while rss wouldn't include it.. im not sure why the process is reserving so much memory tho, maybe tons of shared libs..?

jspcal 2010-01-14 18:15:54

All libraries are loaded at startup. It uses only 2x MBs in RSS field when the server started. It doesn't make sense that libraries occupy extra memory usage.Is there any way to know what's the those invisible memory usage?

Victor Lin 2010-01-14 18:42:53

hmm, sounds like it's leaking memory somewhere, can you try Python Memory Validator to pinpoint which code is allocating the most memory?

jspcal 2010-01-14 18:54:43

Answer 2

+2 A:

As pointed out above the RSS size is what you're most interested in here. The "Virtual" size includes mapped libraries, which you probably don't want to count.

It's been a while since I used heapy, but I am pretty sure the statistics it prints do not include overhead added by heapy itself. This overhead can be pretty significant (I've seen a 100MB RSS process grow another dozen or so MB, see http://www.pkgcore.org/trac/pkgcore/doc/dev-notes/heapy.rst ).

But in your case I suspect the problem is that you are using some C library that either leaks or uses memory in a way that heapy does not track. Heapy is aware of memory used directly by python objects, but if those objects wrap C objects that are separately allocated heapy is not normally aware of that memory at all. You may be able to add heapy support to your bindings (but if you do not control the bindings you use that is obviously a hassle, and even if you do control the bindings you may not be able to do this depending on what you are wrapping).

If there are leaks at the C level heapy will also lose track of that memory (RSS size will go up but heapy's reported size will stay the same). Valgrind is probably your best bet to track these down, just as it is in other C applications.

Finally: memory fragmentation will often cause your memory usage (as seen in top) to go up but not down (much). This is usually not that much of a problem with daemons, since the process will reuse this memory, it's just not released back to the os, so the values in top do not go back down. If memory usage (as seen by top) goes up more or less linearly with the number of users (connections), does not go back down, but also does not keep growing forever until you hit a new maximum number of users, fragmentation is probably to blame.

mzz 2010-01-29 18:42:24

ansaurus

tags:

views:

answers:

The memory usage reported by guppy differ from ps command

related questions