I'm having memory issues while using a Python script to issue a large Solr query. I'm using the solrpy library to interface with the Solr server. The query returns approximately 80,000 records, and immediately after it is issued the Python process's memory footprint, as viewed through top, balloons to ~190MB:
  PID USER      PR  NI  VIRT   RES   SHR  S %CPU %MEM   TIME+   COMMAND
 8225 root      16   0  193m  189m  3272  S  0.0 11.2  0:11.31  python
...
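For reference, the query is issued roughly like this (a minimal sketch; the URL, query string, and row count are placeholders rather than my exact values):

    import solr

    # Placeholder URL and query; rows is set high enough to pull back
    # all ~80,000 matching records in a single response.
    conn = solr.SolrConnection('http://localhost:8983/solr')
    response = conn.query('*:*', rows=80000)
    print(len(response.results))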
At this point, the heap profile as viewed through heapy looks like this:
Partition of a set of 163934 objects. Total size = 14157888 bytes.
 Index   Count    %      Size    %  Cumulative   %  Kind (class / dict of class)
     0   80472   49   7401384   52     7401384  52  unicode
     1   44923   27   3315928   23    10717312  76  str
 ...
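(For completeness, the profile is captured with guppy's hpy, something along these lines:)

    from guppy import hpy

    hp = hpy()
    # ... issue the Solr query here, keeping a reference to the results ...
    print(hp.heap())   # prints a partition table like the one above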
The unicode objects represent the unique identifiers of the records returned by the query. One thing to note is that the total heap size is only 14MB, while Python is occupying 190MB of physical memory. Once the variable storing the query results falls out of scope, the heap profile correctly reflects the garbage collection:
Partition of a set of 83586 objects. Total size = 6437744 bytes.
 Index   Count    %      Size    %  Cumulative   %  Kind (class / dict of class)
     0   44928   54   3316108   52     3316108  52  str
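Put together, the lifecycle looks roughly like this (same placeholder URL and query as above): once run_query() returns, nothing references the result set any longer, and heapy's totals drop accordingly:

    import gc
    import solr
    from guppy import hpy

    hp = hpy()

    def run_query():
        # The result set is only referenced inside this function's scope
        conn = solr.SolrConnection('http://localhost:8983/solr')  # placeholder
        response = conn.query('*:*', rows=80000)                  # placeholder
        print(hp.heap())   # the ~14MB partition, while 'response' is alive

    run_query()
    gc.collect()           # the results are unreachable after the call returns
    print(hp.heap())       # the smaller ~6.4MB partition shown above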
However, the memory footprint remains unchanged:
  PID USER      PR  NI  VIRT   RES   SHR  S %CPU %MEM   TIME+   COMMAND
 8225 root      16   0  195m  192m  3432  S  0.0 11.3  0:13.46  python
...
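To put the two numbers side by side from inside the process, here's a rough, Linux-specific sketch (VmRSS is the same figure top reports in the RES column):

    from guppy import hpy

    def rss_kb():
        # Current resident set size from the kernel (Linux-specific);
        # this is the number top displays as RES.
        with open('/proc/self/status') as f:
            for line in f:
                if line.startswith('VmRSS:'):
                    return int(line.split()[1])

    hp = hpy()
    print('RSS reported by the OS: %d kB' % rss_kb())
    print('heapy total size:       %d bytes' % hp.heap().size)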
Why is there such a large disparity between Python's physical memory footprint and the size of the Python heap?