I understand that in Python a string is simply an expression and a string by itself would be garbage collected immediately upon return of control to a code's caller, but...

  1. Large class/method doc strings in your code: do they waste memory by building the string objects up?
  2. Module-level doc strings: are they kept indefinitely by the interpreter?

Does this even matter? My only concern came from the idea that if I'm using a large framework like Django, or multiple large open source libraries, they tend to be very well documented with potentially multiple megabytes of text. In these cases are the doc strings loaded into memory for code that's used along the way, and then kept there, or is it collected immediately like normal strings?

+6  A: 
  • "I understand that in Python a string is simply an expression and a string by itself would be garbage collected immediately upon return of control to a code's caller" indicates a misunderstanding, I think. A docstring is evaluated once (not on every function call) and stays alive at least as long as the function does.

  • "Does this even matter?" is a question that, when it comes to optimization, is answered by measuring rather than by thinking about it abstractly. "Multiple megabytes" of text probably isn't a lot in a memory-intensive application. The solution for saving memory likely lives elsewhere, and you can determine whether that is the case by measurement.

  • Python's -OO command line switch removes docstrings.
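
The first two points can be checked with a short sketch, assuming a Python 3 interpreter. The helper `docstring_bytes` is a hypothetical name, not a standard function, and it only looks at a module's own docstring plus its top-level functions and classes, so treat the number as a rough lower bound rather than a real memory profile.

```python
import json
import sys
import types


def f():
    """This is a test function."""
    pass


# The docstring is built once, when the function is defined, and stays
# reachable through __doc__ for as long as the function object is alive.
before = f.__doc__
f()
f()
assert f.__doc__ is before  # same object, not rebuilt on each call


def docstring_bytes(module):
    """Roughly total the size of docstrings on a module's top-level objects.

    Hypothetical helper: it ignores methods and nested objects.
    """
    docs = [getattr(module, "__doc__", None)]
    for obj in list(vars(module).values()):
        if isinstance(obj, (types.FunctionType, type)):
            docs.append(getattr(obj, "__doc__", None))
    # Docstrings are None when absent (or when running under -OO).
    return sum(sys.getsizeof(d) for d in docs if isinstance(d, str))


# json is only an example; substitute whichever library you are worried about.
print(docstring_bytes(json))
```

Comparing a figure like this against the process's total memory use is one way to decide whether docstrings are worth worrying about at all.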

Mike Graham
@Mike Thanks. +1
orokusaki
+1  A: 

Python docstrings by default are kept around indefinitely, since they're accessible via the __doc__ attribute of a function or a module. For example, with the following in test.py:

"""This is a test module."""

def f():
    """This is a test function."""
    pass

Then:

$ python
Python 2.5.1 (r251:54863, Oct 30 2007, 13:54:11) 
[GCC 4.1.2 20070925 (Red Hat 4.1.2-33)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import test
>>> test.__doc__
'This is a test module.'
>>> test.f.__doc__
'This is a test function.'
>>> 

The -OO option to the interpreter apparently causes it to remove docstrings from the generated .pyo files, but it doesn't have the effect I would expect:

$ python -OO
Python 2.5.1 (r251:54863, Oct 30 2007, 13:54:11) 
[GCC 4.1.2 20070925 (Red Hat 4.1.2-33)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import test
>>> test.__file__
'/tmp/test.py'
>>> 
$ grep "This is a test" /tmp/test.pyo
Binary file /tmp/test.pyo matches
$ python -OO
Python 2.5.1 (r251:54863, Oct 30 2007, 13:54:11) 
[GCC 4.1.2 20070925 (Red Hat 4.1.2-33)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import test
>>> test.__file__
'/tmp/test.pyo'
>>> test.__doc__
'This is a test module.'
>>> 

And in fact, the test.pyo file generated with -OO is identical to the test.pyc file generated with no command-line arguments. Can anyone explain this behavior?
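
One way to take cached bytecode files out of the picture is to compile the source in memory at different optimization levels. This is only a sketch, and it assumes a Python 3 interpreter, where the built-in `compile()` accepts an `optimize` argument; the sessions above are Python 2.5, which does not have it. The names `SOURCE`, `load` and `test_mod` are made up for the illustration.

```python
import types

# Same contents as test.py above.
SOURCE = '''"""This is a test module."""

def f():
    """This is a test function."""
    pass
'''


def load(optimize):
    """Compile SOURCE at the given optimization level and return a module."""
    module = types.ModuleType("test_mod")  # hypothetical module name
    code = compile(SOURCE, "<test_mod>", "exec", optimize=optimize)
    exec(code, module.__dict__)
    return module


normal = load(0)    # equivalent to running without -O
stripped = load(2)  # equivalent to running with -OO

print(normal.__doc__, normal.f.__doc__)
print(stripped.__doc__, stripped.f.__doc__)
```

If docstring stripping works as documented, the second line of output shows None for both the module and the function.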

jchl
At first glance I would say it's a bug in your Python 2.5.1. It works as expected on the 2.5.5 I have here. Python 2.5.1c1 had this bug, according to the NEWS file, but it should have been fixed in the actual 2.5.1 release. See http://bugs.python.org/issue1722485
Thomas Wouters