Just for the sheer heck of it, I've decided to create a Scheme binding to libpython so you can embed Python in Scheme programs. I'm already able to call into Python's C API, but I haven't really thought about memory management.

The way mzscheme's FFI works is that I can call a function, and if that function returns a pointer to a PyObject, then I can have it automatically increment the reference count. Then, I can register a finalizer that will decrement the reference count when the Scheme object gets garbage collected. I've looked at the documentation for reference counting, and don't see any problems with this at first glance (although it may be sub-optimal in some cases). Are there any gotchas I'm missing?
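Here is a minimal sketch of the wrap-and-finalize scheme just described. It assumes CPython and uses Python's own `ctypes.pythonapi` as a stand-in for mzscheme's FFI. `Py_IncRef`/`Py_DecRef` are the exported function forms of the `Py_INCREF`/`Py_DECREF` macros. `PyHandle` is a hypothetical name, and treating `id()` as the object's address is a CPython implementation detail.

```python
import ctypes
import sys
import weakref

# Function forms of Py_INCREF / Py_DECREF. Declaring the argument as a raw
# c_void_p means ctypes passes the address through without touching refcounts.
Py_IncRef = ctypes.pythonapi.Py_IncRef
Py_IncRef.argtypes = [ctypes.c_void_p]
Py_IncRef.restype = None
Py_DecRef = ctypes.pythonapi.Py_DecRef
Py_DecRef.argtypes = [ctypes.c_void_p]
Py_DecRef.restype = None


class PyHandle:
    """Hypothetical host-side wrapper that owns one reference to a PyObject*."""

    def __init__(self, address):
        self.address = address
        Py_IncRef(address)  # the "automatic INCREF" when the pointer is wrapped
        # Calls Py_DecRef(address) once this wrapper becomes unreachable,
        # playing the role of the Scheme GC finalizer.
        weakref.finalize(self, Py_DecRef, address)


data = [1, 2, 3]
before = sys.getrefcount(data)
handle = PyHandle(id(data))  # CPython detail: id() is the PyObject* address
print(sys.getrefcount(data) - before)  # 1
del handle  # wrapper is unreachable, so the finalizer DECREFs
print(sys.getrefcount(data) - before)  # 0
```

The INCREF-on-wrap step is only correct when the wrapped pointer is a *borrowed* reference, which is exactly the distinction the answers below turn on.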

Also, I'm having trouble making heads or tails of the cyclic garbage collector documentation. What things will I need to bear in mind here? In particular, how do I make Python aware that I have a reference to something so it doesn't collect it while I'm still using it?

A:

Your link to http://docs.python.org/extending/extending.html#reference-counts is the right place. The "Extending and Embedding" and "Python/C API" sections of the documentation are the ones that explain how to use the C API.

Reference counting is one of the annoying parts of using the C API. The main gotcha is keeping everything straight: Depending on the API function you call, you may or may not own the reference to the object you get. Be careful to understand whether you own it (and thus cannot forget to DECREF it or give it to something that will steal it) or are borrowing it (and must INCREF it to keep it and possibly to use it during your function). The most common bugs involving this are 1) remembering incorrectly whether you own a reference returned by a particular function and 2) believing you're safe to borrow a reference for a longer time than you are.
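To make the owned/borrowed distinction concrete, here is a sketch assuming CPython, again calling the C API through `ctypes.pythonapi`. `PyList_GetItem` is documented as returning a borrowed reference, and `PySequence_GetItem` as returning a new one. The helper `refcount_changes` is hypothetical and simply measures the refcount of the item after each call.

```python
import ctypes
import sys

api = ctypes.pythonapi

# Borrowed reference: the list still owns it; the caller must not DECREF.
PyList_GetItem = api.PyList_GetItem
PyList_GetItem.argtypes = [ctypes.py_object, ctypes.c_ssize_t]
PyList_GetItem.restype = ctypes.c_void_p  # raw pointer, so ctypes adds no refs

# New reference: the caller owns it and must DECREF it when done.
PySequence_GetItem = api.PySequence_GetItem
PySequence_GetItem.argtypes = [ctypes.py_object, ctypes.c_ssize_t]
PySequence_GetItem.restype = ctypes.c_void_p

Py_DecRef = api.Py_DecRef
Py_DecRef.argtypes = [ctypes.c_void_p]
Py_DecRef.restype = None


def refcount_changes():
    item = object()
    container = [item]
    base = sys.getrefcount(item)

    PyList_GetItem(container, 0)  # borrowed: nothing to release
    after_borrowed = sys.getrefcount(item) - base

    owned = PySequence_GetItem(container, 0)  # new: we now own a reference
    after_new = sys.getrefcount(item) - base

    Py_DecRef(owned)  # give back the reference we owned
    after_release = sys.getrefcount(item) - base
    return after_borrowed, after_new, after_release


print(refcount_changes())  # (0, 1, 0)
```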

You do not have to do anything special for the cyclic garbage collector. It's just there to patch up a flaw in reference counting (refcounting alone can never free objects that refer to each other in a cycle) and doesn't require direct access.
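This also answers the question of how Python knows you are still using something. In CPython the cycle collector treats any object whose refcount exceeds the references found inside the tracked object graph as reachable from outside, so an INCREF held by the embedding side is enough to keep it, and everything it refers to, alive. A sketch assuming CPython, with `ctypes.pythonapi` standing in for the Scheme FFI; `Node` and `survives_collection` are hypothetical names:

```python
import ctypes
import gc
import weakref

Py_IncRef = ctypes.pythonapi.Py_IncRef
Py_IncRef.argtypes = [ctypes.c_void_p]
Py_IncRef.restype = None
Py_DecRef = ctypes.pythonapi.Py_DecRef
Py_DecRef.argtypes = [ctypes.c_void_p]
Py_DecRef.restype = None


class Node:
    pass


def make_cycle():
    a, b = Node(), Node()
    a.other, b.other = b, a  # a <-> b: only the cycle collector can free these
    return a


def survives_collection(hold_foreign_ref):
    a = make_cycle()
    probe = weakref.ref(a)  # observes `a` without keeping it alive
    addr = id(a)  # CPython detail: id() is the PyObject* address
    if hold_foreign_ref:
        Py_IncRef(addr)  # the embedding side keeps its own reference
    del a  # drop the last Python-side name
    gc.collect()
    alive = probe() is not None
    if hold_foreign_ref:
        Py_DecRef(addr)  # release it; the cycle is now ordinary garbage
        gc.collect()
    return alive


print(survives_collection(hold_foreign_ref=True))   # True
print(survives_collection(hold_foreign_ref=False))  # False
```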

Mike Graham
So ... Python uses reference counting *and* a garbage collector for cyclic structure? That's a pretty major flaw. The design kind. In any case, this sounds like it will make things much more "fun" for Jason, if any values that participate in a Python-side cycle are exposed to Scheme.
Eli Barzilay
Good info. As long as I INCREF everything when I get it and DECREF everything when I'm done with it, should I be OK? Or are there any problems I could run into?
Jason Baker
@Jason, only INCREF *borrowed* references. Some functions return *new* references that are already INCREF'ed. INCREF'ing them would result in a memory leak.
Virgil Dupras
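A sketch of the leak described above, assuming CPython and `ctypes.pythonapi`. A hypothetical wrapper that always INCREFs, and whose finalizer DECREFs once, leaves one reference behind whenever the C function already returned a new reference. `PySequence_GetItem` is documented as returning a new reference; `leftover_refs` is a hypothetical helper.

```python
import ctypes
import sys

api = ctypes.pythonapi
PySequence_GetItem = api.PySequence_GetItem  # returns a NEW reference
PySequence_GetItem.argtypes = [ctypes.py_object, ctypes.c_ssize_t]
PySequence_GetItem.restype = ctypes.c_void_p
Py_IncRef = api.Py_IncRef
Py_IncRef.argtypes = [ctypes.c_void_p]
Py_IncRef.restype = None
Py_DecRef = api.Py_DecRef
Py_DecRef.argtypes = [ctypes.c_void_p]
Py_DecRef.restype = None


def leftover_refs(incref_new_result):
    """References still outstanding after the wrapper's single DECREF."""
    item = object()
    seq = [item]
    base = sys.getrefcount(item)
    ptr = PySequence_GetItem(seq, 0)  # we already own this reference
    if incref_new_result:
        Py_IncRef(ptr)  # wrong for a new reference
    Py_DecRef(ptr)  # the finalizer's one DECREF
    leaked = sys.getrefcount(item) - base
    for _ in range(leaked):  # tidy up so the demo itself doesn't leak
        Py_DecRef(ptr)
    return leaked


print(leftover_refs(incref_new_result=False))  # 0
print(leftover_refs(incref_new_result=True))   # 1
```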
@Eli - I think there are people who would debate that, but I would agree with you. The main advantage of reference counting is that it is more RAII-like (destructors get called promptly, as soon as the last reference to an object goes away). But nobody uses destructors because they inhibit garbage collection.
Jason Baker
@Eli, The principal strategy for killing objects in CPython is refcounting. Since refcounting will not pick up on otherwise inaccessible reference cycles, it is augmented with an (optional, on-by-default) cyclic garbage collector. This augmentation is necessary to prevent memory leaks in any refcounted system that allows arbitrary references. This is only a design flaw insofar as refcounting is a design flaw (which some people would claim, of course).
Mike Graham
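The split Mike describes can be observed from pure Python. This sketch assumes CPython: with the cycle collector switched off, an ordinary object is freed by refcounting the moment its last name goes away, but a two-object cycle lingers until `gc.collect()` runs the cycle collector explicitly. `Node` and `cycle_demo` are hypothetical names.

```python
import gc
import weakref


class Node:
    pass


def cycle_demo():
    was_enabled = gc.isenabled()
    gc.disable()  # refcounting only, for the moment
    try:
        acyclic = Node()
        a, b = Node(), Node()
        a.other, b.other = b, a  # reference cycle
        probe_acyclic, probe_cycle = weakref.ref(acyclic), weakref.ref(a)
        del acyclic, a, b
        freed_by_refcount = (probe_acyclic() is None, probe_cycle() is None)
        gc.collect()  # an explicit collection still runs while disabled
        freed_after_gc = probe_cycle() is None
    finally:
        if was_enabled:
            gc.enable()  # restore the previous collector state
    return freed_by_refcount, freed_after_gc


print(cycle_demo())  # ((True, False), True)
```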
@Jason, also, people do not define `__del__` because typically-immediate-collection is an implementation detail you shouldn't rely on; it is different for different implementations of the Python language and can change in future versions of CPython.
Mike Graham
@Jason, no, that's the rub! Sometimes you borrow a reference (and need to INCREF it to own a reference to the object) and sometimes a function gives you a new reference that you already own; INCREFing that one would introduce a memory leak. Similarly, whether and when to DECREF depends on whether some function has stolen your reference, which does happen. *Different functions give or loan the references they return and steal or borrow the references they receive.* Having to remember what each function you use does is what makes this a source of gotchas.
Mike Graham
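Reference stealing in practice: `PyList_SetItem` is documented to steal the reference to the item it stores, so a caller that wants to keep using its own reference must INCREF first. A sketch assuming CPython and `ctypes.pythonapi`; `store_via_steal` is a hypothetical helper, and treating `id()` as the address is a CPython detail.

```python
import ctypes
import sys

api = ctypes.pythonapi
# PyList_SetItem *steals* the reference to its third argument.
PyList_SetItem = api.PyList_SetItem
PyList_SetItem.argtypes = [ctypes.py_object, ctypes.c_ssize_t, ctypes.c_void_p]
PyList_SetItem.restype = ctypes.c_int
Py_IncRef = api.Py_IncRef
Py_IncRef.argtypes = [ctypes.c_void_p]
Py_IncRef.restype = None


def store_via_steal(lst, index, obj):
    """Put obj into lst[index] with PyList_SetItem, handing over one reference."""
    addr = id(obj)  # CPython detail: id() is the PyObject* address
    Py_IncRef(addr)  # create the reference that PyList_SetItem will steal
    PyList_SetItem(lst, index, addr)  # ctypes.pythonapi raises on error


item = object()
slots = [None]
before = sys.getrefcount(item)
store_via_steal(slots, 0, item)
print(slots[0] is item, sys.getrefcount(item) - before)  # True 1
slots[0] = None  # the list releases the reference it was given
print(sys.getrefcount(item) - before)  # 0
```

Skipping the INCREF would leave the list holding a reference nobody created, and the object could be freed while still in the list.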
@Mike - Regardless, the point is that nobody uses them. :-)
Jason Baker
@Jason, http://docs.python.org/c-api/intro.html#reference-count-details discusses what I was talking about in my last comment.
Mike Graham
Mike, Jason: yes, refcounting being a design flaw is exactly what I was getting at. It's the exact same hassle that GCs were designed to relieve you of (well, refcounting makes the hassle more organized, but judging by these comments and all the terms for owning, borrowing, etc., it doesn't help that much). Besides all of that, a good GC also has an advantage in dealing with large chunks of memory, which in some cases can even reduce the overall runtime cost.
Eli Barzilay
@Eli, you didn't phrase it clearly to me; I thought you were claiming specifically that augmenting refcounting with a cycle finder was a design problem. Whether you like refcounting is a much bigger issue, and more a religious one than a technical one.
Mike Graham
Mike: Yes, I know. It was a subjective semi-joke. (The usual reason for choosing refcounting is that it's simpler than GCing, so you can see how going with refcounts and ending up with a complementary GC anyway can be amusing.)
Eli Barzilay
A:

The biggest gotcha I know of with refcounting and the C API is the `__del__` thing. When you have a borrowed reference to something, you think you can get away without INCREF'ing because you don't give up the GIL while you use that reference. But if you end up deleting an object (for example, by removing it from a list), you might trigger a `__del__` call, which might drop the last reference to the object you're borrowing and free it out from under your feet. Very tricky.

If you INCREF (and then DECREF, of course) all borrowed references as soon as you get them, there shouldn't be any problem.
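A sketch of that trap, assuming CPython and `ctypes.pythonapi`. `Victim`, `Meddler` and `borrowed_survives` are hypothetical names. A weakref reports whether the borrowed object is still alive; the code never dereferences the pointer, since after the unsafe path it dangles.

```python
import ctypes
import weakref

api = ctypes.pythonapi
PyList_GetItem = api.PyList_GetItem  # returns a BORROWED reference
PyList_GetItem.argtypes = [ctypes.py_object, ctypes.c_ssize_t]
PyList_GetItem.restype = ctypes.c_void_p
Py_IncRef = api.Py_IncRef
Py_IncRef.argtypes = [ctypes.c_void_p]
Py_IncRef.restype = None
Py_DecRef = api.Py_DecRef
Py_DecRef.argtypes = [ctypes.c_void_p]
Py_DecRef.restype = None


class Victim:
    pass


class Meddler:
    """Hypothetical object whose __del__ empties the list it points at."""

    def __init__(self, target):
        self.target = target

    def __del__(self):
        self.target.clear()


def borrowed_survives(incref_first):
    items = [Victim()]
    items.append(Meddler(items))
    probe = weakref.ref(items[0])
    ptr = PyList_GetItem(items, 0)  # borrowed: we hold no reference of our own
    if incref_first:
        Py_IncRef(ptr)  # turn the borrowed reference into an owned one
    del items[1]  # innocent-looking removal runs Meddler.__del__
    alive = probe() is not None  # never touch ptr itself if this is False
    if incref_first:
        Py_DecRef(ptr)  # done with it: release our reference
    return alive


print(borrowed_survives(incref_first=False))  # False: ptr now dangles
print(borrowed_survives(incref_first=True))   # True
```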

Virgil Dupras