views:

235

answers:

2

I get the following error when using multiprocessing:

Exception in thread Thread-2:
Traceback (most recent call last):
  File "/usr/lib/python2.6/threading.py", line 525, in __bootstrap_inner
    self.run()
  File "/usr/lib/python2.6/threading.py", line 477, in run
    self.__target(*self.__args, **self.__kwargs)
  File "/usr/lib/python2.6/multiprocessing/pool.py", line 282, in _handle_results
    task = get()
UnpicklingError: NEWOBJ class argument has NULL tp_new

I have absolutely no idea what this means, although it sounds like something's wrong at the C level. Can anyone shed some light on this?

UPDATE: Ok, so I figured out how to fix this. But I'm still a bit perplexed. I'm returning an instance of this class:

class SpecData(object):
    def __init__(self, **kwargs):
        self.__dict__.update(**kwargs)
    def to_dict(self):
        return self.__dict__

If I return an instance of this object, I get the error. However, if I call to_dict and return a dictionary, it works. What am I doing wrong?
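For what it's worth, the working path (returning a dict) can be sanity-checked with a plain pickle round-trip, since that is the same serialization multiprocessing performs. This is only a sketch; the field names are made up:

```python
import pickle

class SpecData(object):
    def __init__(self, **kwargs):
        self.__dict__.update(**kwargs)
    def to_dict(self):
        return self.__dict__

data = SpecData(wavelength=656.3, flux=1.2)

# The working path: a plain dict always round-trips through pickle,
# because no class lookup is needed when it is unpickled.
wire = pickle.dumps(data.to_dict())
print(pickle.loads(wire))
```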

A: 

I've done thread-safety in C++, Java, and Delphi, but not Python, so take my comments with a grain of salt.

This page on Python and Thread-Safety specifically mentions that assignment to a dictionary is atomic and thread-safe. Perhaps access to your custom class is not thread-safe? Try adding some of the recommended locking mechanisms if you would still rather pass a custom container class between two threads.
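A minimal sketch of the kind of locking meant here, using threading.Lock to serialize access to shared state (the counter is illustrative, not from the question):

```python
import threading

counter = 0
lock = threading.Lock()

def bump(times):
    global counter
    for _ in range(times):
        with lock:  # serialize the read-modify-write so no increment is lost
            counter += 1

threads = [threading.Thread(target=bump, args=(10000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)  # 40000 every time; without the lock, updates can be lost
```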

I find it fascinating that other search results state emphatically that Python is completely thread-safe. The Python docs themselves state that locks and other mechanisms are provided to help with threaded applications, so it looks like this is a case of the internets being wrong (does that even happen??).

Another StackOverflow question on Python and thread-safety.

Kieveli
I'm not using threads actually. The multiprocessing module works by starting separate processes. Then, the processes communicate by serializing objects and passing them through a queue. I'm getting an error when that object is being deserialized.
Jason Baker
@Kieveli, with respect to Python being "completely thread-safe", the point is that access to primitives and other internal operations that are done by the virtual machine are safe. Things like dictionaries cannot get into invalid states, crash the machine, etc, because of multithreaded access. That's quite different from saying that your application is immune to thread synchronization issues. Race conditions and deadlocks are always possible in poorly designed apps and need to be addressed in the usual manner. Some facilities (e.g. the Queue module) help a lot even at the app level though.
Peter Hansen
+1  A: 

Try using the pickle module rather than the cPickle module -- pickle is written in pure Python, and it often gives more useful error messages than cPickle. (Though sometimes I've had to resort to making a local copy of pickle.py and adding a few debug print statements near the location of the error to figure out the problem.)

Once you track down the problem, you can switch back to cPickle.

(I'm not that familiar with the multiprocessing module, so I'm not sure whether you're doing the pickling or it is. If it is, then the easiest way to get it to use pickle rather than cPickle may be to do some monkey-patching before you import the multiprocessing module: import sys, pickle; sys.modules['cPickle'] = pickle)
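To illustrate the kind of diagnostic the pure-Python pickler gives, here is a sketch that deliberately pickles something unpicklable. (In Python 3 the two modules were merged, but the debugging idea is the same; the probe helper is made up for this example.)

```python
import pickle

def probe(obj):
    """Try to pickle obj; return None on success or the exception on failure."""
    try:
        pickle.dumps(obj)
        return None
    except Exception as exc:
        return exc

# Lambdas cannot be pickled: functions are pickled by reference, and a
# lambda has no importable name, so this reliably produces an error whose
# message names the offending object.
error = probe(lambda x: x)
print(repr(error))
```

The same probe can be pointed at each attribute of a problem object to find exactly which field refuses to serialize.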

Edward Loper