I'm trying to run some Python code under Apache 2.2 / mod_python 3.2.8. At some point the code calls os.fork() and spawns 2 separate long-running processes. Each of those processes has to create its own instance of a class, to avoid any possible collision between the parallel flows.

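import os
import time

class Foo(object):
    pass

kidprocs = []

for kid in ('kid1', 'kid2'):
    pid = os.fork()
    if pid:
        # parent: remember the child's PID, then pause before forking the next one
        kidprocs.append(pid)
        time.sleep(5)
    else:
        # child: create its own Foo instance, report it, and exit immediately
        fooobj = Foo()
        print "Starting %s in sub-process %s" % (kid, os.getpid())
        print "Kid fooobj: %s" % repr(fooobj)
        os._exit(0)

# parent: wait for both children to finish
for kidproc in kidprocs:
    os.waitpid(kidproc, 0)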

The printed output looks like this:

Starting kid1 in sub-process 20906
    foo obj: <__main__.Foo instance at 0xb7da5fec>

Starting kid2 in sub-process 20909
    foo obj: <__main__.Foo instance at 0xb7da5fec>

As you can see, both sub-processes report an object at the same address. Any idea why this happens under mod_python, and is there a way to get separate instances anyway? Thanks a lot.

+3  A: 

The memory location given by the repr() function is an address in virtual memory, not an address in the system's global memory. Each of your processes returned by fork() has its own virtual memory space which is completely distinct from other processes. They do not share memory.

Edit: Per Brian's comments below, technically they do share memory until the kernel separates them (which happens when a process writes to a shared page). The behavior, though, is effectively the same.

Both children run the same code from the same starting state, so Python ends up allocating each child's object at the same virtual address. That address is only meaningful inside the process that printed it: each child has its own distinct virtual address space.

If you actually modify the content of the objects and test them, you will see that even though the memory location looks the same, the two are completely distinct objects, because they belong to two distinct processes. In reality you can't modify one from the other (without some kind of interprocess communication to mediate).
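One way to check this is sketched below. It is not the asker's mod_python setup: it is a standalone script written for Python 3 on a Unix-like system (os.fork() is not available on Windows), and the names run_children and shared, as well as the pipe-based reporting, are made up for illustration. Each child changes an attribute on an object it inherited from the parent and reports the object's address and value back through a pipe.

```python
import os
import pickle


class Foo(object):
    def __init__(self):
        self.value = None


def run_children(names):
    """Fork one child per name.

    Returns ({name: (pid, id_of_object, value_seen_by_child)}, parent_value).
    """
    shared = Foo()  # created before fork(), so every child inherits a copy
    readers = {}
    for name in names:
        read_fd, write_fd = os.pipe()
        pid = os.fork()
        if pid == 0:
            # child: modify the inherited object and report back to the parent
            os.close(read_fd)
            shared.value = name
            report = (os.getpid(), id(shared), shared.value)
            os.write(write_fd, pickle.dumps(report))
            os.close(write_fd)
            os._exit(0)
        # parent: keep only the read end of this child's pipe
        os.close(write_fd)
        readers[name] = (pid, read_fd)

    reports = {}
    for name, (pid, read_fd) in readers.items():
        with os.fdopen(read_fd, "rb") as pipe_in:
            reports[name] = pickle.loads(pipe_in.read())
        os.waitpid(pid, 0)
    # the parent's copy is untouched by anything the children did
    return reports, shared.value


if __name__ == "__main__":
    reports, parent_value = run_children(("kid1", "kid2"))
    for name, (pid, address, value) in sorted(reports.items()):
        print("%s pid=%d address=0x%x value=%r" % (name, pid, address, value))
    print("parent value: %r" % (parent_value,))
```

The children should print the same address but different values, and the parent's copy should still hold None, which is what you would expect if each process has its own private copy of the object.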

Adam Bellaire
Tiny nitpick: They do actually share memory, as fork() is implemented just by marking common pages Copy On Write, rather than copying them immediately. Separate memory will only be used after modifying the data. However this is completely transparent, so the *effective* behaviour is as you describe.
Brian
@Brian: Hmm, interesting. However, in my tests, after the value is modified the memory location does change, but it changes to the same value in both children. That is, repr() still shows the same address as the other child even after each child sets a different value.
Adam Bellaire
It'll still be the same virtual address - it's all just a behind the scenes optimisation. The kernel marks the page as read-only, and when a write causes a fault, it copies the physical memory, redirects the virtual address to this new physical location and then proceeds with the app none the wiser.
Brian
The only difference will be the unshared memory usage of the app, which won't increase till the first modification, and a slight delay on the first write (which would otherwise have occurred when the fork() was performed). The payoff is that the kernel never has to copy pages that aren't touched.
Brian
Cool, thanks for the thorough explanation. This is one of the reasons I like StackOverflow. :)
Adam Bellaire