This is only the second question with the parallel-python tag. After looking through the documentation and googling for the subject, I've come here as it's where I've had the best luck with answers and suggestions.

The following is the API (I think that's what it's called) that submits all the pertinent info to pp.

    def submit(self, func, args=(), depfuncs=(), modules=(),
               callback=None, callbackargs=(), group='default', globals=None):
        """Submits function to the execution queue

        func - function to be executed
        args - tuple with arguments of the 'func'
        depfuncs - tuple with functions which might be called from 'func'
        modules - tuple with module names to import
        callback - callback function which will be called with argument
                list equal to callbackargs+(result,)
                as soon as calculation is done
        callbackargs - additional arguments for callback function
        group - job group, is used when wait(group) is called to wait for
                jobs in a given group to finish
        globals - dictionary from which all modules, functions and classes
                will be imported, for instance: globals=globals()
        """

Here is my submit statement with its arguments:

    job_server.submit(reify, (pop1, pop2, 1000),
                      depfuncs = (key_seq, Chromosome, Params, Node, Tree),
                      modules = ("math",),
                      callback = sum.add, globals = globals())

All the capitalized names in depfuncs are the names of classes. I wasn't sure where to put the classes, or even whether I needed to include them at all, since they are in the globals() dictionary. But when I ran it with depfuncs empty, it raised an error such as "Tree not defined".

Now, key_seq is a generator function, so I have to work with an instance of it in order to be able to use .next():

    def key_seq():
        a = 0
        while True:
            yield a
            a = a + 1

    ks = key_seq()

ks is defined in globals(). When I didn't include it anywhere else, I got an error saying 'ks is not defined'. When I included ks in depfuncs, this was the error:

    Traceback (most recent call last):
      File "C:\Python26\Code\gppp.py", line 459, in <module>
        job_server.submit(reify, (pop1, pop2, 1000), depfuncs = (key_seq, ks, Chromosome, Params, Node, Tree), modules = ("math",), callback = sum.add, globals = globals())
      File "C:\Python26\lib\site-packages\pp.py", line 449, in submit
        sfunc = self.__dumpsfunc((func, ) + depfuncs, modules)
      File "C:\Python26\lib\site-packages\pp.py", line 634, in __dumpsfunc
        sources = [self.__get_source(func) for func in funcs]
      File "C:\Python26\lib\site-packages\pp.py", line 713, in __get_source
        sourcelines = inspect.getsourcelines(func)[0]
      File "C:\Python26\lib\inspect.py", line 678, in getsourcelines
        lines, lnum = findsource(object)
      File "C:\Python26\lib\inspect.py", line 519, in findsource
        file = getsourcefile(object) or getfile(object)
      File "C:\Python26\lib\inspect.py", line 441, in getsourcefile
        filename = getfile(object)
      File "C:\Python26\lib\inspect.py", line 418, in getfile
        raise TypeError('arg is not a module, class, method, '
    TypeError: arg is not a module, class, method, function, traceback, frame, or code object

I'm pretty sure arg is referring to ks. So, where do I tell .submit() about ks? I don't understand what's supposed to go where. Thanks.
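The failure can be reproduced without pp at all. Here is a minimal, pp-free sketch of what pp's `__get_source` is doing: it calls `inspect.getsourcelines()` on everything in depfuncs, which works for a plain function like key_seq but has nothing to fetch for a generator instance like ks.

```python
import inspect

def key_seq():
    a = 0
    while True:
        yield a
        a = a + 1

ks = key_seq()

# key_seq itself is a function, so inspect can normally recover its
# source -- which is why depfuncs=(key_seq, ...) is fine.  ks, however,
# is a generator *object*, not a function, class, or module:
try:
    inspect.getsourcelines(ks)
    error = None
except TypeError as exc:   # "arg is not a module, class, method, ..."
    error = exc
```

The TypeError captured here is the same one at the bottom of the traceback above, which is why putting ks in depfuncs can never work.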

A: 

I think you should be passing in `lambda: ks.next()` instead of plain old `ks`.

gnibbler
I tried changing ks to ks.next; there was no change in the error message.
Peter Stewart
well well.. ks.next is a "method-wrapper", and both isfunction(ks.next) and ismethod(ks.next) return False in inspect.py
gnibbler
you can wrap ks.next with a lambda to make inspect see it as a function. I modified my answer.
gnibbler
So, I used `lambda: ks.next()` and got the message "NameError: global name 'ks' is not defined". It's a different message, anyway.
Peter Stewart
How about `lambda ks=ks: ks.next()`? That should create a binding to ks inside the lambda.
gnibbler
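The difference between the two lambdas can be checked locally with a pp-free sketch. A bare `lambda: next(ks)` looks `ks` up in its globals every time it is called, while `lambda ks=ks: next(ks)` freezes the current generator into a default argument, so the lambda no longer needs the name `ks` to exist. (Whether either survives pp's source-based shipping to the worker process is a separate question.)

```python
def key_seq():
    a = 0
    while True:
        yield a
        a = a + 1

ks = key_seq()

late  = lambda: next(ks)         # late binding: resolves ks at call time
bound = lambda ks=ks: next(ks)   # early binding: carries ks with it

first = bound()                  # consumes 0 from the generator
del ks                           # simulate ks missing from the namespace

try:
    late()
    err = None
except NameError as exc:         # global name 'ks' is not defined
    err = exc

second = bound()                 # still works: the generator lives on
```

This is the same NameError Peter reports: in the pp worker's namespace, `ks` simply does not exist, so only a binding carried along with the callable can help.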
+3  A: 

Interesting - are you doing genetics simulations? I ask because I see 'Chromosome' in there, and I once developed a population genetics simulation using parallel python.

Your approach looks really complicated. In my parallel python program, I used the following call:

    job = jobServer.submit(doRun, (param,))

How did I get away with this? The trick is that the doRun function doesn't run in the same context as the one in which you call submit. For instance (a contrived example):

    import os, pp

    def doRun(param):
        print "your name is %s!" % os.getlogin()

    param = 42                 # any value; doRun ignores it here
    jobServer = pp.Server()
    jobServer.submit(doRun, (param,))

This code will fail, because the os module doesn't exist inside doRun - doRun is not running in the same context as submit. Sure, you can pass "os" in the modules parameter of submit, but isn't it easier just to call import os inside doRun?

Parallel python tries to avoid Python's GIL by running your function in a totally separate process. It tries to make this easier to swallow by letting you quote-"pass" parameters and namespaces to your function, but it does this using hacks. For instance, your classes will be serialized using some variant of pickle and then deserialized in the new process.

But instead of relying on submit's hacks, just accept the reality that your function is going to need to do all the work of setting up its run context. You really have two main functions - one that sets up the call to submit, and one, which you call via submit, that actually does the work you need done.

If you need the next value from your generator to be available for a pp run, pass it in as a parameter too! This avoids lambda functions and generator references, and leaves you passing a simple variable!
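A minimal, pp-free sketch of that shape (the names do_run and key_seq here are illustrative, not from the real code): the worker imports its own modules and receives plain values, including the next key from the generator, drawn *before* submit is called.

```python
def do_run(key, n):
    import math                      # the worker builds its own context
    return (key, math.sqrt(n))

def key_seq():
    a = 0
    while True:
        yield a
        a = a + 1

ks = key_seq()

# With pp this would be: job_server.submit(do_run, (next(ks), 1000))
# -- next(ks) is evaluated in the parent, so only a plain int crosses
# the process boundary.
result = do_run(next(ks), 1000)
```

Nothing in the submit call then needs depfuncs, lambdas, or generator references.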

My code is not maintained anymore, but if you're curious, check it out here: http://pps-spud.uchicago.edu/viewvc/fps/trunk/python/fps.py?view=markup

Igor
Not to mention that simply instantiating whatever objects you need inside your doRun function avoids the overhead of serializing and deserializing whatever parameters you would otherwise pass via submit's arguments. Uggh.
Igor
Yes, I'm doing a genetics simulation. I looked at your code and it is much simpler than mine. I'm new to this. I sort of understand when you say 'doRun is not running in the same context as submit'. Does that mean it's running in a different namespace? I don't know how that is done. I've simplified my call to this: job_server.submit(reify, (pop1, pop2, 1000), callback = sum.add). I noticed you prefaced your call to submit with "job =". Is that meaningful in creating another context? In my next comment I'll include the function "reify" that is called by submit. Thanks.
Peter Stewart
This is the function that is called by submit:

    def reify(poi, poj, repitition):
        for i in range(repitition):
            poi.extend(generate(30))
            poi.sort(fitnesscompare)
            poj[:] = []
            poj.extend(poi[0:3])
            poj.extend(poi[17:20])
            poi = reproduce(poj)
            poi.sort(fitnesscompare)
        return poi

"generate" and "reproduce" are functions that instantiate classes and call other functions. How do I call this in a different context? Thank you.
Peter Stewart
I'm sorry. I didn't realize that the comments would leave the code in such an unreadable state. Should I start another question?
Peter Stewart
Yes, it's a different namespace. Parallel python is a mechanism for getting true concurrency in Python - it avoids the GIL by forking off a new process to run your code. Whatever function you tell submit to run for you gets run in a totally separate process, with a totally separate everything - namespace, memory, etc. So, you won't have generate and reproduce there. I suggest putting those functions in a class and instantiating that class inside reify.
Igor
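A hypothetical sketch of that refactor (Toolkit and its methods are invented stand-ins for generate/reproduce, not Peter's real code): the helpers live in a class that reify instantiates itself, so nothing has to be shipped through depfuncs.

```python
import random

class Toolkit(object):
    """Stand-in for the classes that generate/reproduce would use."""

    def generate(self, n):
        return [random.random() for _ in range(n)]

    def fitness(self, x):
        return x

def reify(pop, reps):
    tools = Toolkit()                # built inside the worker process
    for _ in range(reps):
        pop.extend(tools.generate(5))
        pop.sort(key=tools.fitness)
    return pop

out = reify([], 2)
```

Since Toolkit is defined and instantiated within what the worker executes, the submit call shrinks to the function and its plain-value arguments.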
@Peter - I've basically made some suggestions on how to refactor your code. Try to mess with it and see if you can get it to run now, and then start a new question if you're having trouble with your new code. It's probably good that you're new at this and that you're just getting started - it's easier to refactor now than after you've written most of the work.
Igor
Thanks, Igor. Your suggestions are what I came here for. I'll try refactoring.
Peter Stewart