I am trying to automate some big data file processing using Python.

A lot of the processing is chained, i.e. script1 writes a file that is then processed by script2, then script2's output is processed by script3, and so on.

I am using the subprocess module in a threaded context.

I have one class that creates tuples of chained scripts ("scr1.sh","scr2.sh","scr3.sh").

Then another class that uses a call like

for script in scriptlist:
    subprocess.call(script)

My question: in this for loop, is each script called only after the previous subprocess.call(script) has returned a successful retcode?

Or do all three get called right after one another? Without using "sleep" or "wait", I want to make sure that the second script only starts after the first one is over.

edit: The pydoc says "subprocess.call(*popenargs, **kwargs) Run command with arguments. Wait for command to complete, then return the returncode attribute."

So in the for loop above, does it wait for each retcode before iterating to the next script?

I am new to threading. I am attaching the stripped-down code for the class that runs the analysis here. The subprocess.call loop is part of this class.

import subprocess
from threading import Thread
from Queue import Queue  # the module is named "queue" on Python 3

class ThreadedDataProcessor(Thread):
    def __init__(self, in_queue, out_queue):
        # Uses Queue
        Thread.__init__(self)
        self.in_queue = in_queue
        self.out_queue = out_queue

    def run(self):
        while True:
            path = self.in_queue.get()
            if path is None:
                # None is the sentinel that tells this worker to stop
                break
            myprocessor = ProcessorScriptCreator(path)
            scrfiles = myprocessor.create_and_return_shell_scripts()

            # Run the chained scripts for this path one after another
            for index, script in enumerate(scrfiles):
                subprocess.call([script])
                print("CALLED%s%s" % (index, script) * 5)
            #report(myfile.describe())
            #report("Done %s" %  path)
            self.out_queue.put(path)

in_queue = Queue()
A:

The loop will serially call each script, wait until it completes, and then call the next one regardless of success or failure of the previous call. You probably want to say:

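try:
    for script in script_list:
        subprocess.check_call(script)
except (subprocess.CalledProcessError, OSError) as e:
    # A script could not be started or exited with a non-zero status;
    # the remaining scripts in the chain are skipped.
    print("Script failed: %s" % e)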
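As a rough illustration of the sequential behaviour, here is a minimal sketch. The helper name run_chain and the small Python one-liners standing in for scr1.sh, scr2.sh and scr3.sh are hypothetical and are not from the question. It relies only on the documented fact that subprocess.call blocks until the command exits and then returns its return code.

```python
import subprocess
import sys


def run_chain(commands):
    """Run each command in order and stop at the first non-zero exit status.

    Returns the return codes of the commands that were actually run.
    """
    codes = []
    for cmd in commands:
        # subprocess.call blocks here until cmd has exited.
        code = subprocess.call(cmd)
        codes.append(code)
        if code != 0:
            # Do not start the next script in the chain after a failure.
            break
    return codes


# Hypothetical stand-ins for the shell scripts: one succeeds, one fails.
ok = [sys.executable, "-c", "import sys; sys.exit(0)"]
fail = [sys.executable, "-c", "import sys; sys.exit(3)"]

print(run_chain([ok, fail, ok]))  # the third command is never started
```

Checking the return code by hand like this is one alternative to check_call when you want to record which step failed instead of raising an exception.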

A new thread starts when you call start() on the Thread object, which invokes run() inside that new thread, and the thread ends when run() returns. Within a single thread, the loop over the scripts runs them one at a time with subprocess.
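To see where run() actually executes, here is a small sketch. The class name Recorder is made up for illustration. It suggests that start() is what creates the new thread and runs run() inside it, while calling run() directly simply executes it in the calling thread.

```python
import threading


class Recorder(threading.Thread):
    """Hypothetical thread that remembers which thread executed run()."""

    def __init__(self):
        threading.Thread.__init__(self)
        self.ran_in = None

    def run(self):
        self.ran_in = threading.current_thread()


# start() spawns a new thread and calls run() inside it.
started = Recorder()
started.start()
started.join()
ran_in_own_thread = started.ran_in is started

# Calling run() directly does not create a thread at all.
direct = Recorder()
direct.run()
ran_in_caller = direct.ran_in is threading.current_thread()

print(ran_in_own_thread, ran_in_caller)
```

In the question's design this means each ThreadedDataProcessor that has been started works through its own queue items, and the scripts for one item still run strictly one after another.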

You should make sure that the set of calls in each thread does not interfere with calls made from other threads. For example, scripts launched from different threads should not read and write the same file at the same time.
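One possible way to keep threads from stepping on each other is to derive a separate output file from each input path. The sketch below assumes that approach. The function process, the input names and the ".out" suffix are all hypothetical, and the write inside process stands in for whatever the real scripts would produce.

```python
import os
import tempfile
import threading


def process(path, out_dir, results):
    # Each input gets its own output file, so no two threads share one.
    out_path = os.path.join(out_dir, os.path.basename(path) + ".out")
    with open(out_path, "w") as handle:
        handle.write("processed %s\n" % path)
    results[path] = out_path


out_dir = tempfile.mkdtemp()
inputs = ["a.dat", "b.dat", "c.dat"]  # hypothetical input names
results = {}

threads = [
    threading.Thread(target=process, args=(p, out_dir, results))
    for p in inputs
]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(sorted(os.listdir(out_dir)))
```

If the scripts must share a single file, a threading.Lock around the calls that touch it would be another option, at the cost of serialising that step.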

dietbuddha