I am trying to automate some big data file processing using python.
A lop of the processing is chained , i.e script1 writes a file , that is then processed by script2 , then script2's output by script3 etc.
I am using the subprocess module in a threaded context.
I have one class that creates tuples of chained scripts ("scr1.sh","scr2.sh","scr3.sh").
Then another class that uses a call like
for script in scriplist:
subprocess.call(script)
My question is that in this for loop , is each script only called after subprocess.call(script1) returns a successful retcode?.
Or is it that all three get called right after one another since I am using subprocess.call, Without using "sleep" or "wait", I want to make sure that the second script only starts after the first one is over.
edit: The pydoc says "subprocess.call(*popenargs, **kwargs) Run command with arguments. Wait for command to complete, then return the returncode attribute."
So in the for loop (above) , does it wait for each retcode before iterating to the next script.
I am new to threading . I am attaching the stripped-down code for the class that runs the analysis here. The subprocess.call loop is part of this class.
class ThreadedDataProcessor(Thread):
def __init__(self, in_queue, out_queue):
# Uses Queue
Thread.__init__(self)
self.in_queue = in_queue
self.out_queue = out_queue
def run(self):
while True:
path = self.in_queue.get()
if path is None:
break
myprocessor = ProcessorScriptCreator(path)
scrfiles = myprocessor.create_and_return_shell_scripts()
for index,file in enumerate(scrfiles):
subprocess.call([file])
print "CALLED%s%s" % (index,file) *5
#report(myfile.describe())
#report("Done %s" % path)
self.out_queue.put(path)
in_queue = Queue()