views:

62

answers:

1

hi..

looking for some eyeballs to verifiy that the following chunk of psuedo python makes sense. i'm looking to spawn a number of threads to implement some inproc functions as fast as possible. the idea is to spawn the threads in the master loop, so the app will run the threads simultaneously in a parallel/concurrent manner

chunk of code
 -get the filenames from a dir
 -write each filename ot a queue
 -spawn a thread for each filename, where each thread 
  waits/reads value/data from the queue
 -the threadParse function then handles the actual processing 
  based on the file that's included via the "execfile" function...


# System modules
from Queue import Queue
from threading import Thread
import time

# Local modules
#import feedparser

# Set up some global variables
appqueue = Queue()

# more than the app will need
# this matches the number of files that will ever be in the 
# urldir
#
num_fetch_threads = 200


def threadParse(q)
  #decompose the packet to get the various elements
  line = q.get()
  college,level,packet=decompose (line)

  #build name of included file
  fname=college+"_"+level+"_Parse.py"
  execfile(fname)
  q.task_done()


#setup the master loop
while True
  time.sleep(2)
  # get the files from the dir
  # setup threads
  filelist="ls /urldir"
  if filelist
    foreach file_ in filelist:
        worker = Thread(target=threadParse, args=(appqueue,))
        worker.start()

    # again, get the files from the dir
    #setup the queue
    filelist="ls /urldir"
    foreach file_ in filelist:
       #stuff the filename in the queue
       appqueue.put(file_)


    # Now wait for the queue to be empty, indicating that we have
    # processed all of the downloads.

  #don't care about this part

  #print '*** Main thread waiting'
  #appqueue.join()
  #print '*** Done'

Thoughts/comments/pointers are appreciated...

thanks

A: 

If I understand this right: You spawn lots of threads to get things done faster.

This only works if the main part of the job done in each thread is done without holding the GIL. So if there is a lot of waiting for data from network, disk or something like that, it might be a good idea. If each of the tasks are using a lot of CPU, this will run pretty much like on a single core 1-CPU machine and you might as well do them in sequence.

I should add that what I wrote is true for CPython, but not necessarily for Jython/IronPython. Also, I should add that if you need to utilize more CPUs/cores, there's the multiprocessing module that might help.

Mattias Nilsson