Questions:

  1. What is the best practice for keeping track of a thread's progress without locking the GUI ("Not Responding")?
  2. Generally, what are the best practices for threading as it applies to GUI development?

Question Background:

  • I have a PyQt GUI for Windows.
  • It is used to process sets of HTML documents.
  • It takes anywhere from three seconds to three hours to process a set of documents.
  • I want to be able to process multiple sets at the same time.
  • I don't want the GUI to lock.
  • I'm looking at the threading module to achieve this.
  • I am relatively new to threading.
  • The GUI has one progress bar.
  • I want it to display the progress of the selected thread.
  • Display results of the selected thread if it's finished.
  • I'm using Python 2.5.

My Idea: Have the threads emit a QtSignal when the progress is updated that triggers some function that updates the progress bar. Also signal when finished processing so results can be displayed.

#NOTE: this is example code for my idea, you do not have
#      to read this to answer the question(s).

import threading
from PyQt4 import QtCore, QtGui
import re
import copy

class ProcessingThread(threading.Thread, QtCore.QObject):

    __pyqtSignals__ = ( "progressUpdated(str)",
                        "resultsReady(str)")

    def __init__(self, docs):
        #initialize both base classes before using either one
        threading.Thread.__init__(self)
        QtCore.QObject.__init__(self)
        self.docs = docs
        self.progress = 0   #int between 0 and 100
        self.results = []

    def getResults(self):
        return copy.deepcopy(self.results)

    def run(self):
        num_docs = len(self.docs)
        for i, doc in enumerate(self.docs):
            processed_doc = self.processDoc(doc)
            self.results.append(processed_doc)
            #count finished docs so a single-doc set does not divide by zero
            new_progress = int((float(i + 1)/num_docs)*100)

            #emit signal only if progress has changed
            if self.progress != new_progress:
                #store the new value first so the slot reads it
                self.progress = new_progress
                self.emit(QtCore.SIGNAL("progressUpdated(str)"), self.getName())
            if self.progress == 100:
                self.emit(QtCore.SIGNAL("resultsReady(str)"), self.getName())

    def processDoc(self, doc):
        ''' this is trivial for shortness' sake '''
        return re.findall('<a [^>]*>.*?</a>', doc)


class GuiApp(QtGui.QMainWindow):

    def __init__(self):
        QtGui.QMainWindow.__init__(self)
        self.processing_threads = {}  #{'thread_name': Thread(processing_thread)}
        self.progress_object = {}     #{'thread_name': int(thread_progress)}
        self.results_object = {}      #{'thread_name': []}
        self.selected_thread = ''     #'thread_name'

    def processDocs(self, docs):
        #create new thread
        p_thread = ProcessingThread(docs)
        thread_name = "example_thread_name"
        p_thread.setName(thread_name)
        p_thread.start()

        #add thread to dict of threads
        self.processing_threads[thread_name] = p_thread

        #init progress_object for this thread
        self.progress_object[thread_name] = p_thread.progress  

        #connect thread signals to GuiApp functions
        #(pass the methods themselves; calling them here would connect their return value, None)
        QtCore.QObject.connect(p_thread, QtCore.SIGNAL('progressUpdated(str)'), self.updateProgressObject)
        QtCore.QObject.connect(p_thread, QtCore.SIGNAL('resultsReady(str)'), self.updateResultsObject)

    def updateProgressObject(self, thread_name):
        #update progress_object for all threads
        self.progress_object[thread_name] = self.processing_threads[thread_name].progress

        #update progress bar for selected thread
        if self.selected_thread == thread_name:
            self.setProgressBar(self.progress_object[self.selected_thread])

    def updateResultsObject(self, thread_name):
        #update results_object for thread with results
        self.results_object[thread_name] = self.processing_threads[thread_name].getResults()

        #update results widget for selected thread
        try:
            self.setResultsWidget(self.results_object[thread_name])
        except KeyError:
            self.setResultsWidget(None)

Any commentary on this approach (e.g. drawbacks, pitfalls, praises, etc.) will be appreciated.

Resolution:

I ended up using the QThread class and associated signals and slots to communicate between threads. This is primarily because my program already uses Qt/PyQt4 for the GUI objects/widgets. This solution also required fewer changes to my existing code to implement.

Here is a link to an applicable Qt article that explains how Qt handles threads and signals: http://www.linuxjournal.com/article/9602. Excerpt below:

Fortunately, Qt permits signals and slots to be connected across threads—as long as the threads are running their own event loops. This is a much cleaner method of communication compared to sending and receiving events, because it avoids all the bookkeeping and intermediate QEvent-derived classes that become necessary in any nontrivial application. Communicating between threads now becomes a matter of connecting signals from one thread to the slots in another, and the mutexing and thread-safety issues of exchanging data between threads are handled by Qt.

Why is it necessary to run an event loop within each thread to which you want to connect signals? The reason has to do with the inter-thread communication mechanism used by Qt when connecting signals from one thread to the slot of another thread. When such a connection is made, it is referred to as a queued connection. When signals are emitted through a queued connection, the slot is invoked the next time the destination object's event loop is executed. If the slot had instead been invoked directly by a signal from another thread, that slot would execute in the same context as the calling thread. Normally, this is not what you want (and especially not what you want if you are using a database connection, as the database connection can be used only by the thread that created it). The queued connection properly dispatches the signal to the thread object and invokes its slot in its own context by piggy-backing on the event system. This is precisely what we want for inter-thread communication in which some of the threads are handling database connections. The Qt signal/slot mechanism is at root an implementation of the inter-thread event-passing scheme outlined above, but with a much cleaner and easier-to-use interface.
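To make the queued-connection idea in the excerpt concrete, here is a toy model in plain Python. It is not Qt's implementation, only a sketch of the mechanism: the emitting thread enqueues the call, and the slot runs later in the thread that owns the loop. The names ToyEventLoop, post and process_events are hypothetical, and the code assumes Python 3 (the module is named Queue in Python 2).

```python
import queue
import threading


class ToyEventLoop(object):
    """Hypothetical stand-in for a thread's event loop (not Qt's code)."""

    def __init__(self):
        self._pending = queue.Queue()

    def post(self, slot, *args):
        # Safe to call from any thread: only enqueue, never run the slot here.
        self._pending.put((slot, args))

    def process_events(self):
        # Called by the owning thread: run every queued slot in its context.
        while True:
            try:
                slot, args = self._pending.get_nowait()
            except queue.Empty:
                return
            slot(*args)


def demo():
    loop = ToyEventLoop()
    calls = []

    def on_progress(value):
        # Record the value and the thread that actually ran the slot.
        calls.append((value, threading.current_thread().name))

    def work():
        for value in (50, 100):
            loop.post(on_progress, value)

    t = threading.Thread(target=work, name="worker")
    t.start()
    t.join()
    before = list(calls)    # snapshot taken before the loop has run
    loop.process_events()   # slots execute here, in the calling thread
    return before, calls
```

The worker only posts; on_progress runs when the owning thread calls process_events(), which is why a queued slot never executes in the emitting thread's context.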

NOTE: eliben also has a good answer, and if I weren't using PyQt4, which handles thread-safety and mutexing, his solution would have been my choice.

A: 

You are always going to have this problem in Python. Google GIL ("global interpreter lock") for more background. There are two generally recommended ways to get around the problem you are experiencing: use Twisted, or use a module similar to the multiprocessing module introduced in 2.5.

Twisted will require that you learn asynchronous programming techniques, which may be confusing in the beginning but will be helpful if you ever need to write high-throughput network apps, and will be more beneficial to you in the long run.

The multiprocessing module forks a new process and uses IPC to make it behave as if you had true threading. The only downside is that you would need Python 2.5 installed, which is fairly new and isn't included in most Linux distros or OS X by default.

MrEvil
I believe the 'multiprocessing' module was introduced in 2.6. Also, I'm writing a standalone Windows app, so Twisted doesn't really apply. Thanks for the ideas though.
tgray
+1  A: 

If your method "processDoc" doesn't change any other data (it just looks up some data and returns it, without changing variables or properties of the parent class), you may use the C API macros Py_BEGIN_ALLOW_THREADS and Py_END_ALLOW_THREADS (see here for details) in it. The document will then be processed in a thread that does not lock the interpreter, and the UI will be updated.

zihotki
+4  A: 

I recommend using a Queue instead of signaling. Personally, I find it a much more robust and understandable way of programming, because it's more synchronous.

Threads should get "jobs" from a Queue, and put back results on another Queue. Yet a third Queue can be used by the threads for notifications and messages, like errors and "progress reports". Once you structure your code this way, it becomes much simpler to manage.

This way, a single "job Queue" and "result Queue" can also be shared by a group of worker threads; it routes all the information from the threads into the main GUI thread.
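A minimal sketch of the three-queue layout described above, assuming Python 3 (the module is named Queue in Python 2). The names worker, drain and run_demo are made up for illustration, and the regex is borrowed from the question's processDoc.

```python
import queue
import re
import threading


def worker(jobs, results, notes):
    """Take (set_name, docs) jobs until a None sentinel arrives."""
    while True:
        job = jobs.get()
        if job is None:
            break
        set_name, docs = job
        links = []
        for i, doc in enumerate(docs):
            links.append(re.findall(r'<a [^>]*>.*?</a>', doc))
            # Progress report: percentage of this set processed so far.
            notes.put((set_name, 100 * (i + 1) // len(docs)))
        results.put((set_name, links))


def drain(q):
    """Collect everything currently in q without blocking."""
    items = []
    while True:
        try:
            items.append(q.get_nowait())
        except queue.Empty:
            return items


def run_demo():
    jobs, results, notes = queue.Queue(), queue.Queue(), queue.Queue()
    threads = [threading.Thread(target=worker, args=(jobs, results, notes))
               for _ in range(2)]
    for t in threads:
        t.start()
    jobs.put(("set_a", ['<a href="x">x</a>', 'no links']))
    jobs.put(("set_b", ['<a href="y">y</a>']))
    for _ in threads:
        jobs.put(None)      # one sentinel per worker, queued after the jobs
    for t in threads:
        t.join()
    return dict(drain(results)), drain(notes)
```

In a GUI, the main thread would presumably call drain() from a periodic timer (a QTimer, for instance) instead of joining the workers, so it never blocks while waiting for results.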

Eli Bendersky
Could you supply an example? I'm a little confused about how the threads communicate with the Queue objects.
tgray
see here for example: http://code.activestate.com/recipes/302746/
Eli Bendersky
+5  A: 

If you want to use signals to indicate progress to the main thread then you should really be using PyQt's QThread class instead of the Thread class from Python's threading module.

A simple example which uses QThread, signals and slots can be found on the PyQt Wiki:

http://www.diotavelli.net/PyQtWiki/Threading%2C_Signals_and_Slots

David Boddie
+1  A: 

Native python queues won't work because you have to block on queue get(), which bungs up your UI.

Qt essentially implements a queuing system on the inside for cross thread communication. Try this call from any thread to post a call to a slot.

QtCore.QMetaObject.invokeMethod()

It's clunky and poorly documented, but it should do what you want even from a non-Qt thread.

You can also use the event machinery for this. See QApplication (or QCoreApplication) for a method named something like "post" (QCoreApplication.postEvent).

Edit: Here's a more complete example...

I created my own class based on QWidget. It has a slot that accepts a string; I define it like this:

@QtCore.pyqtSlot(str)
def add_text(self, text):
   ...

Later, I create an instance of this widget in the main GUI thread. From the main GUI thread or any other thread (knock on wood) I can call:

QtCore.QMetaObject.invokeMethod(mywidget, "add_text", QtCore.Q_ARG(str,"hello world"))

Clunky, but it gets you there.

Dan.