Questions:
- What is the best practice for keeping track of a tread's progress without locking the GUI ("Not Responding")?
- Generally, what are the best practices for threading as it applies to GUI development?
Question Background:
- I have a PyQt GUI for Windows.
- It is used to process sets of HTML documents.
- It takes anywhere from three seconds to three hours to process a set of documents.
- I want to be able to process multiple sets at the same time.
- I don't want the GUI to lock.
- I'm looking at the threading module to achieve this.
- I am relatively new to threading.
- The GUI has one progress bar.
- I want it to display the progress of the selected thread.
- Display results of the selected thread if it's finished.
- I'm using Python 2.5.
My Idea: Have the threads emit a QtSignal when the progress is updated that triggers some function that updates the progress bar. Also signal when finished processing so results can be displayed.
#NOTE: this is example code for my idea, you do not have
# to read this to answer the question(s).
import threading
from PyQt4 import QtCore, QtGui
import re
import copy
class ProcessingThread(threading.Thread, QtCore.QObject):
__pyqtSignals__ = ( "progressUpdated(str)",
"resultsReady(str)")
def __init__(self, docs):
self.docs = docs
self.progress = 0 #int between 0 and 100
self.results = []
threading.Thread.__init__(self)
def getResults(self):
return copy.deepcopy(self.results)
def run(self):
num_docs = len(self.docs) - 1
for i, doc in enumerate(self.docs):
processed_doc = self.processDoc(doc)
self.results.append(processed_doc)
new_progress = int((float(i)/num_docs)*100)
#emit signal only if progress has changed
if self.progress != new_progress:
self.emit(QtCore.SIGNAL("progressUpdated(str)"), self.getName())
self.progress = new_progress
if self.progress == 100:
self.emit(QtCore.SIGNAL("resultsReady(str)"), self.getName())
def processDoc(self, doc):
''' this is tivial for shortness sake '''
return re.findall('<a [^>]*>.*?</a>', doc)
class GuiApp(QtGui.QMainWindow):
def __init__(self):
self.processing_threads = {} #{'thread_name': Thread(processing_thread)}
self.progress_object = {} #{'thread_name': int(thread_progress)}
self.results_object = {} #{'thread_name': []}
self.selected_thread = '' #'thread_name'
def processDocs(self, docs):
#create new thread
p_thread = ProcessingThread(docs)
thread_name = "example_thread_name"
p_thread.setName(thread_name)
p_thread.start()
#add thread to dict of threads
self.processing_threads[thread_name] = p_thread
#init progress_object for this thread
self.progress_object[thread_name] = p_thread.progress
#connect thread signals to GuiApp functions
QtCore.QObject.connect(p_thread, QtCore.SIGNAL('progressUpdated(str)'), self.updateProgressObject(thread_name))
QtCore.QObject.connect(p_thread, QtCore.SIGNAL('resultsReady(str)'), self.updateResultsObject(thread_name))
def updateProgressObject(self, thread_name):
#update progress_object for all threads
self.progress_object[thread_name] = self.processing_threads[thread_name].progress
#update progress bar for selected thread
if self.selected_thread == thread_name:
self.setProgressBar(self.progress_object[self.selected_thread])
def updateResultsObject(self, thread_name):
#update results_object for thread with results
self.results_object[thread_name] = self.processing_threads[thread_name].getResults()
#update results widget for selected thread
try:
self.setResultsWidget(self.results_object[thread_name])
except KeyError:
self.setResultsWidget(None)
Any commentary on this approach (e.g. drawbacks, pitfalls, praises, etc.) will be appreciated.
Resolution:
I ended up using the QThread class and associated signals and slots to communicate between threads. This is primarily because my program already uses Qt/PyQt4 for the GUI objects/widgets. This solution also required fewer changes to my existing code to implement.
Here is a link to an applicable Qt article that explains how Qt handles threads and signals, http://www.linuxjournal.com/article/9602. Excerpt below:
Fortunately, Qt permits signals and slots to be connected across threads—as long as the threads are running their own event loops. This is a much cleaner method of communication compared to sending and receiving events, because it avoids all the bookkeeping and intermediate QEvent-derived classes that become necessary in any nontrivial application. Communicating between threads now becomes a matter of connecting signals from one thread to the slots in another, and the mutexing and thread-safety issues of exchanging data between threads are handled by Qt.
Why is it necessary to run an event loop within each thread to which you want to connect signals? The reason has to do with the inter-thread communication mechanism used by Qt when connecting signals from one thread to the slot of another thread. When such a connection is made, it is referred to as a queued connection. When signals are emitted through a queued connection, the slot is invoked the next time the destination object's event loop is executed. If the slot had instead been invoked directly by a signal from another thread, that slot would execute in the same context as the calling thread. Normally, this is not what you want (and especially not what you want if you are using a database connection, as the database connection can be used only by the thread that created it). The queued connection properly dispatches the signal to the thread object and invokes its slot in its own context by piggy-backing on the event system. This is precisely what we want for inter-thread communication in which some of the threads are handling database connections. The Qt signal/slot mechanism is at root an implementation of the inter-thread event-passing scheme outlined above, but with a much cleaner and easier-to-use interface.
NOTE: eliben also has a good answer, and if I weren't using PyQt4, which handles thread-safety and mutexing, his solution would have been my choice.