views:

79

answers:

3

The tags might not be accurate since I am not sure where the problem is.

I have a module where I am trying to read some data from a socket, and write the results into a file (append) It looks something like this, (only relevant parts included)

if __name__ == "__main__":
    <some init code>
    for line in file:
        t = Thread(target=foo, args=(line,))
        t.start()
    while nThreads > 0: 
        time.sleep(1)

Here are the other modules,

def foo(text):
    global countLock, nThreads
    countLock.acquire()
    nThreads += 1
    countLock.release()

    """connect to socket, send data, read response"""
    writeResults(text, result)

    countLock.acquire()
    nThreads -= 1
    countLock.release()

def writeResults(text, result):
    """acquire file lock"""
    """append to file"""
    """release file lock"""

Now here's the problem. Initially, I had a typo in the function 'foo', where I was passing the variable 'line' to writeResults instead of 'text'. 'line' is not defined in the function foo, it's defined in the main block, so I should have seen an error, but instead, it worked fine, except that the data was appended to the file multiple times, instead of being written just once, which is the required behavior, which I got when I fixed the typo.

My question is,

1) Why didn't I get an error?

2) Why was the writeResults function being called multiple times?

+1  A: 

You really ought to use Thread.join() for your main loop to wait for a thread to finish:

if __name__ == "__main__":
    <some init code>
    threads = []
    for line in file:
        t = Thread(target=foo, args=(line,))
        t.start()
        threads.append(t)
    for t in threads:
        t.join()

and get rid of the nThreads global and countLock.

As for your questions, I don't know why you didn't get an error (maybe the exception got eaten by something?), and I wonder if the number of repetitions of writeResults correlated with the number of lines in the file. If it did, I would have to wonder if line is a global and each thread wrote it once.

Mike DeSimone
Thanks for your reply. No the number of repetitions (and I should have mentioned this before) was 4 or 5 most of the times.
fsm
+1  A: 

When you had (simplifying to the important part)

def foo(text):
    writeResults(line, result)

foo, not having a local variable line, was using the global variable by that name... which happens to be the one set (in the main thread) by for line in file:.

Specifically, I would expect the total number of lines written to be OK: there's one thread per line (a weird architecture, BTW) and each thread writes one line... the only issue is, which line each thread writes.

In your intention, the first thread writes the first line, the second thread writes the second line, etc; but in reality each thread will write the line that happens to be bound to the global line name at the crucial moment when the thread calls writeResults. So, some lines may well end up written multiple times, others, not written.

For example, suppose the main thread ran faster by enough to start all subthreads before any of them actually gets to writing. In that case, the last value taken by the global name line (i.e., the last line in the file) would be the one written by all of the threads.

Note that even in the "corrected" version there is no guarantee about the order in which the various lines get written, which is part of what makes this architecture weird -- normally, since lines come in a certain order, you'd want to preserve that order on output. I guess your application case is peculiar enough to not need that constraint, but I'm still puzzled that you need so many threads when you're reading from one file and writing to one file!-)

Alex Martelli
+1  A: 

To avoid the global variable called line, you should write a function called main that does the job. Then the variable is local to the main function.

Make your program look like this:

def main(args):
    # ... init ...
    for line in file:
        # ... process the line
        pass

if __name__ == "__main__":
    main(sys.argv[1:])
Roland Illig