I'm having an interesting problem with threads and the tempfile module in Python. Something doesn't appear to be getting cleaned up until the threads exit, and I'm running up against the open file limit. (This is on OS X 10.5.8, Python 2.5.1.)

Yet if I roughly replicate what the tempfile module is doing (not all the security checks, but just generating a file descriptor and then using os.fdopen to produce a file object), I have no problems.

Before filing this as a bug with Python, I figured I'd check here, as it's much more likely that I'm doing something subtly wrong. But if I am, a day of trying to figure it out hasn't gotten me anywhere.

#!/usr/bin/python

import threading
import thread
import tempfile
import os
import time
import sys

NUM_THREADS = 10000

def worker_tempfile():
    tempfd, tempfn = tempfile.mkstemp()
    tempobj = os.fdopen(tempfd, 'wb')
    tempobj.write('hello, world')
    tempobj.close()
    os.remove(tempfn)
    time.sleep(10)

def worker_notempfile(index):
    tempfn = str(index) + '.txt'
    # The flags I'm passing to os.open may be different from what
    # tempfile.mkstemp uses, but this version works, as does using the open()
    # builtin to create a file object directly.
    tempfd = os.open(tempfn, 
                     os.O_EXCL | os.O_CREAT | os.O_TRUNC | os.O_RDWR)
    tempobj = os.fdopen(tempfd, 'wb')
    tempobj.write('hello, world')
    tempobj.close()
    os.remove(tempfn)
    time.sleep(10)

def main():
    for count in range(NUM_THREADS):
        if count % 100 == 0:
            print('Opening thread %s' % count)
        wthread = threading.Thread(target=worker_tempfile)
        #wthread = threading.Thread(target=worker_notempfile, args=(count,))
        started = False
        while not started:
            try:
                wthread.start()
                started = True
            except thread.error:
                print('failed starting thread %s; sleeping' % count)
                time.sleep(3)

if __name__ == '__main__':
    main()

If I run it with the worker_notempfile line active and the worker_tempfile line commented-out, it runs to completion.

The other way around (using worker_tempfile) I get the following error:

$ python threadtempfiletest.py 
Opening thread 0
Opening thread 100
Opening thread 200
Opening thread 300
Exception in thread Thread-301:
Traceback (most recent call last):
  File "/System/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/threading.py", line 460, in __bootstrap
  File "/System/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/threading.py", line 440, in run
  File "threadtempfiletest.py", line 17, in worker_tempfile
  File "/System/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/tempfile.py", line 302, in mkstemp
  File "/System/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/tempfile.py", line 236, in _mkstemp_inner
OSError: [Errno 24] Too many open files: '/var/folders/4L/4LtD6bCvEoipksvnAcJ2Ok+++Tk/-Tmp-/tmpJ6wjV0'

Any ideas what I'm doing wrong? Is this a bug in Python, or am I being bone-headed?

UPDATE 2009-12-14: I think I've found the answer, but I don't like it. Since nobody was able to replicate the problem, I went hunting around our office for machines. The test passed on every machine except mine. I tested on a Mac with the same software versions I was using, and I even tracked down a desktop G5 with the EXACT same hardware and software config as mine -- same result. Both tests (with tempfile and without tempfile) succeeded on everything.

For kicks, I downloaded Python 2.6.4 and tried it on my desktop, and got the same pattern as with Python 2.5.1: tempfile failed, and notempfile succeeded.

This is leading me to the conclusion that something's hosed on my Mac, but I sure can't figure out what. Any suggestions are welcome.

+2  A: 

I think your answer can be found here. You have to explicitly os.close() the file descriptor given as the first part of the tuple that mkstemp gives you.
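
For illustration, a minimal sketch of that explicit-close pattern (the helper name is made up, and this mirrors rather than reproduces the OP's worker) might look like:

import os
import tempfile

def write_temp_explicit_close():
    # mkstemp returns an OS-level file descriptor plus the file's path
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, 'hello, world')   # write through the raw descriptor
    finally:
        os.close(fd)                   # explicitly close the descriptor
        os.remove(path)                # remove the temporary file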

Edit: no, the OP is already doing what is supposed to be done. I'm leaving the answer up for the nice link.

Jonathan Feinberg
But that post says "The function os.fdopen(fd) will return a Python file object using the same file descriptor. Closing that file object will close the OS-level file descriptor" -- which is (or should be, to the best of my knowledge) correct, and is why the OP's bug is so mysterious... he **is** using `fdopen` and then closing the file object... and yet he's leaking file descriptors anyway, which is a serious mystery!
Alex Martelli
D'oh! Thanks for the correction. I'll leave this answer up, just because the resource it links to is useful.
Jonathan Feinberg
+4  A: 

I am unable to reproduce the problem with (Apple's own build of) Python 2.5.1 on Mac OS X 10.5.9 -- runs to completion just fine!

I've tried both on a MacBook Pro, i.e., an Intel processor, and an old PowerMac, i.e., a PPC processor.

So I can only imagine there must have been a bug in 10.5.8 which I never noticed (don't have any 10.5.8 around to test, as I always upgrade promptly whenever software update offers it). All I can suggest is that you try upgrading to 10.5.9 and see if the bug disappears -- if it doesn't, I have no idea how this behavior difference between my machines and yours is possible.

Alex Martelli
Hmm. 10.5.8 appears to be the latest version software update will give me. Perhaps this is a PowerPC vs Intel thing? (I'm on PowerPC.)
Schof
Does not fail for me on 10.5.8 PPC with the Apple 2.5.1.
Ned Deily
10.5.8 *is* the latest version listed on Apple's website. Is 10.5.9 a pre-release version?
Schof
@Schof: my bad, I do indeed have 10.5.8 and must have misread the "About This Mac" info window. So the code failing for you is now a deep mystery, as identical code on just about identical HW and SW is working just fine for Ned and me (I did test on PPC as well, as I mentioned).
Alex Martelli
+1  A: 

I just tested your code on my Ubuntu Linux computer here, and it worked perfectly for me.

I have one suggestion for you to try. I don't know that it will help, but it can't hurt. Rewrite your code to use a with statement:

from __future__ import with_statement

def worker_tempfile():
    tempfd, tempfn = tempfile.mkstemp()
    with os.fdopen(tempfd, 'wb') as tempobj:
        tempobj.write('hello, world')
    os.remove(tempfn)
    time.sleep(10)

The with statement is supposed to make sure that the file object gets closed no matter what. Perhaps it might help?

Good luck. Great job on the question, by the way.

steveha
A: 

Why do you think the error is not genuine? You are launching 10000 threads, each of which opens a file, while the maximum number of open files is typically 1024 on Unix systems.

First, try to manually keep track of the number of files currently open and check whether it bumps up against the OS limit.
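
As a rough sketch (the helper name is made up, and /dev/fd is an assumption that holds on OS X and most Linux/BSD systems), you could compare the current descriptor count against the process limit:

import os
import resource

def open_fd_count():
    # Count the descriptors currently open in this process by listing /dev/fd
    # (available on OS X and most Linux/BSD systems).
    return len(os.listdir('/dev/fd'))

soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print('open fds: %d of %d allowed' % (open_fd_count(), soft))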

Antoine P.
The reason I think this may be a Python bug (or an error in my code) is that one function fails (worker_tempfile) while a roughly equivalent function succeeds (worker_notempfile).
Schof
It's a very weak reason. Calling different functions (the builtin os.* functions versus the Python-written tempfile.* functions) with different implementations can have a lot of impact on how things get parallelized. That's why I suggest you check whether the error is in fact genuine.
Antoine P.
A: 

Since nobody was able to replicate the problem, I went hunting around our office for machines. The test passed on every machine except mine. I tested on a Mac with the same software versions I was using, and I even tracked down a desktop G5 with the EXACT same hardware and software config as mine -- same result. Both tests (with tempfile and without tempfile) succeeded on everything.

For kicks, I downloaded Python 2.6.4 and tried it on my desktop, and got the same pattern as with Python 2.5.1: tempfile failed, and notempfile succeeded.

This is leading me to the conclusion that something's hosed on my Mac, so this isn't likely to be a problem that anyone else will ever run into.

Thanks VERY much to everyone (especially Alex Martelli) who helped on this!

Schof