views:

294

answers:

4

I have a urllib2 caching module, which sporadically crashes because of the following code:

if not os.path.exists(self.cache_location):
    os.mkdir(self.cache_location)

The problem is, by the time the second line is being executed, the folder may exist, and will error:

  File ".../cache.py", line 103, in __init__
    os.mkdir(self.cache_location)
OSError: [Errno 17] File exists: '/tmp/examplecachedir/'

This is because the script is simultaneously launched numerous times, by third-party code I have no control over.

The code (before I attempted to fix the bug) can be found here, on github

I can't use the tempfile.mkstemp, as it solves the race condition by using a randomly named directory (tempfile.py source here), which would defeat the purpose of the cache.

I don't want to simply discard the error, as the same error Errno 17 error is raised if the folder name exists as a file (a different error), for example:

$ touch blah
$ python
>>> import os
>>> os.mkdir("blah")
Traceback (most recent call last):
  File "", line 1, in 
OSError: [Errno 17] File exists: 'blah'
>>>

I cannot using threading.RLock as the code is called from multiple processes.

So, I tried writing a simple file-based lock (that version can be found here), but this has a problem: it creates the lockfile one level up, so /tmp/example.lock for /tmp/example/, which breaks if you use /tmp/ as a cache dir (as it tries to make /tmp.lock)..

In short, I need to cache urllib2 responses to disc. To do this, I need to access a known directory (creating it, if required), in a multiprocess safe way. It needs to work on OS X, Linux and Windows.

Thoughts? The only alternative solution I can think of is to rewrite the cache module using SQLite3 storage, rather than files.

+2  A: 

Could you catch the exception and then test whether the file exists as a directory or not?

pythonic metaphor
Possibly! I thought of that just as I was re-reading the question, before submitting it.. I've implemented this ( http://github.com/dbr/tvdb_api/blob/468d9f816373b14ef3a483fca07e031b69fa62f9/cache.py#L103-114 ), and will get the person who reported the bug to test it shortly.
dbr
This seems to work perfectly, thanks!
dbr
@dbr: note that on line 114, you want `raise e` since it's already an instance of `OSError` . http://github.com/dbr/tvdb_api/blob/468d9f816373b14ef3a483fca07e031b69fa62f9/cache.py#L114
nosklo
+7  A: 

Instead of

if not os.path.exists(self.cache_location):
    os.mkdir(self.cache_location)

you could do

try:
    os.makedirs(self.cache_location)
except OSError:
    pass

As you would end up with the same functionality.

DISCLAIMER: I don't know how Pythonic this might be.


Using SQLite3, might be a bit of overkill, but would add a lot of functionality and flexibility to your use case.

If you have to do a lot of "selecting", concurrent inserting and filtering, it's a great idea to use SQLite3, as it wont add too much complexity over simple files (it could be argued that it removes complexity).


Rereading your question (and comments) I can better understand your problem.

What is the possibility that a file could create the same race condition?

If it is small enough, then I'd do something like:

if not os.path.isfile(self.cache_location):
    try:
        os.makedirs(self.cache_location)
    except OSError:
        pass

Also, reading your code, I'd change

else:
    # Our target dir is already a file, or different error,
    # relay the error!
    raise OSError(e)

to

else:
    # Our target dir is already a file, or different error,
    # relay the error!
    raise

as it's really what you want, Python to reraise the exact same exception (just nitpicking).


One more thing, may be this could be of use for you (Unix-like only).

voyager
+1: It's perfectly Pythonic, IMO.
S.Lott
Agreed, though you might consider using `os.makedirs` instead, which create parent directories as well, if necessary.
Eli Courtwright
@Eli Courtwright: good point. Modified.
voyager
For many cases, ignoring the error would be Pythonic, but in this case it could cause other problems, as I addressed in the question - "I don't want to simply discard the error, as the same error Errno 17 error is raised if the folder name exists as a file (a different error)" - discarding the exception would mean the temp directory either doesn't exist (say, a permission denied error), or is a useless file (and would cause errors further down the path)
dbr
Also, I agree with your points about SQLite - I wouldn't have to do much in the way of filtering (pretty much just "has this URL been requested in the last x hours"), but SQLite would remove a lot of the complexity (no constructing cached-file paths and checking file modification times), and I think it should deal with all the locking issues (both for specific bits of cached data, and the sqlite file itself).. Hm... If Pythonic Metaphor's solution doesn't work, I may start this (in fact I may regardless, a nice SQLite backed urllib2 cache module would be useful)
dbr
@dbr: If you want to distinguish between "directory already exists as directory" and "directory name already exists as file", you do that in the except clause. Don't make this too complex. Simply handle the exception.
S.Lott
Thanks for the ideas. Catching OSError, checking if the dir exists and if so ignoring the error seems to work fine, and will still correctly error if the user has insufficient permissions. I've made the change you suggested to the code, I completely forgot `raise` will re-raise the caught exception.
dbr
+2  A: 

When you have race conditions often EAFP(easier to ask forgiveness than permission) works better that LBYL(look before you leap)

Error checking strategies

gnibbler
A: 

This might be also helpful http://stackoverflow.com/questions/600268/mkdir-p-functionality-in-python

Adam Byrtek