I have a urllib2 caching module, which sporadically crashes because of the following code:
    if not os.path.exists(self.cache_location):
        os.mkdir(self.cache_location)
The problem is that by the time the second line executes, the folder may already exist, and os.mkdir raises:
File ".../cache.py", line 103, in __init__
os.mkdir(self.cache_location)
OSError: [Errno 17] File exists: '/tmp/examplecachedir/'
This happens because the script is launched many times simultaneously, by third-party code I have no control over.
The code (before I attempted to fix the bug) can be found here, on GitHub.
I can't use tempfile.mkstemp (or tempfile.mkdtemp for directories), as they solve the race condition with randomly generated names (tempfile.py source here), which would defeat the purpose of the cache.
I don't want to simply discard the error, because the same Errno 17 is raised when the path exists as a regular file, which is a genuinely different problem. For example:
    $ touch blah
    $ python
    >>> import os
    >>> os.mkdir("blah")
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    OSError: [Errno 17] File exists: 'blah'
    >>>
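One option seems to be to attempt the mkdir unconditionally and only swallow Errno 17 when the existing path really is a directory. A minimal sketch of that idea (ensure_dir is just an illustrative name):

    import errno
    import os

    def ensure_dir(path):
        # Attempt the mkdir unconditionally; checking os.path.exists first
        # is what opens the race window in the first place.
        try:
            os.mkdir(path)
        except OSError as e:
            # Errno 17 (EEXIST) is only acceptable if what exists is really
            # a directory; a plain file of the same name should still raise.
            if e.errno != errno.EEXIST or not os.path.isdir(path):
                raise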
I cannot use threading.RLock, as the code is called from multiple processes.
So I tried writing a simple file-based lock (that version can be found here), but this has a problem: it creates the lockfile one level up, e.g. /tmp/example.lock for /tmp/example/, which breaks if you use /tmp/ itself as the cache dir (it would try to create /tmp.lock).
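One thought: if the directory itself is created race-safely first (as in the sketch above), the lockfile could live inside the cache directory rather than one level up, which sidesteps the /tmp.lock problem. A rough sketch using os.O_EXCL, whose create-if-absent behaviour is atomic on POSIX and Windows (acquire_lock and release_lock are made-up names, and a crashed process would leave a stale lockfile behind):

    import errno
    import os
    import time

    def acquire_lock(lockfile, timeout=10):
        # O_CREAT | O_EXCL makes creation fail atomically if the file
        # already exists, so only one process can hold the lock at a time.
        deadline = time.time() + timeout
        while True:
            try:
                return os.open(lockfile, os.O_CREAT | os.O_EXCL | os.O_RDWR)
            except OSError as e:
                if e.errno != errno.EEXIST or time.time() > deadline:
                    raise
                time.sleep(0.05)  # another process holds the lock; retry

    def release_lock(fd, lockfile):
        os.close(fd)
        os.unlink(lockfile)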
In short, I need to cache urllib2 responses to disk. To do this, I need to access a known directory (creating it first, if required) in a multiprocess-safe way. It also needs to work on OS X, Linux and Windows.
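For the individual cache files, it may be possible to avoid locks entirely by writing each response to a uniquely named temporary file in the cache directory and then renaming it into place, so readers never see a half-written entry. A sketch of that idea (write_cache_entry is a made-up name; mkstemp's random name is fine here because it is only temporary):

    import os
    import tempfile

    def write_cache_entry(cache_dir, name, data):
        # Write to a uniquely named temp file in the same directory, so the
        # final rename stays on one filesystem.
        fd, tmp_path = tempfile.mkstemp(dir=cache_dir)
        try:
            os.write(fd, data)
        finally:
            os.close(fd)
        target = os.path.join(cache_dir, name)
        # rename is atomic on POSIX; on Windows it fails if the target
        # exists, so an existing entry would need removing first.
        os.rename(tmp_path, target)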
Thoughts? The only alternative solution I can think of is to rewrite the cache module to use SQLite3 storage rather than individual files.
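If I did go the SQLite3 route, I imagine it would look roughly like this, leaning on SQLite's own locking to serialise concurrent writers (the schema and function names are purely illustrative):

    import sqlite3

    def open_cache(db_path):
        # timeout makes writers wait on a locked database instead of failing.
        conn = sqlite3.connect(db_path, timeout=10)
        conn.execute("CREATE TABLE IF NOT EXISTS cache "
                     "(url TEXT PRIMARY KEY, response BLOB)")
        conn.commit()
        return conn

    def put(conn, url, response):
        conn.execute("INSERT OR REPLACE INTO cache VALUES (?, ?)",
                     (url, sqlite3.Binary(response)))
        conn.commit()

    def get(conn, url):
        row = conn.execute("SELECT response FROM cache WHERE url = ?",
                           (url,)).fetchone()
        return row[0] if row else None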