I have a web server which saves cache files and keeps them for 7 days. The file names are md5 hashes, i.e. exactly 32 hex characters long, and are being kept in a tree structure that looks like this:
00/
00/
00000ae9355e59a3d8a314a5470753d8
.
.
00/
01/
You get the idea.
My problem is that deleting old files is taking a really long time. I have a daily cron job that runs
find cache/ -mtime +7 -type f -delete
which takes more than half a day to complete. I worry about scalability and the effect this has on the performance of the server. Additionally, the cache directory is now a black hole in my system, trapping the occasional innocent du
or find
.
The standard solution to LRU cache is some sort of a heap. Is there a way to scale this to the filesystem level? Is there some other way to implement this in a way which makes it easier to manage?
Here are ideas I considered:
- Create 7 top directories, one for each week day, and empty one directory every day. This increases the seek time for a cache file 7-fold, makes it really complicated when a file is overwritten, and I'm not sure what it will do to the deletion time.
- Save the files as blobs in a MySQL table with indexes on name and date. This seemed promising, but in practice it's always been much slower than FS. Maybe I'm not doing it right.
Any ideas?