I have a web server which saves cache files and keeps them for 7 days. The file names are MD5 hashes, i.e. exactly 32 hex characters long, and are kept in a tree structure that looks like this:

00/
  00/
    00000ae9355e59a3d8a314a5470753d8
    .
    .
  01/
    .
    .

You get the idea.

My problem is that deleting old files is taking a really long time. I have a daily cron job that runs

find cache/ -mtime +7 -type f -delete

which takes more than half a day to complete. I worry about scalability and the effect this has on the performance of the server. Additionally, the cache directory is now a black hole in my system, trapping the occasional innocent du or find.

The standard solution to an LRU cache is some sort of heap. Is there a way to scale this to the filesystem level? Is there some other way to implement this that makes it easier to manage?

Here are ideas I considered:

  1. Create 7 top directories, one for each day of the week, and empty one directory every day. This increases the lookup cost for a cache file up to 7-fold (a file could be in any of the 7 directories), makes it really complicated when a file is overwritten, and I'm not sure what it will do to the deletion time.
  2. Save the files as BLOBs in a MySQL table with indexes on name and date. This seemed promising, but in practice it's always been much slower than the filesystem. Maybe I'm not doing it right.

Any ideas?

+1  A: 

How about having a table in your database that uses the hash as the key? The other field would then be the name of the file. That way the file can be stored in a date-related fashion for fast deletion, and the database can be used to find the file's location from its hash quickly.
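
A minimal sketch of that idea, assuming a hypothetical sqlite3 database and a made-up by-date layout ($hash and $tmpfile stand in for your application's variables; the answer doesn't specify a schema):

# Hypothetical schema: hash -> dated path
sqlite3 cache.db 'CREATE TABLE IF NOT EXISTS cache (hash TEXT PRIMARY KEY, path TEXT);'

# On write: store the file under a per-day directory and record the mapping
path="by-date/$(date +%F)/$hash"
mkdir -p "$(dirname "$path")" && cp "$tmpfile" "$path"
sqlite3 cache.db "INSERT OR REPLACE INTO cache VALUES ('$hash', '$path');"

# On read: look up the file's location from its hash
path=$(sqlite3 cache.db "SELECT path FROM cache WHERE hash = '$hash';")

Deleting a whole day is then just an rm -rf of one dated directory plus a DELETE of the matching rows.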

David Arno
+1  A: 

ReiserFS is relatively efficient at handling small files. Did you try different Linux file systems? I'm not sure about its delete performance, but you could consider reformatting (mkfs) as a substitute for deleting files individually. For example, you could create a separate file system (cache1, cache2, ...) for each weekday.
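
A rough sketch of the per-weekday idea (the device and mount-point names below are made up; adapt them to your setup):

# One filesystem per weekday: wiping a day = reformat instead of find -delete
day=$(date +%u)                       # 1..7, Monday..Sunday
umount /srv/cache$day
mkfs.reiserfs -f /dev/vg0/cache$day   # -f skips the confirmation prompt
mount /dev/vg0/cache$day /srv/cache$day

Reformatting takes roughly the same time no matter how many files the day's cache held, which is the whole point of the trick.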

gimel
+13  A: 

When you store a file, make a symbolic link to a second directory structure that is organized by date, not by name.

Retrieve your files using the "name" structure, delete them using the "date" structure.
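
A minimal sketch, assuming GNU userland and a made-up by-name/by-date layout rooted in the cache directory:

# On write: store by name, then link the file into a per-day directory
day=$(date +%F)
mkdir -p "by-date/$day"
ln -s "../../by-name/00/00/$hash" "by-date/$day/$hash"

# Nightly purge: remove one expired day's files via the date structure
old=$(date -d '8 days ago' +%F)
for link in by-date/$old/*; do
    [ -L "$link" ] || continue
    rm -f "$(readlink -f "$link")"    # the real file...
    rm -f "$link"                     # ...and the link itself
done
rmdir "by-date/$old"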

Tomalak
Bugger :) You beat me to it. +1 this answer.
OJ
Just be sure to remove both the original file and the link. You don't want lots of dead links there, and it's also easy to just remove the link and not remove the original file.
Ben Combee
+1  A: 

How about this:

  • Have another folder called, say, "ToDelete"
  • When you add a new item, get today's date and look for a subfolder of "ToDelete" named after the current date
  • If it's not there, create it
  • In that day's folder, add a symbolic link to the item you've just created
  • Create a cron job that goes into the "ToDelete" subfolder of the expired date and deletes all the files the links point to, along with the links themselves
  • Finally, delete the now-empty dated folder (a crontab sketch follows below)
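
The cron piece could be a single crontab entry handing the expired date to a purge script like the loop sketched in the symlink answer above (the script path and schedule here are made up):

# Nightly at 03:30: purge the links, their targets, and the dated folder
30 3 * * * /usr/local/bin/purge-cache.sh "/cache/ToDelete/$(date -d '8 days ago' +\%F)"

Note the escaped % — cron treats an unescaped % in the command field as a newline.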
OJ
+4  A: 

Assuming this is ext2/3, have you tried enabling indexed directories? With a large number of files in any one directory, the name lookup that precedes each delete becomes painfully slow.
Use tune2fs -O dir_index to enable the dir_index feature (directories that already exist then need to be reindexed with e2fsck -D while the filesystem is unmounted).
When mounting the file system, use the noatime option, which stops the OS from writing an access-time update on every read of a file or directory.
Looking at the original post, you only have 2 levels of indirection to the files, which means the leaf directories can hold a huge number of files. Once there are more than a million entries in one of them, searches and changes become terribly slow. An alternative is a deeper hierarchy of directories (e.g. cache/00/00/0a/... instead of cache/00/00/...), reducing the number of items in any one directory and therefore the cost of searching and updating it.
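
Putting that together (the device name and mount point are examples):

# e2fsck must not run on a mounted filesystem, so unmount first
umount /var/www/cache
# Enable hashed directory indexes on an existing ext2/3 filesystem
tune2fs -O dir_index /dev/sda1
# Reindex directories that already exist
e2fsck -fD /dev/sda1
# Mount with noatime so reads stop triggering access-time writes
mount -o noatime /dev/sda1 /var/www/cache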

Petesh