I'm in the process of implementing caching for my project. After looking at cache directory structures, I've seen many examples like:

  • cache
  • cache/a
  • cache/a/a/
  • cache/a/...
  • cache/a/z
  • cache/...
  • cache/z ...

You get the idea. Another example is file storage: if our file is named IMG_PARTY.JPG, a common approach is to put it in a directory like this:

files/i/m/IMG_PARTY.JPG
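
For illustration, here is a minimal Python sketch of how such a path could be derived from the file name. The function name `sharded_path` and the `depth` parameter are made up for this example, and lowercasing the leading characters is an assumption based on the `i/m` in the path above.

```python
from pathlib import Path


def sharded_path(root: str, filename: str, depth: int = 2) -> Path:
    """Return root/<c1>/<c2>/.../filename using the first `depth` characters."""
    # Use lowercase alphanumeric characters only, so "IMG_PARTY.JPG" -> "i", "m".
    chars = [c.lower() for c in filename if c.isalnum()][:depth]
    # Pad short names with "_" so every file sits at the same depth.
    chars += ["_"] * (depth - len(chars))
    return Path(root, *chars, filename)


print(sharded_path("files", "IMG_PARTY.JPG").as_posix())  # files/i/m/IMG_PARTY.JPG
```

Real file names tend to cluster on a few prefixes (IMG_, DSC_ and so on), so the same idea is often applied to the leading characters of a hash of the name instead, which spreads files more evenly.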

Some thoughts come to mind, but I'd like to know the real reasons for this.

  • Filesystems that do linear lookups find files faster when there are fewer of them in a directory. Such a structure spreads files thin. (wild guess)

  • To avoid tripping up *nix utilities like rm, which accept only a finite number of arguments, so deleting a large number of files at once tends to be hacky (having to pass them through find, etc.)
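
On the second point: the limit comes from the maximum size of the argument list the kernel accepts when a command is executed, so it bites when a shell glob expands to a huge list of names. A program that deletes files one by one never builds such a list. A rough Python sketch, with `remove_files` as a hypothetical helper name:

```python
import os


def remove_files(directory: str) -> int:
    """Delete every regular file directly inside `directory`; return the count."""
    # Collect the paths first so entries are not deleted while iterating.
    with os.scandir(directory) as entries:
        paths = [e.path for e in entries if e.is_file(follow_symlinks=False)]
    for path in paths:
        os.unlink(path)
    return len(paths)
```

Subdirectories and symlinks are left alone here; that is a choice made for the sketch, not a requirement.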

What's the real reason? Can you suggest a cache directory structure that you find to be good and tell me why it's good?

Thanks!

+2  A: 

Every time I've done it, it has been to avoid slow linear searches in filesystems. Luckily, at least on Linux, this is becoming a thing of the past.

However, even today, with B-tree-based directories, a very large directory is hard to deal with: it takes forever and a day just to get a listing of all the files, never mind finding the right file.

Lars Wirzenius
Ah, I thought it had something to do with that. I'd love to know which filesystems still use linear search. I'll wait for more answers before selecting one as accepted, thanks!
Karolis
On Linux, ext2 and ext3 use linear search unless the dir_index option is enabled for the filesystem (it's been the default for a while now). In general, old filesystems use linear search and new ones use trees.
Lars Wirzenius
+1  A: 

Just use dates, since you will remove by date. :)
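
A rough sketch of what that could look like in Python, assuming a root/YYYY/MM/DD layout and a fixed retention period. Both are assumptions for the example, and `dated_dir` and `prune` are made-up names:

```python
import shutil
from datetime import date, timedelta
from pathlib import Path


def dated_dir(root: str, day: date) -> Path:
    """Return root/YYYY/MM/DD for the given day."""
    return Path(root, f"{day:%Y}", f"{day:%m}", f"{day:%d}")


def prune(root: str, today: date, keep_days: int) -> list:
    """Remove day directories older than `keep_days`; return the removed paths."""
    cutoff = today - timedelta(days=keep_days)
    removed = []
    for day_dir in sorted(Path(root).glob("[0-9]*/[0-9]*/[0-9]*")):
        try:
            y, m, d = (int(p) for p in day_dir.relative_to(root).parts)
            day = date(y, m, d)
        except ValueError:
            continue  # not a date directory; leave it alone
        if day_dir.is_dir() and day < cutoff:
            shutil.rmtree(day_dir)
            removed.append(day_dir.as_posix())
    return removed
```

Expiring old entries then means removing whole directories, with no need to check the timestamp of each cached file.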

Flinkman