If I have 3 million pages, which directory structure is better?

Method 1. ~/123456789.htm

(Putting all the 3 million pages into the same folder without any sub folders)

Method 2. ~/789/123456789.htm

(Create 999 sub-folders, each sub-folder contains about 3000 pages)
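For illustration, Method 2 could be sketched as below, assuming the sub-folder is simply the last three decimal digits of the page number. The function name `bucketed_path` and the `root` parameter are hypothetical, and taking the last three digits actually yields up to 1000 buckets (000 to 999) rather than 999.

```python
def bucketed_path(page_id: int, root: str = "~") -> str:
    """Build a Method 2 style path: <root>/<last 3 digits>/<id>.htm"""
    name = str(page_id)
    # Pad short ids so every bucket name is exactly three digits wide.
    bucket = name[-3:].zfill(3)
    return f"{root}/{bucket}/{name}.htm"


print(bucketed_path(123456789))  # ~/789/123456789.htm
```

If the page numbers are roughly evenly distributed, each bucket ends up with about 3000 of the 3 million pages.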

For Windows Server 2008, which folder structure is faster? (For file creation, reading and deletion)

+1  A: 

IMO putting thousands of files in one directory isn't a good practice.

One option is to take the hex representation of the integer, zero-padded to 8 digits,
e.g. 123456789 -> 075BCD15

and use this directory structure: ~/07/5B/CD/15.htm
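A minimal sketch of that mapping, assuming page ids fit in 32 bits so the hex string is always 8 digits long. The name `hex_path` and the `root` parameter are made up for the example.

```python
def hex_path(page_id: int, root: str = "~") -> str:
    """Map an integer id to <root>/AA/BB/CC/DD.htm using its 8-digit hex form."""
    if not 0 <= page_id <= 0xFFFFFFFF:
        raise ValueError("page_id must fit in 32 bits for a fixed 8-digit hex name")
    h = format(page_id, "08X")  # e.g. 123456789 -> "075BCD15"
    # Split the 8 hex digits into four 2-digit parts: three folders and a file name.
    return f"{root}/{h[0:2]}/{h[2:4]}/{h[4:6]}/{h[6:8]}.htm"


print(hex_path(123456789))  # ~/07/5B/CD/15.htm
```

With two hex digits per level, no folder holds more than 256 entries.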

Nick D
Wow, never thought of that, but isn't that a little... complicated? Why not break up the decimal digits?
Jeff Meatball Yang
@Jeff, I don't think it's complicated. Breaking up the digits? Hmm, yeah that's another option but I'm not sure whether it's less complicated or not :)
Nick D
The advantage of hex is that the number has a fixed size. I would go with that, too - it keeps the number of files per folder manageable AND one can use reparse points to map subfolders to separate drives if needed.
TomTom
A: 

I think it depends highly on your filesystem format (NTFS? ext3?). Since you haven't said that you want to list files, I think 3 million is fine for a single directory if you only create, read, and delete files.

From experience, I can say it is not fun to use NTFS to list a folder with even 30,000 files.

Jeff Meatball Yang
It is no fun in any file system. The problem in this case, for example, is not NTFS (which handles that nicely), it is Windows Explorer. Any interactive UI showing all files will simply not work efficiently, as load times kill you. The command line works fine, but even then a listing is not efficient. Linux would have the same problem, as would any other file system.
TomTom
+1  A: 

It is definitely faster to go with the subfolder variant. Our DMS stores files based on their creation date in a structure like ./YYYY/MM/DD/HH/MM/, which is a good way to look up a file if you know its age.
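As a rough sketch of such a date-based layout (not the actual DMS code; the name `dated_dir` and the `root` parameter are assumptions for the example):

```python
from datetime import datetime


def dated_dir(created: datetime, root: str = ".") -> str:
    """Return the folder for a file created at `created`: <root>/YYYY/MM/DD/HH/MM/"""
    # strftime zero-pads month, day, hour and minute, so folder names sort correctly.
    return f"{root}/" + created.strftime("%Y/%m/%d/%H/%M") + "/"


print(dated_dir(datetime(2010, 3, 7, 9, 5)))  # ./2010/03/07/09/05/
```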

Just imagine that NTFS on Windows 2008 has directories implemented as a list. If you have 3,000,000 files, the whole list has to be searched. If you have a tree with maybe 10 entries per level (you need a depth of about 6 then, because you have 3*10^6 files), access to individual files is much faster.

If you always want to access all files at once, e.g. for batch processing, the all-files-in-one-folder strategy might be the fastest, but then don't accidentally click on that folder in Explorer, unless you need an excuse to get yourself some coffee :)

Daniel
