I need to store about 600,000 images on a web server that uses NTFS. Am I better off storing images in 20,000-image chunks in subfolders? (Windows Server 2008)

I'm concerned about incurring operating system overhead during image retrieval.

+1  A: 

NTFS folders store an index file with links to all of their contents. With a large number of images, that file is going to grow considerably and impact your performance negatively. So, yes, on that argument alone you are better off storing the images in chunks in subfolders. Fragments inside indexes are a pain.
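
For illustration only (a sketch, not something from the original answer), this is one common way to do the chunking the question describes, in Python, assuming images are keyed by a numeric ID and grouped 20,000 per subfolder; IMAGES_ROOT and the .jpg extension are hypothetical:

    import os

    IMAGES_ROOT = r"D:\images"   # hypothetical root folder
    CHUNK_SIZE = 20000           # ~20,000 images per subfolder, as in the question

    def image_path(image_id: int) -> str:
        """Map a numeric image ID to its bucketed path (bucket folder + filename)."""
        bucket = image_id // CHUNK_SIZE
        return os.path.join(IMAGES_ROOT, f"{bucket:06d}", f"{image_id}.jpg")

    def save_image(image_id: int, data: bytes) -> str:
        path = image_path(image_id)
        os.makedirs(os.path.dirname(path), exist_ok=True)  # create the bucket folder on demand
        with open(path, "wb") as f:
            f.write(data)
        return path

This keeps each directory's index small and bounded, which is exactly what avoids the fragmentation issue described above.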

Shyam
While you might be right, is this based on anything but assumption? We have folders with 450k files and there have been no problems so far - though I guess browsing them, even in a file manager, wouldn't be fast.
leeeroy
Actually, it is not assumption; it comes from experience with various implementations of NTFS and, mostly, its limitations. It's a design limitation. My answer was to clarify whether it would affect performance (yes), not whether it would cause 'problems' such as failure. The index file is not a smart file: as it grows, you get more maintenance work putting the fragments back in order. And I quote myself: fragments inside indexes are a pain.
Shyam
+5  A: 

Go for it. As long as you have an external index and a direct file path to each file, without listing the contents of the directory, you are OK.

I have a folder that is over 500 GB in size with over 4 million folders (which contain more folders and files). I have somewhere in the order of 10 million files in total.

If I accidentally open this folder in Windows Explorer, it gets stuck at 100% CPU usage (for one core) until I kill the process. But as long as you refer to the file/folder directly, performance is great (meaning I can access any of those 10 million files with no overhead). The sketch below illustrates the distinction.
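
A sketch of the distinction this answer draws (the database table and paths here are hypothetical, just to illustrate): store the relative path in your external index and open the file directly; never locate a file by enumerating the directory.

    import os
    import sqlite3

    IMAGES_ROOT = r"D:\images"  # hypothetical root folder

    # Fast: the external index already knows the relative path, so the file is
    # opened directly and NTFS never has to enumerate the huge directory.
    def read_image(db: sqlite3.Connection, image_id: int) -> bytes:
        (rel_path,) = db.execute(
            "SELECT rel_path FROM images WHERE id = ?", (image_id,)
        ).fetchone()
        with open(os.path.join(IMAGES_ROOT, rel_path), "rb") as f:
            return f.read()

    # Slow: finding the file by scanning forces a full directory listing, which
    # is the same work that makes Explorer spin at 100% CPU on huge folders.
    def read_image_by_scan(folder: str, image_id: int) -> bytes:
        target = f"{image_id}.jpg"
        for name in os.listdir(folder):  # cost grows with the number of entries
            if name == target:
                with open(os.path.join(folder, name), "rb") as f:
                    return f.read()
        raise FileNotFoundError(target)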

Pyrolistical
+3  A: 

NTFS does index directory contents, so it should be all right at the application level.

That is, opening files by name, deleting, renaming, etc. programmatically should work nicely.

But the problem is always tools. Third-party tools (such as MS Explorer, your backup tool, etc.) are likely to suck, or at least be extremely unusable, with large numbers of files per directory.

Anything which does a directory scan is likely to be quite slow, but worse, some of these tools have poor algorithms which don't scale to even modest (10k+) numbers of files per directory.
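
If you do have to write your own scan over such a directory, here is a sketch of the more scalable pattern (assuming Python 3.6+): iterate entries lazily with os.scandir instead of materializing the whole listing, and reuse the metadata each entry already carries.

    import os

    def total_size(folder: str) -> int:
        """Sum file sizes without building a full in-memory listing of the folder."""
        total = 0
        with os.scandir(folder) as entries:  # lazy iterator over directory entries
            for entry in entries:
                if entry.is_file(follow_symlinks=False):
                    # On Windows, entry.stat() is usually served from data already
                    # returned by the directory scan, so no extra disk access.
                    total += entry.stat(follow_symlinks=False).st_size
        return total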

MarkR
