views:

148

answers:

2

Possible Duplicate:
How many files in a directory is too many?

I was told that putting too many files in a directory can cause performance problems in Linux, and Windows. Is this true? And if so, what's the best way to avoid this?

+1  A: 

According to this Microsoft article, the lookup time of a directory increases in proportion to the square of the number of entries. (Although that was a bug filed against NT 3.5.)

A similar question was asked on the old Joel on Software forum. One answer was that performance seems to drop between 1000 and 3000 files, and one poster hit a hard limit at 18000 files. Still another post claims that 300,000 files are possible, but search times increase rapidly as all the 8.3 short filenames are used up.

To avoid large directories, create one, two or more levels of subdirectories and hash the files into those. The simplest kind of hash uses the letters of the filename. So a file named abc0001.txt would be placed at a\b\c\abc0001.txt, assuming you chose 3 levels of nesting. 3 is probably overkill; using two characters per directory reduces the number of nesting levels, e.g. ab\abc0001.txt. You will only need a second level of nesting if you anticipate that any one directory will hold vastly more than ca. 3000 files.
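A minimal sketch of that scheme in Python (the function name and the `chars_per_level`/`levels` parameters are my own, chosen for illustration):

```python
import os

def hashed_path(root, filename, chars_per_level=2, levels=1):
    """Map a filename into nested subdirectories named after its leading characters."""
    name = filename.lower()
    # Take successive slices of the filename as directory names,
    # e.g. "abc0001.txt" with chars_per_level=2 -> ["ab"]
    parts = [name[i * chars_per_level:(i + 1) * chars_per_level]
             for i in range(levels)]
    return os.path.join(root, *parts, filename)

# hashed_path("store", "abc0001.txt")                              -> store/ab/abc0001.txt
# hashed_path("store", "abc0001.txt", chars_per_level=1, levels=3) -> store/a/b/c/abc0001.txt
```

Because the mapping is deterministic, you can locate any file directly from its name without scanning directories.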

mdma
My own experience with two levels of nesting subdirectories A-Z+0-9 on a network server has been problematic. For some reason Windows seems to take forever to enumerate the files, even though each individual subdirectory contains about 10 files or fewer.
Mark Ransom
+1  A: 

The Windows file system is currently NTFS. The maximum number of files on a volume is 4,294,967,295. File cataloging on the drive is done with a B+ tree, which gives you O(log N) lookup.

On the old FAT32 there was a limit of about 64K files in a folder. Indexing was a linear list per folder, so performance dropped off drastically after a couple of thousand files. You probably do not need to worry about FAT32 unless your audience runs DOS, Windows 95, 98 or Millennium (yuck).

On Linux it really depends on the file system you are using (it could even be NTFS if you choose). ext3 limits a directory to roughly 32K subdirectories; the number of files is bounded by the available inodes. With the dir_index feature enabled, ext3 indexes directories with an HTree, which also gives you O(log N) lookup.
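To see why the indexing scheme matters, here is a toy Python comparison. This is not how the file systems are implemented; it only illustrates the asymptotics of a per-folder linear list versus a sorted index:

```python
from bisect import bisect_left

# 100,000 fake directory entries, kept sorted (zero-padding keeps
# lexicographic order equal to numeric order).
names = sorted(f"file{i:06d}.txt" for i in range(100_000))

def linear_lookup(target):
    """FAT32-style scan of the folder list: O(N) comparisons per lookup."""
    for name in names:
        if name == target:
            return True
    return False

def indexed_lookup(target):
    """NTFS/ext3-style indexed lookup (B+ tree / HTree): O(log N)."""
    i = bisect_left(names, target)
    return i < len(names) and names[i] == target
```

Both return the same answers; the linear version just does up to 100,000 comparisons where the indexed one does about 17.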

Having looked into this further, your question is really about the limitations of particular file systems.

Romain Hippeau
If he wanted to know the hard limitations, that's what he would have asked. There are "soft" limitations where the performance becomes less than ideal, and you will run into these soft limits long before you hit the hard limits.
Robert Harvey