views: 1241
answers: 7
At SO there has been much discussion about how many files in a directory are appropriate: on older filesystems stay below a few thousand, on newer ones stay below a few hundred thousand. Generally the suggestion is to create sub-directories for every few thousand files.

So the next question is: what is the maximum number of sub-directories I should put into a directory? Nesting them too deep kills directory-tree traversal performance. Is there a penalty for nesting them too shallow?

+1  A: 

It really depends on the OS you are using, since directory manipulations are done through system calls. On Unix-based OSes, i-node look-up algorithms are highly efficient and the number of files and folders in a directory does not matter much. Maybe that's why there is no limit on it in Unix-based systems. In Windows, however, it varies from file system to file system.

Chirantan
+1  A: 

Wow, are you really creating so many files that this matters? Maybe you should re-examine your file creation strategies :-).

Seriously, I can't think of that many situations where I'd end up with even a thousand files in my sub-directories. Certainly not executable or configuration types.

Maybe log-type files could get to those sorts of numbers but, even if you were creating a log file every minute (and why would you?), that's still only 1,440 a day.

Then just have one sub-directory for each day and it would take nearly three years to reach a thousand sub-directories.
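For the one-sub-directory-per-day idea, something like the following Python sketch would do; the base path and naming are made up for illustration, not taken from the answer.

    import datetime
    import os

    def daily_log_dir(base="/var/log/myapp"):
        # Return (and create) a per-day subdirectory such as
        # /var/log/myapp/2009-03-17. Even one log file per minute adds
        # only 1,440 files to that day's directory.
        path = os.path.join(base, datetime.date.today().isoformat())
        os.makedirs(path, exist_ok=True)
        return path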

paxdiablo
A situation where one might need to store thousands of files under a directory is when a developer is building a document management system for a few thousand users. Of course, it also depends on the design of the storage.
Chirantan
At a bare minimum, each user should get their own directory. You may even have /a/l/allan, /a/l/alex, /a/n/andrew, not necessarily even on the same filesystem. We do this for attachments to our 5-digit problem reports (/0/0/00000, /0/0/00001, /0/3/03123 and so on); a sketch of this layout follows these comments.
paxdiablo
There are a lot of reasons to have thousands of files in a directory - the maildir format is one; have a lot of deleted but not purged mail and you'll have a lot of files.
Adam Hawes
The maildir way of doing things appears to be horrific. Having to open so many files to get a list of messages is a waste of time IMO. Far better to store them all in a single flat file or, better yet, a database.
paxdiablo
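As a rough illustration of the prefix-based layout from the comments above (/a/l/allan, /0/3/03123), here is a small Python sketch; the function name and the padding rule for short names are assumptions, not anything from the original posts.

    import os

    def prefix_shard(base, name, levels=2):
        # Shard by the first `levels` characters of the name, one directory
        # level per character; short names are padded so every entry ends
        # up at the same depth.
        padded = (name.lower() + "_" * levels)[:levels]
        return os.path.join(base, *list(padded), name)

    # prefix_shard("/home", "allan")    -> "/home/a/l/allan"
    # prefix_shard("/reports", "03123") -> "/reports/0/3/03123"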
A: 

Modern filesystems (like NTFS or ext3) usually don't have a problem with accessing files directly (i.e. if you are trying to open /foo/bar/baz.dat). Where you can run into problems is enumerating the subdirectories/files in a given directory (i.e. "give me all the files/dirs from /foo"). This can happen in multiple scenarios (for example while debugging, or during backups, etc.). I found that keeping the child count to a couple of hundred at most gave me acceptable response times.

Of course this varies from situation to situation, so do test :-)
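In the spirit of "do test", a throwaway Python sketch for timing directory enumeration at different sizes (the file counts and naming are arbitrary):

    import os
    import tempfile
    import time

    def time_listing(n_files):
        # Create n_files empty files in a scratch directory, then time a
        # full enumeration of that directory.
        with tempfile.TemporaryDirectory() as d:
            for i in range(n_files):
                open(os.path.join(d, "f%06d.dat" % i), "w").close()
            start = time.perf_counter()
            entries = os.listdir(d)
            elapsed = time.perf_counter() - start
            print("%7d entries listed in %.1f ms" % (len(entries), elapsed * 1000))

    for n in (100, 1000, 10000):
        time_listing(n)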

Cd-MaN
A: 

My guess is as few as possible.

At the ISP I was working for (back in 2003) we had lots of user emails and web files. We structured them with md5-hashed usernames, 3 levels deep (i.e. /home/a/b/c/abcuser). This resulted in maybe up to 100 users inside each third-level directory.

You can also make the structure deeper, or keep the user directories in a shallower structure. The best option is to try and see, but the smaller the directory count, the faster the lookup is.
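A minimal Python sketch of that kind of hash-based layout (the depth and helper name are assumptions; the original setup used md5-hashed usernames, three levels deep):

    import hashlib
    import os

    def user_dir(base, username, depth=3):
        # Each of the first `depth` hex digits of the username's md5 hash
        # becomes one directory level, so users spread roughly evenly
        # across the shards.
        digest = hashlib.md5(username.encode("utf-8")).hexdigest()
        return os.path.join(base, *list(digest[:depth]), username)

    # user_dir("/home", "abcuser") -> "/home/<h1>/<h2>/<h3>/abcuser",
    # where h1..h3 are the first three hex digits of the hash.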

Damir Horvat
A: 

I've come across a similar situation recently. We were using the file system to store serialized trade details. These would only be looked at infrequently and it wasn't worth the pain to store them in a database.

We found that Windows and Linux coped with a thousand or so files, but accessing them got much slower - we organised them into logically grouped sub-dirs and this solved the problem.

It was also easier to grep them. Grepping through thousands of files is slower than changing to the correct sub-dir and grepping through a few hundred.

Fortyrunner
A: 

I found out the hard way that for UFS2 the limit is around 2^15 sub-directories. So while UFS2 and other modern filesystems work decently with a few hundred thousand files in a directory, they can only handle relatively few sub-directories. The non-obvious error message is "can't create link".

While I haven't tested ext2, I found various mailing-list postings where the posters also had issues with more than 2^15 sub-directories on an ext2 filesystem.
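If you want to probe the limit on your own filesystem, a quick-and-dirty Python sketch like the one below works; the likely cause of the ceiling is that every subdirectory's ".." entry raises the parent's link count, which UFS2 and ext2 cap at roughly 2^15.

    import os
    import tempfile

    def probe_subdir_limit(max_tries=100000):
        # Create subdirectories in a scratch directory until mkdir fails,
        # then report how many succeeded and the error that stopped it.
        with tempfile.TemporaryDirectory() as d:
            count = 0
            try:
                for i in range(max_tries):
                    os.mkdir(os.path.join(d, "sub%06d" % i))
                    count += 1
            except OSError as exc:
                print("mkdir failed after %d subdirectories: %s" % (count, exc))
            else:
                print("created %d subdirectories without hitting a limit" % count)

    probe_subdir_limit()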

mdorseif
+1  A: 

From a practicality standpoint, applications might not handle large directory listings well. For example, Windows Explorer gets bogged down with several thousand directory entries (I've had Vista crash, but XP seems to handle it better).

Since you mention nesting directories, also keep in mind that there are limits to the length of fully qualified (with drive designator and path) filenames (see the Wikipedia 'Filename' entry). These vary with the operating system and file system (see the Wikipedia 'Comparison of file systems' entry).

For Windows NTFS it is supposed to be 255; however, I have encountered problems with commands and API functions when fully qualified filenames reach about 120 characters. I have also had problems with long path names on mapped network drives (at least with Vista and Internet Explorer 7).

Also there are limitations on the nesting level of subdirectories. For example, CD-ROM (ISO 9660) is limited to 8 directory levels (something to keep in mind if you want to copy your directory structure to a CD-ROM or another filesystem).

So there is a lot of inconsistency when you push the file system to extremes (while the file system may be able to handle it in theory, apps and libraries may not).
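A small Python helper along these lines can flag risky paths before you create them; the thresholds are taken loosely from the answer and are assumptions, not hard rules.

    import os

    MAX_PATH_LEN = 255   # rough fully-qualified path limit cited above
    MAX_DEPTH = 8        # ISO 9660 directory-nesting limit

    def portability_problems(path):
        # Return reasons the fully qualified path may break on
        # length-limited or depth-limited filesystems.
        full = os.path.abspath(path)
        depth = full.count(os.sep)   # approximate nesting depth
        problems = []
        if len(full) > MAX_PATH_LEN:
            problems.append("length %d exceeds %d" % (len(full), MAX_PATH_LEN))
        if depth > MAX_DEPTH:
            problems.append("nesting depth %d exceeds %d" % (depth, MAX_DEPTH))
        return problems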

Roger Nelson