Is it bad to write many files to the same directory on Unix/Linux? I run thousands of jobs on a cluster, and each one writes a single output file into one shared directory. The upper bound is around 50,000 files. Can having that many files in one directory slow down I/O? If so, does the problem go away with a nested directory structure?

Thanks.

A: 

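I believe that some filesystems, especially older ones such as ext2 or ext3 without the dir_index feature, store the names of contained files in a list (or some other linear-time access data structure), so storing large numbers of files in a single directory can cause slowness even for simple operations like looking up or listing files. Newer filesystems such as ext4 (with dir_index) and XFS index directory entries with hashed trees or B+trees, so lookups by name stay fast, but listing a directory with tens of thousands of entries can still be slow because tools like ls read, stat and sort every entry. A nested structure can ameliorate this by arranging the names in a tree (or even a trie, if it makes sense), so each directory holds fewer entries and it takes less time to retrieve file stats.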
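To make the nested layout concrete, here is a minimal sketch in Python, assuming each job knows its own output filename. The helper names `sharded_path` and `write_output` are made up for illustration, and using the first two hex characters of an MD5 hash as the subdirectory name is just one possible scheme. With one level of two hex characters there are 256 subdirectories, so 50,000 files come to roughly 200 per directory.

```python
import hashlib
from pathlib import Path


def sharded_path(root, filename, levels=1, width=2):
    """Map a filename to root/<hash prefix>/.../filename (hypothetical helper).

    The subdirectory names are taken from the hex MD5 digest of the filename,
    so the same filename always maps to the same location.
    """
    digest = hashlib.md5(filename.encode("utf-8")).hexdigest()
    parts = [digest[i * width:(i + 1) * width] for i in range(levels)]
    return Path(root).joinpath(*parts, filename)


def write_output(root, filename, data):
    """Write data to the sharded location, creating directories as needed."""
    path = sharded_path(root, filename)
    # exist_ok=True so that concurrent jobs creating the same directory do not fail
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(data)
    return path
```

Because the path is computed from the filename alone, any later script can find a file again by calling `sharded_path` with the same arguments, without scanning the tree.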

maerics
A: 

My suggestion is to use a nested directory structure (i.e. categorization). You can name the subdirectories using timestamps, a special prefix for each application, and so on. This gives you a sense of order when you need to search for specific files, and makes the files easier to manage.
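A small sketch of that kind of categorization in Python follows. The function name `categorized_path`, the `app`/`job_id` parameters, the `.out` extension and the one-directory-per-day layout are all assumptions chosen for illustration, not something the question specifies.

```python
from datetime import datetime
from pathlib import Path


def categorized_path(root, app, job_id, when=None):
    """Build root/<app>/<YYYY-MM-DD>/<app>_<job_id>.out (hypothetical layout)."""
    when = when or datetime.now()
    return Path(root) / app / when.strftime("%Y-%m-%d") / f"{app}_{job_id}.out"


def open_output(root, app, job_id, when=None):
    """Create the category directories if needed and open the file for writing."""
    path = categorized_path(root, app, job_id, when)
    # exist_ok=True so that jobs starting at the same time do not collide
    path.parent.mkdir(parents=True, exist_ok=True)
    return path.open("w")
```

If a single application can still produce tens of thousands of files in one day, a finer timestamp (for example per hour) or an extra level keyed on the job ID would keep each directory small.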

ghostdog74