



Is it bad to output many files to the same directory in Unix/Linux? I run thousands of jobs on a cluster, and each one writes a single output file into the same directory. The upper bound is around 50,000 files. Can I/O speed be limited by this? If so, does the problem go away with a nested directory structure?



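I believe that some filesystems, especially older ones such as ext2, store the names of contained files in a list (or some other linear-time access data structure), so a large number of files in a single directory can make simple operations like lookups and listing slow. Newer filesystems index their directory entries (ext3/ext4 with dir_index use a hashed tree, XFS uses B+trees), which makes lookups by name much cheaper, but listing a directory with tens of thousands of entries, or running stat on every file in it, still takes time proportional to the number of entries. A nested structure can ameliorate this by arranging the names as a tree (or even a trie, if it makes sense), so that each directory stays small and retrieving file stats takes less time.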
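If you do end up with one big directory, how you list it matters too. Here is a minimal sketch in Python, assuming you only need to walk the entries and not sort them; count_files is a made-up name for illustration. os.scandir yields entries lazily and, on most Linux filesystems, can report the entry type without a separate stat call per file.

```python
import os


def count_files(directory):
    """Count regular files in `directory` without building a full list.

    `count_files` is a hypothetical helper, not part of any library.
    """
    count = 0
    # os.scandir returns an iterator of DirEntry objects; the context
    # manager closes the underlying directory handle when we are done.
    with os.scandir(directory) as entries:
        for entry in entries:
            # Do not follow symlinks, so a link to a file is not counted.
            if entry.is_file(follow_symlinks=False):
                count += 1
    return count
```

Something like count_files("/path/to/output") should then stay reasonably cheap even with tens of thousands of entries, though on a network filesystem your mileage may vary.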


My suggestion is to use a nested directory structure (i.e., categorization). You can name the subdirectories using timestamps, a special prefix for each application, and so on. This gives you a sense of order when you need to search for specific files and makes the files easier to manage.
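If the files have no natural category, one common alternative is to pick the subdirectory from a hash of the file name. The sketch below is only an illustration under that assumption: sharded_path and write_output are hypothetical helper names, and MD5 is used purely to spread names evenly, not for security. With one level of two hex characters you get 256 subdirectories, so 50,000 files work out to roughly 200 per directory.

```python
import hashlib
from pathlib import Path


def sharded_path(root, filename, levels=1, width=2):
    """Return root/<hash prefix>/.../filename for the given file name.

    Hypothetical helper: the prefix comes from the MD5 hex digest of the
    name, so the same name always maps to the same subdirectory.
    """
    digest = hashlib.md5(filename.encode("utf-8")).hexdigest()
    # Take `levels` consecutive chunks of `width` hex characters each.
    parts = [digest[i * width:(i + 1) * width] for i in range(levels)]
    return Path(root).joinpath(*parts, filename)


def write_output(root, filename, data):
    """Write `data` (a string) to the sharded location and return its path."""
    path = sharded_path(root, filename)
    # exist_ok=True makes this safe when many jobs create the same
    # subdirectory at about the same time.
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(data)
    return path
```

Each job would call something like write_output("results", "job_01234.out", text), and a reader can find the file again with sharded_path using the same root and name.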
