The problems you have been told about most likely stem from the performance impact of piling thousands and thousands of files into the same directory. To avoid this, do not store your files directly under one directory; spread them out across subdirectories (buckets).
To achieve this, take the ID of the file you are about to store (say 19873) and store it under <uploads>/73/98/19873_<filename.ext>, where 73 is ID % 100, 98 is (ID / 100) % 100 (integer division), and so on for any deeper levels.
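A minimal sketch of that mapping in Python (the helper name `bucket_path` and its signature are made up here for illustration, and the bucket count is parameterized rather than hard-coded to 100):

```python
import os

def bucket_path(uploads_root, file_id, filename, buckets=100):
    """Map a numeric file ID to a bucketed path with two directory
    levels of `buckets` subdirectories each.

    Level 1 is file_id % buckets, level 2 is (file_id // buckets) % buckets;
    deeper levels would follow by dividing by `buckets` again.
    """
    level1 = file_id % buckets               # 19873 -> 73
    level2 = (file_id // buckets) % buckets  # 19873 -> 98
    return os.path.join(uploads_root, str(level1), str(level2),
                        f"{file_id}_{filename}")

# bucket_path("/uploads", 19873, "photo.jpg")
# -> "/uploads/73/98/19873_photo.jpg"
```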
This scheme guarantees at most 100 subdirectories under <uploads>, and at most 100 further subdirectories underneath each of those, which thins out the number of files per leaf directory significantly.
Two levels of subdirectories are typical, and strike a good balance between not wasting too much time resolving directory or file names to inodes in breadth (what happens when you have too many filenames to search through in one directory, although modern filesystems such as ext3 with directory indexing are quite efficient here) and in depth (what happens when you have to descend 20 subdirectories to reach your file). You may also choose larger or smaller bucket counts (10 or 1000) instead of 100; two levels with modulo 100 work well for roughly 100k to 5M files.
Apply the same calculation to reconstruct the full filesystem path of a file when it needs to be retrieved, given its ID.
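Using the same hypothetical helper from above, storage and retrieval are both just a recomputation of the path (the root directory and filename here are example values):

```python
# Store: create the bucket directories on first use, then write the file.
path = bucket_path("/var/www/uploads", 19873, "photo.jpg")
os.makedirs(os.path.dirname(path), exist_ok=True)
# ... write the uploaded bytes to `path` ...

# Retrieve: the path is fully determined by the ID and stored filename,
# so no lookup table is needed.
with open(path, "rb") as f:
    data = f.read()
```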
Cheers,
V.