We have a large number of documents, each with an associated metadata file (XML). What is the best way to organize them?
Currently we have created a directory hierarchy:
/repository/category/date/document_number.pdf and .xml (where date is the day the document was loaded into our db)
We use the path as the unique identifier for the document in our system. A flat structure doesn't seem to be a good option. Using the path as the ID also keeps our data independent of our database and application logic, so we can reload the documents easily after a failure and every document keeps its old ID. However, it introduces some limitations. For example, we can't move the files once they've been placed in this structure, and it takes work to put them there in the first place. What is the best practice? How do websites such as Scribd deal with this problem?
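
To make the scheme described above concrete, here is a minimal sketch of how the path doubles as the ID. This is an illustration only, not our actual code: the function names `document_id` and `document_files`, the ISO date format for the date directory, and the sample category and document number are all assumptions.

```python
from datetime import date
from pathlib import PurePosixPath
from typing import Tuple

# Root of the hierarchy, taken from the layout described above.
REPOSITORY_ROOT = PurePosixPath("/repository")


def document_id(category: str, loaded_on: date, document_number: str) -> str:
    """Build the path (without extension) that also serves as the document ID.

    Assumption: the date directory is written in ISO format (YYYY-MM-DD).
    """
    return str(REPOSITORY_ROOT / category / loaded_on.isoformat() / document_number)


def document_files(doc_id: str) -> Tuple[str, str]:
    """Return the PDF path and the XML metadata path for a document ID."""
    return doc_id + ".pdf", doc_id + ".xml"


if __name__ == "__main__":
    # Hypothetical category and document number, for illustration only.
    doc_id = document_id("invoices", date(2011, 3, 14), "000123")
    print(doc_id)                  # /repository/invoices/2011-03-14/000123
    print(document_files(doc_id))
```

Because the ID is derived purely from the category, load date, and document number, it can be rebuilt from the files alone, which is what lets us reload everything without the database. It is also why moving a file changes its ID.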