tags:

views: 60

answers: 1

We have a large number of documents, with metadata (XML files) associated with each document. What is the best way to organize them?

Currently we have created a directory hierarchy:

/repository/category/date(when they were loaded into our db)/document_number.pdf and .xml

We use the path as a unique identifier for the document in our system. A flat structure doesn't seem to be a good option. Also, using the path as an ID helps keep our data independent of our database/application logic, so we can reload the documents easily in case of failure and they will all keep their old IDs. Yet it introduces some limitations: for example, we can't move the files once they've been placed in this structure, and it takes work to arrange them this way. What is the best practice? How do websites such as Scribd deal with this problem?

A: 

Your approach does not seem unreasonable, but it might suffer if you get more than a few thousand documents added within a single day (file systems tend not to cope well with very large numbers of files in a single directory).

Storing the .xml document beside the .pdf seems a bit odd - if it's really metadata about the document, shouldn't it be in the database (which it sounds like you already have), where it can be easily queried and indexed?

When storing very large numbers of files I've usually taken the file's key (say, a URL), hashed it, and then stored it X levels deep in directories based on the first characters of the hash...

Say you started with the key 'http://stackoverflow.com/questions/2734454/how-to-organize-a-large-number-of-objects'. The MD5 hash of that is 0a74d5fb3da8648126ec106623761ac5, so you might store it at...

base_dir/0/a/7/4/http___stackoverflow.com_questions_2734454_how-to-organize-a-large-number-of-objects

...or something like that, which you can easily find again given the key you started with.
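For illustration, here's a minimal sketch of that scheme in Python. The depth of 4 levels and the underscore-based key sanitising are assumptions chosen to match the example path above, not a fixed rule:

```python
import hashlib
import os
import re

def path_for_key(base_dir, key, depth=4):
    """Map a document key (e.g. a URL) to a stable filesystem path.

    Each of the first `depth` hex characters of the key's MD5 digest
    becomes one directory level, spreading files evenly across
    16**depth leaf directories regardless of when they arrive.
    """
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    levels = list(digest[:depth])
    # Replace characters that are awkward in filenames with underscores.
    safe_name = re.sub(r"[^A-Za-z0-9._-]", "_", key)
    return os.path.join(base_dir, *levels, safe_name)

key = ("http://stackoverflow.com/questions/2734454/"
       "how-to-organize-a-large-number-of-objects")
print(path_for_key("base_dir", key))
# base_dir/0/a/7/4/http___stackoverflow.com_questions_2734454_how-to-organize-a-large-number-of-objects
```

Because the path is derived purely from the key, you can always recompute where a document lives without consulting the database.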

This kind of approach has one advantage over your date-based one: it scales to very large numbers of documents (even per day) without any single directory becoming too large. On the other hand, it's less intuitive for someone who has to find a particular file manually.

Matt Sheppard
Thanks Matt. The way we currently handle a large number of docs in a single day is to split them into subfolders: 1/ 2/ 3/... which is another reason I think there should be a better way...
shane