+3  A: 

Honestly? I don't think it matters until you reach a certain size (and I can't, for the life of me, remember what that size is...). The thing is to find a method and stick with it, ideally one you never need to touch again. My own advice, without anything as convincing as evidence to support it, is something akin to your own suggestion:

c:\<customer_id>\<document_year>\<document_month>\<document_day>\actual_file.tif
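
For what it's worth, here is a minimal Python sketch of building that path. The function name `document_path`, the zero-padded month and day, and the sample customer ID are my own assumptions for illustration, not anything from the question.

```python
from datetime import date
from pathlib import PureWindowsPath


def document_path(root: str, customer_id: str, doc_date: date, filename: str) -> PureWindowsPath:
    """Build root / customer_id / year / month / day / filename (hypothetical helper)."""
    return PureWindowsPath(
        root,
        customer_id,
        f"{doc_date.year:04d}",
        f"{doc_date.month:02d}",  # zero-padded so folders sort chronologically (assumption)
        f"{doc_date.day:02d}",
        filename,
    )


# Example with a made-up six-digit customer ID.
path = document_path("c:\\", "123456", date(2009, 11, 5), "actual_file.tif")
print(path)
```

Zero-padding is just a convenience: it keeps the month and day folders in date order when listed alphabetically.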

I'd also suggest that, depending on your server setup, it might be worth giving each customer (depending on the amount of data or account type) their own drive/partition.

Bear in mind that, without some sort of user-control or permissions system, file paths could be predictably guessed and browsed (as if you didn't know this already... I know, I'm sorry). The fact that you raised the bullet point of 'six digit unique code' suggests that you don't need paths in a common format, but I would suggest that a common format (whichever one you end up choosing) would be a better idea.

Back in my Windows days I sorted my own directories around the file's primary relation, which would be considered a 'tag' nowadays (c:\documents and settings\university\year1\module21\assignment1.doc, for example); this made it easier to find things later. Your customers appear to have their directory structure enforced by you, but finding things they did last week is easier if they only have to traverse the date. Remembering where they put something last week once they get to the folders named after six-digit unique numbers is going to be, well, difficult. At best.

David Thomas
+2  A: 

Your question is very similar to this one. Is your load primarily reading your images or writing them? If it's read scalability you need, the post describes memcached, which is probably all you need. Jackrabbit has loads more features, but is geared more toward hierarchical text storage, and I'm not sure it would perform any better on your images. Also, if you do choose Jackrabbit, make sure your content hierarchy is deep enough for it to stay efficient: any parent with 10,000 or more children is going to have sub-par performance.

DaveParillo
memcached will only help if a small number of images are read a lot AND you have more than one server. Otherwise just use a 64-bit system and put lots of RAM in the file server. Let the OS do the caching for you.
Ian Ringrose
+1  A: 

The storage strategy you proposed would need to be revisited if you intend to move your content to different machines (SAN/NAS). To do this, you would strip all the customer data from the path and instead create a hash, saved in the database, that links to the file you are accessing. This leaves you with a folder structure something like this:

NAS1/00/01/86/63/54/89/image01/image.tiff
NAS2/00/02/46/62/22/11/image02/image.tiff
...
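
One way such paths could be generated, as a rough Python sketch: hash some stable key for the document and use pairs of hex characters as directory levels. The choice of SHA-256, the six levels, the leaf folder named after the full digest, and the names `sharded_path` and `document_key` are all assumptions of mine; the listing above does not say how its hash is produced.

```python
import hashlib
from pathlib import PurePosixPath


def sharded_path(store: str, document_key: str, filename: str, levels: int = 6) -> PurePosixPath:
    """Derive a customer-agnostic path from a hash of document_key (hypothetical helper)."""
    digest = hashlib.sha256(document_key.encode("utf-8")).hexdigest()
    # Two hex characters per level, so no level can hold more than 256 subfolders.
    shards = [digest[i:i + 2] for i in range(0, levels * 2, 2)]
    # The full digest names the leaf folder; this is the value you would store in the database.
    return PurePosixPath(store, *shards, digest, filename)


print(sharded_path("NAS1", "customer-42/invoice-7", "image.tiff"))
```

Because the path depends only on the hash, the files can move between NAS1 and NAS2 without any customer information leaking into the directory names, and each level stays far below the 10,000-children mark mentioned in this thread.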

I would also recommend you take a gander at MogileFS. All you need to do to speed it up is add some sort of proxy in front of it, and all should be well.

And as Dave mentioned, make sure you don't have too many children in one folder. Things tend to get quite sluggish at around 10,000 entries.
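
If you want to keep an eye on that, a small Python sketch along these lines could flag folders that have grown too large. The function name `crowded_directories` and the default limit are my own choices; the 10,000 figure is only the rough threshold quoted in this thread, not a measured value.

```python
import os
from typing import Iterator, Tuple


def crowded_directories(root: str, limit: int = 10_000) -> Iterator[Tuple[str, int]]:
    """Yield (directory, entry_count) for each directory under root holding more than limit entries."""
    for dirpath, dirnames, filenames in os.walk(root):
        count = len(dirnames) + len(filenames)  # direct children only
        if count > limit:
            yield dirpath, count


if __name__ == "__main__":
    # Scan the current directory; point this at your image store instead.
    for directory, count in crowded_directories("."):
        print(f"{directory}: {count} entries")
```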

Miha Hribar