ansaurus

Question

Answer 1

A:

MediaWiki takes the MD5 sum of the uploaded file's name and uses the first hex character and the first two hex characters of that sum (say, "c" and "cf" for the sum "cf1e66b77918167a6b6b972c12b1c00d") to build this directory structure:

images/c/cf/Whatever_filename.png
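
A minimal Python sketch of that layout, in case it helps. The function name hashed_upload_path and the "images" root are my own choices for illustration, not MediaWiki's actual code:

```python
import hashlib


def hashed_upload_path(filename: str, root: str = "images") -> str:
    """Build root/<h>/<hh>/filename from the MD5 of the file name.

    <h> is the first hex digit of the digest, <hh> the first two.
    (Hypothetical helper; only the layout comes from the answer above.)
    """
    digest = hashlib.md5(filename.encode("utf-8")).hexdigest()
    return f"{root}/{digest[0]}/{digest[:2]}/{filename}"


if __name__ == "__main__":
    print(hashed_upload_path("Whatever_filename.png"))
```

Because the hash is taken from the name rather than the contents, the path can be recomputed from the file name alone, with no database lookup.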

You could also use the image ID for a predictable upper limit on the number of files per directory. Maybe take floor(image unique ID / 1000) to determine the parent directory, for 1000 images per directory.
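
Roughly, the ID-based variant could look like this in Python. The names id_bucket_path and per_dir are made up for the example, and I'm assuming numeric, non-negative image IDs:

```python
def id_bucket_path(image_id: int, ext: str = "png",
                   per_dir: int = 1000, root: str = "images") -> str:
    """Place an image under root/<floor(id / per_dir)>/<id>.<ext>.

    With sequential IDs, no directory ever holds more than per_dir images.
    (Hypothetical helper; assumes image_id is a non-negative integer.)
    """
    if image_id < 0:
        raise ValueError("image_id must be non-negative")
    bucket = image_id // per_dir  # integer division == floor for ids >= 0
    return f"{root}/{bucket}/{image_id}.{ext}"


if __name__ == "__main__":
    print(id_bucket_path(123456, "jpg"))
```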

Adam Backstrom 2010-04-15 20:22:59

We use a similar approach, but with a 4-level deep structure: 12/34/56/78. Works great for millions of files.
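
One way to get a 12/34/56/78 style layout, as a sketch only: I'm assuming the digits come from a numeric ID zero-padded to 8 digits, which the comment doesn't actually say (they could just as well come from a hash). The name four_level_path is hypothetical.

```python
def four_level_path(image_id: int, ext: str = "jpg") -> str:
    """Split a zero-padded 8-digit ID into four 2-digit directory levels.

    Assumption: IDs are integers in the range 0..99,999,999.
    """
    if not 0 <= image_id <= 99_999_999:
        raise ValueError("image_id must fit in 8 decimal digits")
    padded = f"{image_id:08d}"
    levels = [padded[i:i + 2] for i in range(0, 8, 2)]
    return "/".join(levels) + f"/{image_id}.{ext}"


if __name__ == "__main__":
    print(four_level_path(12345678))
```

Each level has at most 100 entries, so directory sizes stay small no matter how many files are stored.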

Evert 2010-04-19 05:48:41

Answer 2

A:

You might consider the open source MogileFS (http://danga.com/mogilefs/), as it is perfect for what you're doing. It takes you from thinking about folders to thinking about namespaces (which could be users) and stores your images for you. The best part is that you don't have to care how the data is stored. It replicates the data for redundancy, and you can even control how redundantly thumbnails are stored.

Nissan Fan 2010-04-15 20:30:05

Answer 3

A:

Have you thought about using something like Amazon S3 to store the files? I run a photo hosting company, and after quickly reaching limits on our own server we switched over to Amazon S3. The beauty of S3 is that there are no limits like inodes and whatnot; you just keep throwing files at it.

Also: if you don't like S3, you can always try to break it down into subfolders as much as you can:

/userid/year/month/day/photoid.jpg
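
A small sketch of building that path in Python. The function name user_date_path and the integer IDs are assumptions for the example:

```python
from datetime import date


def user_date_path(user_id: int, photo_id: int, uploaded: date) -> str:
    """Return /<userid>/<year>/<month>/<day>/<photoid>.jpg.

    Month and day are zero-padded so directories sort chronologically.
    (Hypothetical helper; the ID types are assumed.)
    """
    return f"/{user_id}/{uploaded:%Y/%m/%d}/{photo_id}.jpg"


if __name__ == "__main__":
    print(user_date_path(42, 1001, date(2010, 4, 19)))
```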

webdestroya 2010-04-19 02:06:59

Answer 4

A:

I've answered a similar question before but I can't find it; maybe the OP deleted the question...

Anyway, Adam's solution seems to be the best so far, yet it isn't bulletproof: images/c/cf/ (or any other dir/subdir pair) could still contain up to 16^30 unique hashes, and at least 3 times more files if we count image extensions, which is far more than any regular file system can handle.

AFAIK, SourceForge.net also uses this system for project repositories. For instance, the "fatfree" project would be placed at projects/f/fa/fatfree/. However, I believe they limit project names to 8 chars.
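
To spell out where 16^30 comes from: an MD5 digest has 32 hex digits, the directory pair fixes the first two, and the remaining 30 are free. A quick Python check (the per-leaf estimate assumes hashes are spread uniformly, which is my assumption and not something the scheme guarantees):

```python
MD5_HEX_DIGITS = 32   # an MD5 digest is 128 bits = 32 hex digits
PREFIX_DIGITS = 2     # digits consumed by the c/cf directory pair

# Distinct hashes that can share one leaf directory such as images/c/cf/.
hashes_per_leaf = 16 ** (MD5_HEX_DIGITS - PREFIX_DIGITS)

# Number of leaf directories the scheme can create (00 .. ff).
leaf_dirs = 16 ** PREFIX_DIGITS


def expected_files_per_leaf(total_files: int) -> float:
    """Average files per leaf directory, assuming uniformly spread hashes."""
    return total_files / leaf_dirs


if __name__ == "__main__":
    print(hashes_per_leaf)
    print(expected_files_per_leaf(10_000_000))
```

So the upper bound is theoretical; what matters in practice is that the files are only ever spread over 256 leaf directories.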

I would store the image hash in the database along with a DATE / DATETIME / TIMESTAMP field indicating when the image was uploaded / processed and then place the image in a structure like this:

images/
  2010/                                      - Year
    04/                                      - Month
      19/                                    - Day
        231c2ee287d639adda1cdb44c189ae93.png - Image Hash

Or:

images/
  2010/                                    - Year
    0419/                                  - Month & Day (12 * 31 = 372)
      231c2ee287d639adda1cdb44c189ae93.png - Image Hash
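
Both layouts could be generated along these lines. This is only a sketch: I'm assuming the "image hash" is the MD5 of the file contents (the answer doesn't say which hash), and dated_hash_path and merge_month_day are names I made up.

```python
import hashlib
from datetime import date


def dated_hash_path(data: bytes, uploaded: date, ext: str = "png",
                    merge_month_day: bool = False) -> str:
    """Return images/YYYY/MM/DD/<hash>.<ext>, or images/YYYY/MMDD/<hash>.<ext>.

    Assumption: the image hash is the MD5 hex digest of the file contents.
    """
    digest = hashlib.md5(data).hexdigest()
    if merge_month_day:
        day_part = f"{uploaded:%m%d}"    # e.g. 0419
    else:
        day_part = f"{uploaded:%m/%d}"   # e.g. 04/19
    return f"images/{uploaded:%Y}/{day_part}/{digest}.{ext}"


if __name__ == "__main__":
    print(dated_hash_path(b"fake image bytes", date(2010, 4, 19)))
    print(dated_hash_path(b"fake image bytes", date(2010, 4, 19),
                          merge_month_day=True))
```

The upload date would come from the DATE / DATETIME / TIMESTAMP column, so the path can always be rebuilt from the database row.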

Besides being more descriptive, this structure is enough to host hundreds of thousands of images per day (depending on your file system limits) for several thousand years. This is the way WordPress and others do it, and I think they got it right on this one.

Duplicate images could easily be found by querying the database, and you'd just have to create symlinks.
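
A rough sketch of the symlink idea in Python. A plain dict stands in for the database table of hashes, MD5 of the contents is assumed as the image hash, and store_with_dedup is a hypothetical name. It also assumes a file system that supports symlinks.

```python
import hashlib
import os


def store_with_dedup(data: bytes, dest: str, index: dict) -> str:
    """Write data to dest, or symlink dest to an identical stored file.

    index maps content hash -> path of the first stored copy; in a real
    setup this lookup would be a database query (assumption).
    Returns "stored" or "linked".
    """
    digest = hashlib.md5(data).hexdigest()
    os.makedirs(os.path.dirname(dest) or ".", exist_ok=True)
    original = index.get(digest)
    if original is None:
        with open(dest, "wb") as fh:
            fh.write(data)
        index[digest] = dest
        return "stored"
    # Same contents already on disk: point the new path at the first copy.
    os.symlink(os.path.abspath(original), dest)
    return "linked"
```

For example, uploading the same bytes on two different days would write the file once under the first day's directory and create a symlink under the second.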

Of course, if this is not enough for you, you can always add more subdirs (hours, minutes, ...).

Personally I wouldn't use user IDs unless you don't have that info available in your database, because:

Disclosure of usernames in the URL
Usernames are volatile (you may be able to rename folders, but still...)
A user can hypothetically upload a large number of images
Serves no purpose (?)

Regarding the CDN, I don't see any reason this scheme (or any other) wouldn't work...

Alix Axel 2010-04-19 02:47:45

Image upload storage strategies
