In the past, I've handled user image uploads in two different ways:

  • Save the image data in a database table, and load it via a PHP script
  • Upload the image, convert it to JPEG, put it in a directory, and load it via HTML tags

The first option worked fairly well, but I had to keep fairly tight size restrictions on uploaded images. The second option mostly worked, except that PHP's image library would often botch the file conversion (though I think that can be fixed by using ImageMagick instead).

I'm now faced with doing this for a third time, for a potentially larger number of users. I already plan on using ImageMagick to do some post-processing on the image after the upload. I'd love to keep the restrictions on image uploads as loose as possible and possibly even retain every photo that someone uploads.

There will be times when hundreds of users' uploaded photos are displayed on screen as thumbnails at once. Storing these in a database and pulling them for each one and displaying via PHP doesn't seem like a good way to go, but throwing all images into a single directory doesn't either.

What is your recommendation in this situation? Do you go with one of the options above or do you have a different solution?

+4  A: 

Storing these in a database and pulling them for each one and displaying via PHP doesn't seem like a good way to go, but throwing all images into a single directory doesn't either.

You can take a hybrid approach.

Store the images in a hierarchy of folders (according to whatever scheme you determine to be appropriate for your application). Store the full path to each image in your database.

In a background job, have thumbnails of the images produced (say with ImageMagick) whose filenames differ slightly from the images themselves (e.g. add 'thumb-' on the front) but which are stored alongside the real images. You can have a field in the database for each image which means "My thumbnail is ready, so please include me in galleries".

When you get a request for a gallery, slice and dice the group of images using database fields, then produce a piece of HTML which refers to the appropriate thumbnail and image paths.
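
A minimal sketch of this layout, written in Python for brevity (the same logic ports directly to PHP). The hash-based two-level folder scheme, the `images(id, owner, path, thumb_ready)` table, and all function names are assumptions for illustration; the folder scheme is whatever suits your application.

```python
import hashlib
import html
import sqlite3
from pathlib import PurePosixPath


def storage_path(filename: str, root: str = "uploads") -> PurePosixPath:
    """Place a file in a two-level folder hierarchy derived from a hash of its name."""
    digest = hashlib.sha256(filename.encode("utf-8")).hexdigest()
    return PurePosixPath(root, digest[:2], digest[2:4], filename)


def thumb_path(image_path: PurePosixPath) -> PurePosixPath:
    """Thumbnail sits next to the original, with 'thumb-' prepended to the name."""
    return image_path.with_name("thumb-" + image_path.name)


def gallery_html(db: sqlite3.Connection, owner: str) -> str:
    """Render links only for images whose thumbnail has been generated."""
    rows = db.execute(
        "SELECT path FROM images WHERE owner = ? AND thumb_ready = 1 ORDER BY id",
        (owner,),
    ).fetchall()
    items = []
    for (path,) in rows:
        full = PurePosixPath(path)
        items.append(
            '<a href="/{0}"><img src="/{1}"></a>'.format(
                html.escape(str(full), quote=True),
                html.escape(str(thumb_path(full)), quote=True),
            )
        )
    return "\n".join(items)
```

Images whose `thumb_ready` flag is still 0 simply do not appear in the gallery until the background job has caught up.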


Edit: What Aaron F says is important when you need to handle a very large number of requests. Partitioning the image and SQL data is a good road to scalability. You'll need to look at access patterns in your application to determine where the partition points lie. Something you can do even sooner is to cache the generated HTML for the galleries in order to reduce SQL load.

David Toso
+2  A: 

The two examples you cite don't clarify the most important part: partitioning.

e.g. if stored in a database, do the images reside entirely in one table, on one database server? Or is the table partitioned across several servers/clusters?

e.g. if stored in a directory, do all the images reside on one hard drive? Or do images map to separate [RAID] drives, based on the first letter in the user's login name?

A good partitioning scheme is needed for scalability.
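
As a rough sketch of the first-letter idea, in Python for brevity: the function name and the mount points are hypothetical, and the modulo mapping is just one simple way to turn a letter into a drive.

```python
from typing import Sequence


def partition_for(login: str, mounts: Sequence[str]) -> str:
    """Pick a storage mount from the first character of the user's login name."""
    if not login:
        raise ValueError("login must not be empty")
    if not mounts:
        raise ValueError("at least one mount is required")
    first = login[0].lower()
    # Map the character code onto the available mounts.
    return mounts[ord(first) % len(mounts)]
```

First letters of names are not evenly distributed, so if balance matters you may prefer to key on a hash of the login instead; the lookup stays the same shape.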

As for displaying thumbnails in bulk, you'll probably want some precomputation here: create the thumbnails (perhaps via an asynchronous job, kicked off just after the image is uploaded) and stage them on a dedicated server. This is how YouTube does image snapshots of uploaded videos.
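
One way the asynchronous job could look, sketched in Python with an in-process queue. Everything here is an assumption for illustration: `make_thumbnail` is a placeholder for the real resize step (for example a call out to ImageMagick), and a production setup would more likely use a cron job or a separate queue service than a thread.

```python
import queue
import threading


def make_thumbnail(image_path: str) -> str:
    """Placeholder for the real resize step; returns where the thumbnail would go."""
    head, sep, name = image_path.rpartition("/")
    return f"{head}{sep}thumb-{name}"


def thumbnail_worker(jobs: queue.Queue, ready: dict) -> None:
    """Consume uploaded image paths until a None sentinel arrives."""
    while True:
        image_path = jobs.get()
        try:
            if image_path is None:
                break
            ready[image_path] = make_thumbnail(image_path)
        finally:
            jobs.task_done()


def handle_upload(image_path: str, jobs: queue.Queue) -> None:
    """Called right after an upload is saved: enqueue and return immediately."""
    jobs.put(image_path)
```

In use, you would start `thumbnail_worker` in a `threading.Thread`, call `handle_upload` from the upload handler, and only show images whose path appears in `ready`.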

Aaron F.
+1  A: 

IMHO, David's solution is the best for most cases, but I would modify two details:

Store the full path to each image in your database.

I don't think you need to store the full path, because if for some reason the directory of the images changes, that would complicate your life a bit. Storing only the filename should be enough. You can always include the full path directly in the html.
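
A small sketch of that idea, in Python for brevity. The base URL value and the function name are hypothetical; the point is only that the directory lives in one place in the code or config, not in every database row.

```python
from urllib.parse import quote

# Hypothetical location; if the image directory moves, change only this value.
IMAGE_BASE_URL = "/static/user-images"


def image_url(filename: str, base_url: str = IMAGE_BASE_URL) -> str:
    """Build the URL for the HTML from the bare filename stored in the database."""
    return f"{base_url.rstrip('/')}/{quote(filename)}"
```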

have thumbnails of the images produced (say with ImageMagick) whose filenames slightly differ from the images themselves but which are stored alongside the real images.

I prefer to put the thumbnails in a different directory, one directory for each kind of thumbnail you create, with exactly the same filename as the original. I once worked on a site with thousands of user images and made the mistake of putting them all in the same directory. The directory grew so large that it became almost impossible to open it without waiting several minutes.
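
For example, assuming the thumbnails differ by size, the layout could be computed like this (Python for brevity; the sizes, the `thumbs_WxH` directory naming, and the function name are made up for illustration):

```python
from pathlib import PurePosixPath

# Hypothetical thumbnail sizes as (width, height).
THUMB_SIZES = [(100, 100), (320, 240)]


def thumb_paths(filename: str, root: str = "images") -> dict:
    """Return one path per thumbnail size, each in its own directory, same filename."""
    return {
        (w, h): PurePosixPath(root, f"thumbs_{w}x{h}", filename)
        for (w, h) in THUMB_SIZES
    }
```

Because the filename never changes, switching between sizes in the HTML is just a matter of swapping the directory.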

PHP's image library would often botch the file conversion

I suggest increasing the memory limit in the script where you create the thumbnail, especially if you allow uploads of big (2 MB+) files.

You can do this with ini_set('memory_limit', '30M');. Of course, the actual number is up to you. 30M has worked for me in thumbnail-heavy sites.
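
A back-of-the-envelope way to pick the number, sketched in Python: a compressed upload expands to roughly width × height × bytes-per-pixel once decoded. The 4 bytes per pixel used here is an assumption for a truecolor image; the real overhead depends on the image library.

```python
def decoded_size_mb(width: int, height: int, bytes_per_pixel: int = 4) -> float:
    """Rough in-memory size of a decoded image, in megabytes (assumed 4 bytes/pixel)."""
    return width * height * bytes_per_pixel / (1024 * 1024)


# A 3000x2000 photo is small on disk as a JPEG but needs about 22.9 MB decoded.
estimate = decoded_size_mb(3000, 2000)
```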

sideral
A: 

For creating thumbnails, use phpThumb. It is perfect for the job and has a few extras you probably won't need. It automatically creates a local cache on your filesystem, so it saves a lot of resources.

I think the hybrid approach is probably the most scalable, i.e. store files on the file system and file locations in a database. That way, you can store some metadata with the file (creator, title, tags, etc.) and keep your database small. Plus, you can store your images in arbitrary locations (even on other machines), so you can easily distribute the load.

back2dos
+1  A: 

Some years ago I wrote an intranet image archive meant to ultimately store some 340 thousand scans plus their thumbnails. Googling around, I found that there is no hard reason not to dump them all into a single directory, as long as you don't ask the underlying OS to do a folder listing. In other words, calling ls/dir would hang the machine, but retrieving single image files by their filename (from the database) carries no performance penalty.

That archive has been running for a couple of years now and I can confirm that it works fine with something above 60k images actually stored in a folder.
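
The access pattern looks roughly like this, sketched in Python; the `scans(id, filename)` table and the function name are hypothetical. The filename comes from the database and the file is opened by its exact path, so the directory is never enumerated.

```python
import os
import sqlite3


def read_scan(db: sqlite3.Connection, scan_id: int, image_dir: str) -> bytes:
    """Fetch one image by the filename stored in the database; no directory listing."""
    row = db.execute(
        "SELECT filename FROM scans WHERE id = ?", (scan_id,)
    ).fetchone()
    if row is None:
        raise KeyError(scan_id)
    # Open by exact path; avoid os.listdir(image_dir) on a huge directory.
    with open(os.path.join(image_dir, row[0]), "rb") as fh:
        return fh.read()
```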

I've never had any kind of problem converting stuff to JPEG with GD, but for that particular job I went the ImageMagick way with MagickWand as a helper extension (mostly because of its decent documentation).

djn