views:

327

answers:

3

What's the best practice for storing a large amount of image data in SQL Server 2008? I'm expecting to store around 50,000 images using approx 5 gigs of storage space. Currently I'm doing this using a single table with the columns:

ID: int/PK/identity
Picture: Image
Thumbnail: Image
UploadDate: DateTime

I'm concerned because at around 10% of my expected total capacity it seems like inserts are taking a long time. A typical image is around 20k - 30k. Is there a better logical structure to store this data? Or do I need to look into clustering or some other IT solution to accommodate the data load?

+2  A: 

To DB or not to DB, that is the question.

You're starting a religious war here with Images in DB.

The opinion would be split for SQL 2000, but 2005 and above do a fairly decent job of storing blobs - just look at the number of SharePoint installations that use MS SQL Server as their storage. I would only go this route for minor image storage.

If you do end up putting them in the DB, I would say that you should separate the image from the data associated with it for ease of querying and reducing your IO and the instances when developers write SELECT * (and yes, they will).

Check out FILESTREAM in SQL 2008 - it is meant for things like this.

Here are some other points on DB vs. File system that you may want to consider:

  • DB storage, backups, restoring, maintenance licensing is expensive
  • Storage in DB is harded to get at than on disk
  • Disk can be accelerated
  • You will need to write code to get/set images in DB - not required for disk
Raj More
How would you synchronize a backup restore between the db and the files? This would be difficult with an appliation that manages images that are changed often and requires version control. Seems like the files and the meta data records would be off.
Jeff O
I prefer insert based models to update based models. So if a user has a new image, I would not overwrite the record, I would choose to have a new row added and the old one marked with a do-not-use flag. I would have writing to the file and the database as a single transaction - they are always in sync at a point in time.
Raj More
+1  A: 

Check out the new Filestream features in SQL Server 2008. Essentially it lets you store blob (read: image) data in the database, without the overhead of having to read the data into sql buffers on every read and write. It seamlessly uses the filesytem to store your large files instead of sql pages. This can lead to much faster read and write times for larger files and , best of all, since this all happens under the hood, you don't need to change any existing stored procs to work with filestream columns. See here for code samples and some performance profiling.

Chris Clark
+2  A: 

Image is a deprecated data type in SQL Server 208. It has been replaced with VARBINARY(MAX) since SQL 2005. If you decide to store the image in the DB, then you should use VARBINARY(MAX) fields and consider adding the FILESTREAM option.

For streaming data, like images, FILESTREAM is much faster than VARBINARY(MAX) alone, according to this white paper:

Filestream vs. varbinary(max) performance

Note that to achieve this streaming performance you must use the proper API in your design and obtain the Win32 handle of the BLOB. Note that updates into a FILESTREAM column (including INSERTS) will be slower than VARBINARY(MAX).

Remus Rusanu