views:

2151

answers:

6

Is SQL Server 2008 a good option to use as an image store for an e-commerce website? It would be used to store product images of various sizes and angles. A web server would output those images, reading the table by a clustered ID. The total image size would be around 10 GB, but will need to scale. I see a lot of benefits over using the file system, but I am worried that SQL server, not having an O(1) lookup, is not the best solution, given that the site has a lot of traffic. Would that even be a bottle-neck? What are some thoughts, or perhaps other options?

A: 

For something like an e-commerce web site, I would be moe likely to go with storing the image in a blob store on the database. While you don't want to engage in premature optimization, just the benefit of having my images be easily organized alongside my data, as well as very portable, is one automatic benefit for something like ecommerce.

Patrick Harrington
A: 

If the images are indexed then lookup won't be a big problem. I'm not sure but I don't think the lookup for file system is O(1), more like O(n) (I don't think the files are indexed by the file system).

What worries me in this setup is the size of the database, but if managed correctly that won't be a big problem, and a big advantage is that you have only one thing to backup (the database) and not worry about files on disk.

rslite
+9  A: 

10 Gb is not quite a huge amount of data, so you can probably use the database to store it and have no big issues, but of course it's best performance wise to use the filesystem, and safety-management wise it's better to use the DB (backups and consistency).

Happily, Sql Server 2008 allows you to have your cake and eat it too, with:

The FILESTREAM Attribute

In SQL Server 2008, you can apply the FILESTREAM attribute to a varbinary column, and SQL Server then stores the data for that column on the local NTFS file system. Storing the data on the file system brings two key benefits:

  • Performance matches the streaming performance of the file system.
  • BLOB size is limited only by the file system volume size.

However, the column can be managed just like any other BLOB column in SQL Server, so administrators can use the manageability and security capabilities of SQL Server to integrate BLOB data management with the rest of the data in the relational database—without needing to manage the file system data separately.

Defining the data as a FILESTREAM column in SQL Server also ensures data-level consistency between the relational data in the database and the unstructured data that is physically stored on the file system. A FILESTREAM column behaves exactly the same as a BLOB column, which means full integration of maintenance operations such as backup and restore, complete integration with the SQL Server security model, and full-transaction support.

Application developers can work with FILESTREAM data through one of two programming models; they can use Transact-SQL to access and manipulate the data just like standard BLOB columns, or they can use the Win32 streaming APIs with Transact-SQL transactional semantics to ensure consistency, which means that they can use standard Win32 read/write calls to FILESTREAM BLOBs as they would if interacting with files on the file system.

In SQL Server 2008, FILESTREAM columns can only store data on local disk volumes, and some features such as transparent encryption and table-valued parameters are not supported for FILESTREAM columns. Additionally, you cannot use tables that contain FILESTREAM columns in database snapshots or database mirroring sessions, although log shipping is supported.

Vinko Vrsalovic
A: 

Normally a good solution is to store the images themselves on the filesystem, and the metadata (file name, dimensions, last updated time, anything else you need) in the database.

Having said that, there's no "correct" solution to this.

Greg
there's no "correct" solution to this .. until SQL 2008. See other answers - SQL 2008 has a new construct for this - the FILESTREAM.
Anthony
+1  A: 

Check out this white paper from MS Research (http://research.microsoft.com/research/pubs/view.aspx?msr_tr_id=MSR-TR-2006-45)

They detail exactly what you're looking for. The short version is that any file size over 1 MB starts to degrade performance compared to saving the data on the file system.

Joel.Cogley
This paper was last updated in June 2006 and applied only to SQL Server 2005
Tom H.
A: 

I doubt that O(log n) for lookups would be a problem. You say you have 10GB of images. Assuming an average image size of say 50KB, that's 200,000 images. Doing an indexed lookup in a table for 200K rows is not a problem. It would be small compared to the time needed to actually read the image from disk and transfer it through your app and to the client.

It's still worth considering the usual pros and cons of storing images in a database versus storing paths in the database to files on the filesystem. For example:

  • Images in the database obey transaction isolation, automatically delete when the row is deleted, etc.
  • Database with 10GB of images is of course larger than a database storing only pathnames to image files. Backup speed and other factors are relevant.
  • You need to set MIME headers on the response when you serve an image from a database, through an application.
  • The images on a filesystem are more easily cached by the web server (e.g. Apache mod_mmap), or could be served by leaner web server like lighttpd. This is actually a pretty big benefit.
Bill Karwin
Your last point is actually quite important. Consider the situation of moving images to an offsite store such as S3 or to a CDN.
Min
Also check out http://nginx.net/ for another tool that can help. It's a kind of proxy for Apache, on the response side. Interesting!
Bill Karwin