views:

137

answers:

6

Generally, I had thought it was always better to store images in the filesystem and link to it via the database entry. However, I am trying to optimize my db design and am having a few questions.

My images are all really small thumbmails in black and white (not greyscale, but true B&W) and are 70x70 in size. If we take the images (which is basically a 2D array of 1 and 0), it can be stored as binary data that would be approximately 600 bytes each.

So my question is whether querying the 600 bytes stored in a db would be faster than querying a link followed by accessing the filesystem; assuming there are a lot of "image" queries being made.

Does anyone have any experience with this area?

If it matters, I am using MySQL, and MonetDB (separately, but have the same question for both).

Many thanks, Brett

+3  A: 

If it's only 600 bytes then I wouldn't worry too much and would store them in the database as a blob

There's an interesting article at High Scalablilty about how Flickr is architected. This might prove to be a useful read for you.

Sean
Great article, thanks!
Brett
+1  A: 

As far as I understand, there's no problem to store even bigger files in the DB, if you're not using SELECT * for no reason (frankly speaking, there's never a reason to use SELECT * at all).

BLOBs and TEXTs are stored separately from other data and don't affect performance if not queried explicitly.

codeholic
A: 

If you're talking of web application, storing images in the database just stupid. As you have no benefits that desktop application may gain, but only difficulties.

Col. Shrapnel
It depends on how use these files. If you give them away directly as static content, you're right. But if you need to do ACL checks before giving them away or you just give them away via network to other servers, storing images in the DB may be more comfortable.
codeholic
@codeholic noway. webserver can be used to check ACL. no database involved
Col. Shrapnel
@Col. How would you do that if your ACL is stored in the database?
codeholic
@codeholic: you're right, using the file system you need to create the cleaning procedure, if you don't each time you delete a row, a unuseful file is going to remain in your disk.
backslash17
@codeholic i will check the permissions and then tell to webserver to serve.
Col. Shrapnel
@Col. If you mean keeping your files in a non-readable location and serving them via scripts only, then I don't see advantages.
codeholic
A: 

This is not matter of file size only but also about the maximum amount of records you are expecting to have when the database is working. Older times we used to make this kind of math for any type of field. Just multiply 600 bytes for the maximum amount of records and if the result is something manageable don't worry about the speed.

As @codeholic says if you're not using SELECT *, everything goes fine.

backslash17
A: 

Storing the image in the database (and serving it on each request) prevents you from caching those images in a proxy server (or rather - complicates the task many-fold and bars almost all out-of-the-box solutions). The catch is that to measure the impact you need to look at it differently - instead of "how much time does a single query to fetch an image takes" ask yourself "how much time does a series of (put a reasonable number here) queries to get the same set of images takes". Maybe also ask yourself "must I pay the cost to roundtrip to the DB and back?".

Not that the idea is without merit - updating the images in a single location might be an important factor. Also, if high availability is a factor, it's much easier to configure with the DB as the central point of data (putting the images on the file system means synchronizing them between nodes when they're updated). Change tracking, permissions, additional data, avoiding "broken links" - these might play a part as well.

All of the above aside, I've had bad experience using the "filesystem" technique. Currently we're considering a move to the "database" technique.

Ran Biron
why prevent you from caching images on proxy servers?????? I don't see the point with this... with the correct http headers you would have the resource cached.
Claudio Redi
Nothing prevents me from doing so - but there are out-of-the-box solutions for a filesystem-based cache, while I could find no ready-made solutions for images taken from an SQL database. You have to do the wiring yourself (as I've written "complicates the task...").
Ran Biron
+1  A: 

Since you tagged the question sql-server then I recommend you to read To Blob or Not To Blob, a research paper by the regretted Jim Gray. The paper goes into plenty of detail on the topic of storing BLOBs in the file system (NTFS) vs. database (SQL Server), and you'll be amazed how many angles are considered. It's a MUST read. But the conclusion is this:

The study indicates that if objects are larger than one megabyte on average, NTFS has a clear advantage over SQL Server. If the objects are under 256 kilobytes, the database has a clear advantage. Inside this range, it depends on how write intensive the workload is, and the storage age of a typical replica in the system.

Your case falls clearly in the 'To BLOB' case.

Remus Rusanu