views:

2217

answers:

9

This is a question which has been asked before (large-text-and-images-in-sql) but mainly for data which will be changed. In my case the data will be stored and never changed. Just seems sensible to keep everything together.

Are there any reasons why I should not store static binary data in a database?

Assuming it is a sensible thing to do, are there any advantages to storing such data in separate tables? (You might begin to realise now that I'm not a DB expert...)

Clarify: There will probably be no more than 10-20 users but these will be in the US and in the UK. The binary data will have to be transfered in any case.

+13  A: 

The advantage of storing data in the DB is taking advantage of DB security mechanisms and reducing maintanence cost (backups, ...). The disadvantage of it is increasing DB load and consuming connections (which might be expensive for per-connection licensed database servers). If you are using SQL Server 2008, FILESTREAM might be a nice alternative.

By the way, for Web apps (or any other apps that might need streaming the data), it's usually more sensible to store data outside DB.

Mehrdad Afshari
+6  A: 

The biggest dissadvantage if you are storing BLOBS is memory consumption. Can you imagine what select * from x would do for thousands of records with a 45k image in each?

As Mehrdad said there are also advantages. So if you decide to go with that approach you should try to design your database so that most queries return less results with BLOB data in them. Maybe for example make one to one relationships for this purpose.

Vasil
+1 Good point - perhaps a good reason to put blobs in separate table and pick up via id?
paul
To be honest I've always been afraid to use BLOB, because I suck at sql. But if I have to I'd probably make a separate one-to-one relationship for each blob. Pretty much using it as I use references to files. Except these would be stored in db. Note: please don't do this in webapps.
Vasil
IMHO, it is not a valid argument. Doing `select * from x` is a bad idea in most circumstances, except the one where you need to use *every* column of a table in your app. Putting blobs in a separate table is even worse, because it would require joins and complicate the requests.
MainMa
+1  A: 

Isn't this exactly what LOBs or CLOBs or .... were designed?

We used CLOBs to store large encryptions of credit card card transactions for a major airline system.

Memory consumption is your greatest culprit though.

HTH

cheers,

Rob Wells
+4  A: 

I think this depends on the application your building. If you're building a CMS system, and the usage of the data is going be to display images within a web browser, it might make sense to save the images to disk as opposed to being put into the database. Although honestly I would do both, which could allow adding a server to a farm without having to copy files all over the place.

Another use case might be a complex object, such as a workflow, or even a business object with lots of interdependancies. You could serialize both of these into a binary or text based format, and save them in the DB. Then you get the benefit of the DB: ATOMIC, Backups, etc...

I don't think people should be using select * queries in the first place. What you do is provide two ways to get the data, One methods returns the summary information, the second would return the blob. I can't imagine why you would need to return thousands of images all at once.

JoshBerke
+1 For the ideas. About the select * from part. You don't necesseraly have to write that query by hand. Some ORMs use this kinds of queries by default, so if someone is not careful... ouch.
Vasil
Heh, you know which ORM uses those queries? I want to stay away from them. nHibernate I know does not
JoshBerke
I've seen in some php framework, can't recall. But since they are in a web app, they probably thought select * is less data over the wire than select foo, bar, sausage. I bet they never thought about BLOBS.
Vasil
Or about the fragility of select *...
JoshBerke
It doesn't look very fragile from a php standpoint tho :).
Vasil
It would be fragile if your code was accessing columns by index rather than name. If you add a column at index 2, all columns after the new one would have a different index, and things can get ugly.
ZombieSheep
+1  A: 

Some database(e.g. Postgresql) automatically compress fields, perhaps it is faster when reading them directly from db. And also, the program can read all the fields and image in one swoop.

Michael Buen
Yes, if I ever used blobs it'd be postgres. You save in bandwith. But the data has to come uncompressed in the application's process at some point.
Vasil
Many blobs (images, mp3s, etc.) are essentially precompressed anyway.
le dorfier
A: 

We store attachments in our system, and you cannot change an attachment, so I think we're on the same page w/data that "will be stored and never changed." We specifically decided not to store it in the database. We did this for two reasons, simplicity, and backup/recovery time.

Simplicity first: In our case these attachments are uploaded from the end-user's browser, and it's simpler to just write them to a directory (on the DB server) than it is to then stream them down the SQL pipe. There is a record of them in the DB, but the DB just contains meta-information about the attachment, and the name of the file on disk (a guid in our case)

On the backup/recovery side: These blobs will likely become one of the largest pieces of your database. Whenever you run a full backup you'll be copying these bits over and over, even though you know then can never change. To us it just seemed much simpler to have (much) smaller backups, and do an xcopy of the attachment directory to a secondary server as the backup.

WaldenL
A: 

The performance issue here as been address above, so I won't repeat it. But I think a good tip if you are storing things that will be streamed out a lot (such as images/documents on a web-site) is to build in a caching system.

By this I mean store all the data in your database, but when someone requests that file, check if it exists on disk (based on a known filename, in a temp folder), if not, grab it from the DB and write it to the folder, and then stream that to the user. For the next request to the same file, since it exists on disk, it can be served from there without hitting the DB. But if you need to delete these files (or your web-server goes kapput!), it doesn't matter as they will be rebuilt again from the DB as people request them. This should be much quicker than serving each request for the same file from the DB.

JonoW
A: 

I'm familiar with a fairly good-sized OSS project that made the decision at its inception to store images in the MySQL database, and it's proven to be among the top 3 bad ideas they have been coping with ever since. (Exacerbated by the fact the "refactor mercilessly" is anathema, but that's another story.)

Among the serious problems this has caused:

  1. Exceeding maximum efficient database size (mysql). (The total space required for images exceeds all others by a at least 2 orders of magnitude).

  2. Image files lose their "fileness". No dates sizes etc. unless stored (redundantly) as dates (which require code for management).

  3. Arbitrary byte sequences don't process nicely all the time, for either storage or manipulation.

  4. "We'll never need to access the images externally" is a dangerous assumption.

  5. Fragility. Because the whole arrangement is unnatural and touchy, and you don't know where it will bite next (contributing to the anti-refactor mentality).

The benefits? None that I can think of, except it might have been the path of least resistance at the time.

le dorfier
I'm assuming the bad decision was to store blobs. Correct?
paul
Correct - clarified.
le dorfier
One substantial benefit is consistency of data: with proper keys the "files" cannot be deleted without the meta data and vice-versa. For disk files there are no such constraints and adding/deleting the files and their meta data is a separate application (or function) that must be designed, implemented and USED.
NVRAM
Yes, you still need to write a application with proper validation - but that's true either way. I wouldn't call the difference "substantial". What *was* substantial was all the extra work it took to get at the images with other applications and utilities when they are only available with database calls, and most image handling software doesn't come with database persistence built in. So just to view the image required extracting it with one app, viewing it with another, and then making sure it got put back in the proper place when done. And forget browsing the images in Explorer, say.
le dorfier
+1  A: 

Addressing the issue from a principles point of view, a relational database is (mainly) there for storing structured data. If you cannot make a query condition or join on a data element it probably doesn't belong in the database. I don't see an image BLOB being used in a WHERE clause, so I'd say keep it outside the database. A CLOB on the other hand can be used in queries.

Nils Weinander
+1 interesting aspect
paul
We'll probably won't be using the phone number in a WHERE clause either, because it's not frequent at all to search anything by phone number (unless you're working on a reverse-lookup system). That said, we do store phone numbers in the DB, not in external files even though it's seldom used as a join or filter condition. What I mean is that that reason is not enough for discarding the possibility to save an image inside a relational DB.
Seb
But you *can* make a query condition on the phone number, or use it for a join, something you cannot reasonably do with a BLOB column.
Nils Weinander