Recently, my colleagues and I have been discussing how to build a huge storage system that can store billions of pictures, which should be quick to search and download.

Something like Flickr, but not an online gallery, which means most of these pictures will never be downloaded.

My colleagues suggest that we save all these files directly in a database. I really feel that this is not a good idea: I don't think a database is designed for storing a huge number of binary files. But I don't have a very strong reason for why it's a bad idea.

What do you think about it?

+14  A: 

When dealing with binary objects, follow a document-centric approach to the architecture and do not store documents such as PDFs and images in the database; you will eventually have to refactor them out once you start seeing all kinds of performance issues with your database. Just store the file on the file system and keep its path in a table of your database. There is also a physical limit on the size of the data type you would use to serialize and save the file in the database, so store it on the file system and access it from there.
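To make the "file on disk, path in a table" idea concrete, here is a minimal sketch in Python. It assumes SQLite as a stand-in for whatever database you actually use, and the table name `images`, the helper names, and the hash-based directory layout are all hypothetical choices for illustration, not something prescribed above.

```python
import hashlib
import sqlite3
from pathlib import Path


def open_catalog(db_path):
    """Open (or create) the catalog; it holds metadata only, never image bytes."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS images ("
        " sha256 TEXT PRIMARY KEY,"
        " path TEXT NOT NULL,"
        " original_name TEXT,"
        " size_bytes INTEGER)"
    )
    return conn


def store_image(conn, root, data, original_name):
    """Write the bytes to the file system and record only the path in the table."""
    digest = hashlib.sha256(data).hexdigest()
    # Assumption: two levels of subdirectories derived from the hash,
    # so that no single directory ends up holding every file.
    rel_path = Path(digest[:2]) / digest[2:4] / digest
    full_path = Path(root) / rel_path
    full_path.parent.mkdir(parents=True, exist_ok=True)
    full_path.write_bytes(data)
    conn.execute(
        "INSERT OR IGNORE INTO images (sha256, path, original_name, size_bytes)"
        " VALUES (?, ?, ?, ?)",
        (digest, rel_path.as_posix(), original_name, len(data)),
    )
    conn.commit()
    return digest


def load_image(conn, root, digest):
    """Look up the path in the table, then read the bytes from disk."""
    row = conn.execute(
        "SELECT path FROM images WHERE sha256 = ?", (digest,)
    ).fetchone()
    if row is None:
        raise KeyError(digest)
    return (Path(root) / row[0]).read_bytes()
```

Storing a relative path rather than an absolute one is a deliberate choice in this sketch: the storage root can then be moved or remounted without rewriting every row.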

CodeToGlory
Interestingly, SQL Server 2008 does this for you with the FILESTREAM storage option: http://msdn.microsoft.com/en-us/library/cc949109.aspx
RichardOD
Although I agree with this, doesn't SharePoint store almost everything in the database? If so, I'd think that the SharePoint people might not think it is a bad idea to store files in the database. I believe it is beneficial in some ways (like querying), but those ways probably don't fully counteract the things you've mentioned here.
Dusty
@RichardOD, I read the paper and it mainly talks about the same challenges of storing structured content vs. unstructured content, and recommends NTFS. "FILESTREAM is a new feature in the SQL Server 2008 release. It allows structured data to be stored in the database and associated unstructured (i.e., BLOB) data to be stored directly in the NTFS file system. You can then access the BLOB data through the high-performance Win32® streaming APIs, rather than having to pay the performance penalty of accessing BLOB data through SQL Server."
CodeToGlory
A: 

It's not a good idea. The point of a database is that you can quickly resolve complex queries to retrieve textual data. While binary data can be stored in a database, it can slow transactions. This is especially true when the database is on a separate server from the running application. In the database, store meta-data and the location/filename of the images. Images themselves should be on static server(s).
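As a rough illustration of that split, the sketch below searches only the metadata table and hands back URLs that point at a static server. It is written in Python with SQLite as a placeholder database; the table `image_meta`, its columns, and the base URL `https://static.example.com/images` are assumptions made up for this example.

```python
import sqlite3

# Hypothetical address of the static server(s) that actually hold the files.
STATIC_BASE_URL = "https://static.example.com/images"


def create_meta_table(conn):
    """The database stores searchable metadata and a filename, not the image."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS image_meta ("
        " filename TEXT PRIMARY KEY,"
        " tag TEXT NOT NULL)"
    )


def find_image_urls(conn, tag, limit=10):
    """Run the query on metadata only and return URLs on the static server."""
    rows = conn.execute(
        "SELECT filename FROM image_meta WHERE tag = ?"
        " ORDER BY filename LIMIT ?",
        (tag, limit),
    ).fetchall()
    return [f"{STATIC_BASE_URL}/{filename}" for (filename,) in rows]
```

The database round trip then carries only a few short strings, and the client fetches the image bytes from the static server directly.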

Corey D
+1  A: 

If you are really talking about billions of images, I would store them in the file system (retrieval will be faster than serializing and deserializing the images).

andrewWinn
Yeah, I am really talking about billions of images. Miracles happen every day.
ablmf