I am developing a system for archiving, searching, uploading and distributing media, which means it is largely about handling BLOBs.

I am currently trying to find the best way to handle the BLOBs. I have limited resources for high-end servers with a lot of memory and huge disks, but I have access to a large number of medium-performance off-the-shelf computers that I can hook up to the Internet.

Therefore I decided not to store the BLOBs in a central relational database, because in the worst case I would end up with one very heavy database instance, possibly on a single average machine. Not an option.

Storing the BLOBs as files directly on the filesystem and storing their paths in the database is also somewhat ugly: I would have to manage distribution manually and keep track of the different copies myself. I don't even want to get close to that.

I looked at CouchDB and I really like its peer-to-peer design. It would allow me to run a distributed cluster of machines across the Internet, which implies:

  • Low-cost hardware
  • Distribution for redundancy and failover out of the box
  • Lightweight REST interface

So if I got it right, one could summarize it like this: a cloud-like API on top of a self-managed, distributed, replicated system.
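As an illustration of the replication part, here is a minimal sketch of how one node could be asked to push a database to a peer through CouchDB's `_replicate` endpoint. The host names and the `media` database name are made up for the example, and authentication is left out entirely; the request is only built here, not sent.

```python
import json
import urllib.request


def build_replication_request(node_url, source_db_url, target_db_url, continuous=True):
    """Build (but do not send) a POST to CouchDB's /_replicate endpoint.

    node_url is the node that will run the replication job; source and
    target are given as full database URLs. All URLs are placeholders.
    """
    body = {
        "source": source_db_url,
        "target": target_db_url,
        "continuous": continuous,  # keep replicating as new documents arrive
    }
    return urllib.request.Request(
        f"{node_url}/_replicate",
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={"Content-Type": "application/json"},
    )


# Hypothetical two-node setup; sending the request would be:
#   urllib.request.urlopen(request)
request = build_replication_request(
    "http://node-a.example:5984",
    "http://node-a.example:5984/media",
    "http://node-b.example:5984/media",
)
```

Running the same call in the opposite direction on the second node would give two-way replication between the pair.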

The rest of the system does the normal stuff any average web application does: handling sessions, security, users, searching and the like. For this part I still want to use a relational data model. (CouchDB does not claim to be a replacement for relational databases.)

So I would have all the standard data, including the BLOBs' metadata, in the relational database, but the BLOBs themselves in CouchDB.
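A minimal sketch of that split, under several assumptions: SQLite stands in for the relational database, the `media` table and its columns are invented for the example, the CouchDB node is assumed to run on `localhost:5984`, and the document ID is simply the SHA-256 of the content. The metadata row holds the CouchDB document ID, and the BLOB itself would go to CouchDB as an attachment via `PUT /{db}/{docid}/{attname}`. The upload request is only built, not sent, so the code runs without a server.

```python
import hashlib
import sqlite3
import urllib.request
from urllib.parse import quote

COUCH_URL = "http://localhost:5984"  # assumed local CouchDB node
COUCH_DB = "media"                   # hypothetical database name


def init_schema(conn):
    """Create the (hypothetical) metadata table in the relational database."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS media (
               id INTEGER PRIMARY KEY,
               title TEXT NOT NULL,
               content_type TEXT NOT NULL,
               size_bytes INTEGER NOT NULL,
               couch_doc_id TEXT NOT NULL UNIQUE
           )"""
    )


def register_blob(conn, title, data, content_type):
    """Store only the metadata relationally and return the CouchDB doc ID."""
    doc_id = hashlib.sha256(data).hexdigest()  # content-addressed ID (assumption)
    conn.execute(
        "INSERT INTO media (title, content_type, size_bytes, couch_doc_id) "
        "VALUES (?, ?, ?, ?)",
        (title, content_type, len(data), doc_id),
    )
    conn.commit()
    return doc_id


def build_upload_request(doc_id, name, data, content_type):
    """Build (but do not send) the PUT that stores the BLOB as an attachment."""
    url = f"{COUCH_URL}/{quote(COUCH_DB, safe='')}/{quote(doc_id, safe='')}/{quote(name, safe='')}"
    return urllib.request.Request(
        url, data=data, method="PUT", headers={"Content-Type": content_type}
    )
```

Usage would look roughly like `doc_id = register_blob(conn, "Holiday clip", blob, "video/mp4")` followed by `urllib.request.urlopen(build_upload_request(doc_id, "original", blob, "video/mp4"))`. Note that the two writes are not atomic, so a real system would need some way to clean up a metadata row whose upload failed, or the reverse.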

Do you see a problem with this approach? Am I missing something important? Can you think of better solutions?

Thank you!

A: 

No problem. I have done a design very similar to that one. You may also want to take a look at HBase as an alternative to CouchDB, and at the Adaptive Object-Model architectural pattern as a way to manage your data and metadata.

Hugo S Ferreira
+2  A: 

You could try Amazon's SimpleDB (a hosted, non-relational data store) and S3 together with SimpleJPA. SimpleJPA is a JPA implementation on top of SimpleDB. It uses SimpleDB for the structured data and S3 to store BLOBs.

Yrlec
+2  A: 

Take a look at MongoDB. It supports storing binary data in an efficient format and is incredibly fast.

Alan
Mongo is also closer to a relational model, so you might be able to get away with just using that, instead of Couch+relational.
kristina