I am developing a system that is all about archiving, searching, uploading and distributing media, and therefore about handling BLOBs.
I am currently trying to find out the best way to handle the BLOBs. I have limited resources for high-end servers with lots of memory and huge disks, but I can access a large array of medium-performance, off-the-shelf computers and hook them up to the Internet.
Therefore I decided not to store the BLOBs in a central relational database, because in the worst case I would end up with one very heavy database instance, possibly running on a single average machine. Not an option.
Storing the BLOBs as files directly on the filesystem and keeping their paths in the database is also somewhat ugly: distribution would have to be managed manually, and I would have to keep track of the different copies myself. I don't even want to get close to that.
I looked at CouchDB and I really like its peer-to-peer based design. It would allow me to run a distributed cluster of machines across the Internet, which implies:
- Low-cost hardware
- Distribution for redundancy and failover out of the box
- Lightweight REST interface
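To make the REST point concrete, here is a minimal Python sketch of what storing one BLOB could look like, assuming CouchDB's attachment endpoint `PUT /{db}/{docid}/{attname}` (the `?rev=` parameter is needed when the document already exists). The node URL, the `media` database, the document/attachment names and the helper `build_attachment_put` are all hypothetical; the helper only builds the request, and sending it is left to `urllib.request.urlopen`.

```python
import urllib.parse
import urllib.request
from typing import Optional


def build_attachment_put(base_url: str, db: str, doc_id: str, att_name: str,
                         data: bytes, content_type: str,
                         rev: Optional[str] = None) -> urllib.request.Request:
    """Hypothetical helper: build (but do not send) a CouchDB attachment upload.

    Targets PUT /{db}/{docid}/{attname}; `rev` must be the document's
    current revision when the document already exists.
    """
    # Percent-encode every path segment; safe="" also escapes "/".
    segments = (urllib.parse.quote(s, safe="") for s in (db, doc_id, att_name))
    url = base_url.rstrip("/") + "/" + "/".join(segments)
    if rev is not None:
        url += "?" + urllib.parse.urlencode({"rev": rev})
    # The raw bytes go in the request body; CouchDB needs a Content-Type.
    return urllib.request.Request(url, data=data, method="PUT",
                                  headers={"Content-Type": content_type})


# Usage against an assumed local node (nothing is sent here):
req = build_attachment_put("http://localhost:5984", "media", "clip-0001",
                           "video.mp4", b"<binary video data>", "video/mp4")
# with urllib.request.urlopen(req) as resp: print(resp.status)
```

Fetching the BLOB back would be a plain `GET` on the same URL, which is what makes the interface feel "cloud-like".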
So, if I got it right, one could summarize it like this: a cloud-like API and a self-managed, distributed, replicated system.
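The "replicated" part could be wired up the same way. This sketch assumes CouchDB's `POST /_replicate` endpoint with its `source`, `target` and `continuous` fields, uses full URLs for both ends, and has every node pull the database from every other node (a full mesh). The node URLs, the `media` database and both helper names are hypothetical, and authentication is omitted.

```python
import json
import urllib.request


def build_replication_request(node_url, source_url, target_url, continuous=True):
    """Hypothetical helper: build (but do not send) POST /_replicate for one node."""
    body = {"source": source_url, "target": target_url, "continuous": continuous}
    return urllib.request.Request(
        node_url.rstrip("/") + "/_replicate",
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={"Content-Type": "application/json"},
    )


def mesh_replications(nodes, db):
    """Hypothetical helper: every node pulls `db` from every other node."""
    nodes = [n.rstrip("/") for n in nodes]
    reqs = []
    for target in nodes:
        for source in nodes:
            if source != target:
                # Pull replication: the request goes to the receiving node.
                reqs.append(build_replication_request(
                    target, f"{source}/{db}", f"{target}/{db}"))
    return reqs
```

A full mesh grows quadratically with the number of nodes, so with many peers a ring or hub topology may be more practical. Replications started through `_replicate` are also not persisted across restarts; the `_replicator` database is the persistent alternative.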
The rest of the system does the normal stuff any average web application does: handling sessions, security, users, searching and the like. For this part I still want to use a relational data model (CouchDB itself claims not to be a replacement for relational databases).
So I would keep all the standard data, including the BLOBs' metadata, in the relational database, but the BLOBs themselves in CouchDB.
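A minimal sketch of that split, assuming SQLite as a stand-in for the relational database: each metadata row holds only a pointer to the BLOB (the CouchDB document ID and attachment name), never the bytes themselves. The table, its columns (including the `sha256` checksum) and the helpers `register_blob` and `blob_location` are hypothetical.

```python
import hashlib
import sqlite3

# Hypothetical schema: metadata lives here, the BLOB lives in CouchDB.
SCHEMA = """
CREATE TABLE IF NOT EXISTS media (
    id              INTEGER PRIMARY KEY,
    title           TEXT NOT NULL,
    mime_type       TEXT NOT NULL,
    size_bytes      INTEGER NOT NULL,
    sha256          TEXT NOT NULL,
    couch_doc_id    TEXT NOT NULL,   -- CouchDB document holding the BLOB
    couch_att_name  TEXT NOT NULL,   -- attachment name within that document
    UNIQUE (couch_doc_id, couch_att_name)
)
"""


def register_blob(conn, title, mime_type, data, doc_id, att_name):
    """Record metadata for a BLOB that is assumed to be stored in CouchDB already."""
    cur = conn.execute(
        "INSERT INTO media (title, mime_type, size_bytes, sha256,"
        " couch_doc_id, couch_att_name) VALUES (?, ?, ?, ?, ?, ?)",
        (title, mime_type, len(data), hashlib.sha256(data).hexdigest(),
         doc_id, att_name),
    )
    conn.commit()
    return cur.lastrowid


def blob_location(conn, media_id):
    """Return (doc_id, att_name) for GET /{db}/{doc_id}/{att_name}, or None."""
    row = conn.execute(
        "SELECT couch_doc_id, couch_att_name FROM media WHERE id = ?",
        (media_id,),
    ).fetchone()
    return None if row is None else (row[0], row[1])
```

In this sketch the BLOB is uploaded to CouchDB before `register_blob` is called, so a metadata row never points at a BLOB that does not exist yet; the stored checksum can be used to verify what CouchDB returns.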
Do you see a problem with this approach? Am I missing something important? Can you think of better solutions?
Thank you!