Is there a NoSQL (or other type of) database suitable for storing a large number (i.e. more than 1 billion) of "medium-sized" blobs (i.e. 20 KB to 2 MB)? All I need is a mapping from A (an identifier) to B (a blob), the ability to retrieve B given A, a consistent external API for access, and the ability to "just add another computer" to scale the system.

Something simpler than a database, e.g. a distributed key-value system, may do just fine, and I'd appreciate any thoughts in that vein as well.

Thank you for reading.

Brian

A:

What about Jackrabbit?

Apache Jackrabbit™ is a fully conforming implementation of the Content Repository for Java Technology API (JCR, specified in JSR 170 and 283).

A content repository is a hierarchical content store with support for structured and unstructured content, full text search, versioning, transactions, observation, and more.

I came across Jackrabbit when I worked with Liferay CMS. Liferay uses Jackrabbit to implement its Document Library, which stores user files in the server's file system.

Leniel Macaferi
A:

If your API requirements are purely along the lines of "Get(key), Put(key, blob), Remove(key)", then a key-value store (or, more accurately, a "persistent distributed hash table") is exactly what you are looking for.
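To show how small that Get/Put/Remove contract is, here is a hypothetical single-machine sketch in Python. `FileBlobStore` and its method names are made up for illustration and are not any product's API; a distributed KV store exposes essentially the same three operations but partitions and replicates the keys across many machines.

```python
import hashlib
import os
import tempfile
from typing import Optional


class FileBlobStore:
    """Hypothetical single-node blob store exposing get/put/remove.

    Each blob lives in its own file. The file path comes from a SHA-1
    hash of the key, fanned out into two levels of subdirectories so no
    single directory ends up holding millions of files.
    """

    def __init__(self, root: str) -> None:
        self.root = root
        os.makedirs(root, exist_ok=True)

    def _path(self, key: str) -> str:
        digest = hashlib.sha1(key.encode("utf-8")).hexdigest()
        # e.g. <root>/3f/a2/3fa2...: up to 256 * 256 leaf directories
        return os.path.join(self.root, digest[:2], digest[2:4], digest)

    def put(self, key: str, blob: bytes) -> None:
        path = self._path(key)
        directory = os.path.dirname(path)
        os.makedirs(directory, exist_ok=True)
        # Write to a temporary file, then rename it into place, so a
        # concurrent reader never sees a half-written blob.
        fd, tmp_path = tempfile.mkstemp(dir=directory)
        with os.fdopen(fd, "wb") as f:
            f.write(blob)
        os.replace(tmp_path, path)

    def get(self, key: str) -> Optional[bytes]:
        try:
            with open(self._path(key), "rb") as f:
                return f.read()
        except FileNotFoundError:
            return None

    def remove(self, key: str) -> bool:
        try:
            os.remove(self._path(key))
            return True
        except FileNotFoundError:
            return False


# Usage: store a 20 KB blob under an arbitrary identifier.
blobs = FileBlobStore(tempfile.mkdtemp())
blobs.put("photo:123", b"x" * (20 * 1024))
```

The write-then-rename step suits a write-once, read-many workload: a blob becomes visible only once it is complete.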

There are quite a few of these available, but without additional information it is hard to make a solid recommendation. What OS are you targeting? Which language(s) are you developing with? What are the I/O characteristics of your app (cold/immutable data such as images, or high write loads such as tweets)?

Some of the KV systems worth looking into:

  • MemcacheDB
  • Berkeley DB
  • Voldemort

You may also want to look into document stores such as CouchDB or RavenDB*. Document stores are similar to KV stores, but they understand the persistence format (usually JSON), so they can provide additional services such as indexing.

* If you are developing in .NET, skip directly to RavenDB (you'll thank me later).
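A toy Python sketch of that difference, purely illustrative and not how CouchDB or RavenDB are actually implemented (`TinyDocStore`, `put`, `get` and `find` are made-up names). Because the store parses each value as JSON, it can maintain an index on one field and answer "which documents have owner = X?" without scanning everything. A KV store that treats values as opaque bytes cannot do this.

```python
import json
from collections import defaultdict


class TinyDocStore:
    """Hypothetical document store that indexes one JSON field.

    Assumes the indexed field holds a hashable scalar (string, number).
    """

    def __init__(self, indexed_field: str) -> None:
        self.field = indexed_field
        self.docs = {}                 # doc id -> parsed document
        self.index = defaultdict(set)  # field value -> set of doc ids

    def put(self, doc_id: str, doc_json: str) -> None:
        doc = json.loads(doc_json)
        old = self.docs.get(doc_id)
        # Drop the stale index entry when a document is overwritten.
        if old is not None and self.field in old:
            self.index[old[self.field]].discard(doc_id)
        self.docs[doc_id] = doc
        if self.field in doc:
            self.index[doc[self.field]].add(doc_id)

    def get(self, doc_id: str):
        return self.docs.get(doc_id)

    def find(self, value) -> list:
        """Ids of documents whose indexed field equals value."""
        return sorted(self.index.get(value, ()))


docs = TinyDocStore("owner")
docs.put("doc1", '{"owner": "brian", "size": 20480}')
docs.put("doc2", '{"owner": "addys", "size": 1048576}')
docs.put("doc3", '{"owner": "brian", "size": 512000}')
print(docs.find("brian"))  # ['doc1', 'doc3']
```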
Addys
Thanks, Addys - that's helpful. As for more info, here are the current targets: OS: Linux/BSD; Language: Python; I/O: write-once, read many.
Brian M. Hunt
A:

You'll also want to take a look at Riak. Riak is very focused on doing exactly what you're asking for (just add a node, easy access).
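The "just add a node" property of Dynamo-style stores such as Riak rests on consistent hashing. The following generic Python sketch illustrates the idea. `HashRing`, `add_node` and `node_for` are hypothetical names, not Riak's client API or internals. Each node is hashed to many points on a ring, a key belongs to the first point clockwise from its own hash, and adding a node moves only the keys that now land on the new node's points.

```python
import bisect
import hashlib


def _hash(s: str) -> int:
    # First 8 bytes of MD5 as an integer ring position
    # (used for even distribution, not for security).
    return int.from_bytes(hashlib.md5(s.encode("utf-8")).digest()[:8], "big")


class HashRing:
    """Consistent-hashing ring with virtual nodes (illustrative only)."""

    def __init__(self, vnodes: int = 64) -> None:
        self.vnodes = vnodes  # ring positions per physical node
        self._points = []     # sorted ring positions
        self._owner = {}      # ring position -> node name

    def add_node(self, node: str) -> None:
        for i in range(self.vnodes):
            point = _hash(f"{node}#{i}")
            self._owner[point] = node
            bisect.insort(self._points, point)

    def node_for(self, key: str) -> str:
        # First ring position clockwise from the key's hash, wrapping around.
        i = bisect.bisect(self._points, _hash(key)) % len(self._points)
        return self._owner[self._points[i]]


ring = HashRing()
for name in ("node-a", "node-b", "node-c"):
    ring.add_node(name)

keys = [f"blob-{i}" for i in range(10_000)]
before = {k: ring.node_for(k) for k in keys}
ring.add_node("node-d")  # "just add another computer"
after = {k: ring.node_for(k) for k in keys}

moved = [k for k in keys if before[k] != after[k]]
print(f"{len(moved) / len(keys):.0%} of keys moved")
```

In expectation, about a quarter of the keys move, and every one of them moves to node-d. With naive `hash(key) % number_of_nodes` placement, going from 3 to 4 nodes would remap about three quarters of the keys.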

Gates VP