I have been looking for cloud computing / storage solutions for a long time (inspired by Google Bigtable). But I can't find an easy-to-use, business-ready solution.

I'm searching for a simple, fault-tolerant, distributed Key=>Value DB like SimpleDB from Amazon.

I've seen things like:

  1. CouchDB: A simple, distributed, fault-tolerant database. But it only understands JSON; there are no XML connectors, etc.
  2. Eucalyptus: Nice Amazon EC2 interfaces, open standards and XML. But is it less distributed and less fault-tolerant? There are also a lot of open tickets about Xen/VMware issues.
  3. CloudStore / KosmosFS: A nice distributed, fault-tolerant file system. But it's hard to configure. Are there any Java connectors?
  4. Apache Hadoop: A nice system that can do much more than store data. It uses its own Hadoop Distributed File System and has been tested on clusters with 2000 nodes.
  5. Amazon SimpleDB: I can't find an open-source alternative! It's a nice but expensive system for huge amounts of data, and it makes you dependent on Amazon.

Are there other, better solutions out there? Which one is the best choice? Which one has the fewest SOFs (Single Points of Failure)?

+1  A: 

You might want to take a look at this (using MySQL as key-value store):

http://bret.appspot.com/entry/how-friendfeed-uses-mysql

Mehrdad Afshari
MySQL can't satisfy my needs, because it isn't really distributed or fault-tolerant, and it mostly has a big SOF. A library for Key=>Value flat-file stores may be much better than MySQL (table sizes etc.)
Martin K.
In the article, you can see that they are using MySQL clusters in a distributed non-SOF fashion. Hey, maybe flat files work better for your problem set, but I'd suggest considering what Friendfeed and others are doing.
jhs
+4  A: 

Wikipedia says that Yahoo both contributes to Hadoop and uses it in production (article linked from Wikipedia). So I'd say it counts as business-proven, although I'm not sure whether it counts as a K/V database.

Not on your list is the Friendfeed system of using MySQL as a simple schema-less key/value store.

It's hard for me to understand your priorities. CouchDB is simple, fault-tolerant, and distributed, but somehow you exclude it because it doesn't have XML. Are XML and Java connectors an unstated requirement?

(Anyway, CouchDB should in fact be excluded because it's young, its API isn't stable, and it's not a key-value store.)

jhs
And terribly slow (CouchDB)
Robert Gould
Well speed isn't *necessarily* a deal-killer if you're talking about parallelizable distributed operation and fault-tolerance.
jhs
Also I just want to say how shocked and proud I am that I got my its-es and my it's-es right on the first try.
jhs
CouchDB should also be excluded because it is "only for rockstars" :p
Lucas B
A: 

Cloudera is a company that commercializes Apache Hadoop, with some value-add of course, like productization, configuration, training & support services.

Bill Karwin
+16  A: 

How about memcached?

The High Scalability blog covers this issue; if there's an open source solution for what you're after, it'll surely be there.

Other projects include:

Another good list: Anti-RDBMS: A list of distributed key-value stores

Assaf Lavie
It offers only in-memory storage! That's bad if you want to store more in your cluster than you have RAM available.
Martin K.
Yeah, but there's also memcachedb and other similar solutions that offer a true DB implementation + caching. And http://project-voldemort.com/. In short, the HS blog covers all these systems, so you'll either find it there or you won't ;)
Assaf Lavie
+4  A: 

I use Google's Google Base API; it's XML-based, free, documented, cloud-based, and has connectors for many languages. I think it will fit the bill if you want free hosting too.

Now, if you want to host your own servers, Tokyo Cabinet is your answer. It's key=>value based, uses flat files, and is the fastest database out there right now (very bare-bones compared to, say, Oracle, but incredibly good at storing and accessing data: about 1 million records per second, with about 10 bytes of overhead, depending on the storage engine). As for business-ready, Tokyo Cabinet is the heart of a service called Mixi, which is the equivalent of Japan's Facebook+MyPage, with several million heavy users, so it's actually very battle-proven.

Robert Gould
On Wikipedia (http://en.wikipedia.org/wiki/Mixi) I can read that Mixi uses several hundred MySQL servers. Do they use both, or is Wikipedia wrong?
tuinstoel
Yup, that info is outdated.
Robert Gould
I hope that the people from Hazelcast improve their DB too (flat files etc.). Tokyo Tyrant/Tokyo Cabinet are clustered as Master/Slave or Master/Standby. That's no real cloud approach ;(
Martin K.
+2  A: 

You might want to look at Hypertable, which is modeled after Google's Bigtable.

Brian Mitchell
A: 

Instead of looking for something inspired by Google's Bigtable, why not just use Bigtable directly? You could write a front-end on Google App Engine.

callingshotgun
He said he wanted it to be open source, and Bigtable would likely have the same cost constraints for big data sets as Amazon's SimpleDB.
Brian Mitchell
+3  A: 

If you want something like Bigtable, you can't go past HBase or Hypertable - they're both open-source Bigtable clones. One thing to consider, though, is if your requirements really are 'big enough' for Bigtable. It scales up to thousands of tablet servers, and as such, has quite a bit of infrastructure under it to enable that (for example, handling the expectation of regular node failures).

If you don't anticipate growing to, at the very least, tens of tablet servers, you might want to consider one of the proposed alternatives: you can't beat Berkeley DB for simplicity, or MySQL for ubiquity. If all you need is a key/value datastore, you can put a simple 'dict' wrapper around your database interface, and switch out your backend if you outgrow one.
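
As a rough illustration of the 'dict' wrapper idea, here is a minimal sketch in Python. It assumes SQLite (from the standard library) as a stand-in backend; the class name `SqliteDict` and the `kv` table layout are made up for this example and are not taken from any of the projects mentioned above.

```python
import sqlite3
from collections.abc import MutableMapping


class SqliteDict(MutableMapping):
    """Dict-style facade over one SQLite table (hypothetical helper)."""

    def __init__(self, path=":memory:"):
        self._conn = sqlite3.connect(path)
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS kv (k TEXT PRIMARY KEY, v BLOB)"
        )

    def __getitem__(self, key):
        row = self._conn.execute(
            "SELECT v FROM kv WHERE k = ?", (key,)
        ).fetchone()
        if row is None:
            raise KeyError(key)
        return row[0]

    def __setitem__(self, key, value):
        # INSERT OR REPLACE gives the usual dict "overwrite" semantics.
        self._conn.execute(
            "INSERT OR REPLACE INTO kv (k, v) VALUES (?, ?)", (key, value)
        )
        self._conn.commit()

    def __delitem__(self, key):
        cur = self._conn.execute("DELETE FROM kv WHERE k = ?", (key,))
        if cur.rowcount == 0:
            raise KeyError(key)
        self._conn.commit()

    def __iter__(self):
        # Fetch all keys up front so no cursor stays open while iterating.
        rows = self._conn.execute("SELECT k FROM kv").fetchall()
        return (k for (k,) in rows)

    def __len__(self):
        return self._conn.execute("SELECT COUNT(*) FROM kv").fetchone()[0]


store = SqliteDict()          # application code only sees a mapping
store["user:1"] = b"alice"
print(store["user:1"])        # b'alice'
```

Because the calling code only relies on the mapping interface, a class with the same methods backed by a different store could probably be dropped in later without touching the callers.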

Nick Johnson
Correction: Hypertable is in C++
Gregg Lind
Thanks for the correction.
Nick Johnson
+2  A: 

Use CouchDB

  • What's wrong with JSON?
  • JSON to XML is trivial
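
To give an idea of what such a conversion might look like, here is a small sketch using only the Python standard library. The function names, the `doc` root tag and the `item` tag for list entries are arbitrary choices for this example, and it assumes the JSON keys are valid XML element names.

```python
import json
import xml.etree.ElementTree as ET


def to_xml(tag, value):
    """Recursively turn a decoded JSON value into an XML element."""
    elem = ET.Element(tag)
    if isinstance(value, dict):
        for key, child in value.items():
            elem.append(to_xml(key, child))
    elif isinstance(value, list):
        # Lists have no key names, so each entry gets a generic tag.
        for child in value:
            elem.append(to_xml("item", child))
    elif isinstance(value, str):
        elem.text = value
    elif value is not None:
        # Numbers and booleans keep their JSON spelling (e.g. "true").
        elem.text = json.dumps(value)
    return elem


def json_to_xml(json_text, root_tag="doc"):
    root = to_xml(root_tag, json.loads(json_text))
    return ET.tostring(root, encoding="unicode")


print(json_to_xml('{"name": "doc1", "tags": ["a", "b"]}'))
# <doc><name>doc1</name><tags><item>a</item><item>b</item></tags></doc>
```

A real mapping would still have to decide how to handle attributes, namespaces and keys that are not valid element names, so treat this as a starting point only.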
TFD
+4  A: 

MongoDB is another option which is very similar to CouchDB, but it uses a query language very similar to SQL instead of map/reduce in JavaScript. It also supports indexes, query profiling, replication and storage of binary data.

It has a huge amount of documentation, which might be overwhelming at first, so I would suggest starting with the Developer's Tour.

dpavlin
A: 

A good compilation of storage tools for your question:

http://www.metabrew.com/article/anti-rdbms-a-list-of-distributed-key-value-stores/

Ajit Singh
A: 

Tokyo Cabinet has also received some attention, as it supports table schemas, key-value pairs and hash tables. It uses Lua as an embedded scripting platform and HTTP as its communication protocol. Here is a great demonstration.

David Robbins