I just read tons of material on Amazon's S3 and CouchDB. Maybe not enough yet though, so here is my question:

Both systems sound very appealing to me. CouchDB is distributed under the Apache License 2.0, whereas with Amazon's S3 you pay for the data you store and the traffic you generate. So there is a bit of a difference monetarily.

But from a technical point of view, as far as I understand, both systems help you store unstructured data of arbitrary size (in CouchDB's case, limited by the underlying OS).

I don't know how easy it would be to come up with a unified interface for both of them, so that you could just change your "datastore provider" as the need arises, without having to change any of your code.

I also don't know whether this is technically feasible; I haven't looked at their protocols in great detail yet. But it would be great to postpone the provider decision for as long as possible.

Also this could be interesting for integration testing purposes: You could for example test against a local CouchDB instance and run your code against S3 for production use.

To formulate my question from a different angle: Are Amazon's S3 and CouchDB essentially solving the exact same problem, or is this insane and have I missed the whole point?

Updated Question

After Jim's brilliant answer, let me then rephrase the question to:

"Common Interface for CouchDB and Amazon SimpleDB"

And following the same line of thinking, do you see a problem with a common interface between CouchDB and SimpleDB?

+7  A: 

You're missing the point, just slightly. CouchDB is a database. S3 is a filesystem. They're both relatively unstructured, but with S3 you're storing files under keys while with CouchDB you're storing (arbitrarily-structured) data under keys.

The Amazon Web Services analogue to something like CouchDB would be Amazon SimpleDB.

Something like what you're looking for already exists for Ruby, and it's called Moneta. It can even store stuff on S3, which may be exactly what you want.

Jim Puls
+1  A: 

Technically, a common layer is possible. However, I question whether it would make sense. CouchDB has integrated map/reduce functions for your documents, which are exposed as "views". I don't think SimpleDB has anything like that. On the other hand, SimpleDB has query expressions, which CouchDB does not. Of course you can model those expressions as a view in CouchDB if you know your query at development time.
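
To make that last point concrete, here is a rough sketch in plain Python. It is not CouchDB's actual API (real CouchDB views are typically written as JavaScript map functions); it only simulates the idea. It assumes a hypothetical query, "all documents whose `type` is `book`", that is fixed at development time and is modeled as a map function emitting the queried attribute as the key. The names `map_by_type`, `build_view`, and `query_view` are made up for illustration.

```python
from collections import defaultdict


def map_by_type(doc):
    """Map function (hypothetical): emit (key, value) pairs for one document."""
    if "type" in doc:
        yield doc["type"], doc["_id"]


def build_view(docs, map_fn):
    """Run the map function over every document and index the results by key."""
    index = defaultdict(list)
    for doc in docs:
        for key, value in map_fn(doc):
            index[key].append(value)
    return index


def query_view(index, key):
    """Look up one key in the prebuilt index; this stands in for the fixed query."""
    return sorted(index.get(key, []))


# Sample documents, invented for this example.
docs = [
    {"_id": "a", "type": "book", "title": "Dune"},
    {"_id": "b", "type": "film", "title": "Alien"},
    {"_id": "c", "type": "book", "title": "Emma"},
]
view = build_view(docs, map_by_type)
```

Calling `query_view(view, "book")` then returns the ids of the matching documents. An ad-hoc query on a different attribute would need another map function, which is the limitation compared to query expressions.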

Besides that, the common functionality is no more than creating, updating, and deleting a key-document pair.
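
A minimal sketch of what such a lowest-common-denominator layer might look like, in Python. Everything here is hypothetical: `DocumentStore`, `InMemoryStore`, and `rename_document` are names invented for this example, and the in-memory backend merely stands in for a real CouchDB or SimpleDB adapter, which would have to implement the same three methods.

```python
from typing import Dict, Optional, Protocol


class DocumentStore(Protocol):
    """Hypothetical common interface: only key-document create/update/delete."""

    def put(self, key: str, doc: dict) -> None: ...
    def get(self, key: str) -> Optional[dict]: ...
    def delete(self, key: str) -> None: ...


class InMemoryStore:
    """Stand-in backend, e.g. for tests; a real adapter would talk to a server."""

    def __init__(self) -> None:
        self._docs: Dict[str, dict] = {}

    def put(self, key: str, doc: dict) -> None:
        # Create or overwrite; store a copy so callers cannot mutate our state.
        self._docs[key] = dict(doc)

    def get(self, key: str) -> Optional[dict]:
        doc = self._docs.get(key)
        return dict(doc) if doc is not None else None

    def delete(self, key: str) -> None:
        # Deleting a missing key is treated as a no-op.
        self._docs.pop(key, None)


def rename_document(store: DocumentStore, old_key: str, new_key: str) -> bool:
    """Application code written only against the common interface."""
    doc = store.get(old_key)
    if doc is None:
        return False
    store.put(new_key, doc)
    store.delete(old_key)
    return True
```

Code like `rename_document` works with any backend that provides `put`, `get`, and `delete`, but anything beyond that (views, query expressions) falls outside the shared interface.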

ordnungswidrig
+2  A: 

You are wrong Jim. S3 is not a filesystem. It is a webservice for a key-value store.

Amazon provides you with a key. Yes, the value of that key can be data that represents a file. But how that gets managed in the Amazon system is something entirely different. It can be stored on one node, on multiple nodes, on geographically strategic nodes with CloudFront, and so on. There is nothing in the key itself that indicates how the system will manage the file. The value of the key is never a file directly. It is data that represents the file. How that value eventually gets resolved into a file that the client receives is entirely separate.

The value of that key can actually be data that does not represent a file. It can be a JSON dictionary. In that sense, S3 could be used in the same way as CouchDB.
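
As a rough sketch of that idea in Python: the helpers below serialize a dictionary to JSON and store it under an S3 key. They assume a client object with boto3-style `put_object` and `get_object` methods (for example one created with `boto3.client("s3")`); the helper names `put_json` and `get_json`, and the bucket and key in the usage comment, are made up for illustration.

```python
import json


def put_json(client, bucket, key, doc):
    """Serialize a dictionary to JSON and store it as the value of an S3 key."""
    body = json.dumps(doc).encode("utf-8")
    client.put_object(
        Bucket=bucket,
        Key=key,
        Body=body,
        ContentType="application/json",
    )


def get_json(client, bucket, key):
    """Fetch the value stored under an S3 key and parse it back into a dictionary."""
    response = client.get_object(Bucket=bucket, Key=key)
    return json.loads(response["Body"].read().decode("utf-8"))


# Usage (assumes boto3 is installed, credentials are configured,
# and the hypothetical bucket below exists):
#
#   import boto3
#   s3 = boto3.client("s3")
#   put_json(s3, "my-example-bucket", "users/42", {"name": "Ada"})
#   print(get_json(s3, "my-example-bucket", "users/42"))
```

Note that this only gives you whole-value reads and writes by key; there is no querying inside the JSON.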

So I don't think the question is missing the point. In fact, it is a perfectly legitimate question as data in CouchDB is not distributed amongst nodes. And that could hamper performance.

Let's not even talk about Amazon SimpleDB. That is something separate. Please don't mix terms and then make claims based on that.

If you are not convinced by this claim, and if people request it, I am happy to provide a code bit that illustrates a JSON dictionary in S3.

I respect your answers to other questions, Jim. But here you are clearly wrong, and I cannot see how your points are justified.

Ben Ahlan