Although I haven't used any of the new NoSQL databases yet, I've tried to keep myself informed by reading Wikipedia articles and blogs, and by peeking into the documentation of some of the NoSQL DBs.

I've just (re)read the August 2009 edition of php|architect, specifically the article about non-relational databases, and a few questions popped up in my head. I understand that the article is pretty light on the subject, but it was enough to get me confused...

CouchDB

My main question regarding CouchDB is: why so much hype? From what I understood, CouchDB provides a web service that lets you create databases and documents inside those databases. Each document can have several JSON-encoded attributes, plus special _id and _rev attributes for tracking revisions of the document.

I really don't get all the fuss about this. Some years ago, for a pet project, I coded a similar (?) system for storing documents, with a structure something like this:

documents/
  document-name/
    (revision) timestamp/
      (contents) md5-hash.txt
        PHP Serialized Data

I'm sure I'm missing something very fundamental, otherwise (from the viewpoint of a PHP developer) this would have the same benefits as CouchDB and be faster - no need to encode and decode JSON.


Amazon SimpleDB

Now this one really gets my head spinning... The author (Russell Smith) gives the following example:

$sdb->putAttributes('phparch', 'may', array('title' => array('value' => 'May 2009'), 'have' => array('value' => false)));
$sdb->putAttributes('phparch', 'june', array('title' => array('value' => 'June 2009'), 'have' => array('value' => true)));
$sdb->putAttributes('phparch', 'july', array('title' => array('value' => 'July 2009'), 'have' => array('value' => true)));

He then says that Amazon now supports a SQL-like interface and then executes the following query:

$sdb->select('phparch', 'SELECT * FROM phparch WHERE have = "1"');

He doesn't give any analogous example of how to do that query in CouchDB (he leaves some hints on Views and Map/Reduce however) but I suppose it is also possible, so my question is: how does Amazon (and CouchDB) do it?

My first guess would be that they open all documents (possibly in a distributed environment) and then apply a reduce operation to filter out the documents whose attributes don't match the search criteria, but wouldn't this be overly expensive (CPU and disk I/O), even with parallel computing?


I know I'm ignoring some important stuff like distribution, consistency and so on, but I'm just trying to grasp the very basic inner workings of NoSQL stores.

PS: Also, can anyone explain to me why both CouchDB and Amazon SimpleDB are built with Erlang?

+3  A: 

the fuss around nosql is down to indexing, availability, and scalability. indexing is what allows the document-oriented stores to NOT open all documents if you want to get the ones where have = 1: the store keeps a lookup from attribute values to the documents that contain them, so a query only touches the matching documents. availability and scalability allow these systems to easily scale out and be robust in the face of unreliable hardware.
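
to make the indexing point concrete, here is a minimal Python sketch of the difference between scanning every document and using an index. it is only an illustration under simple assumptions (everything fits in memory, one indexed attribute), not how SimpleDB or CouchDB are actually implemented; the names `docs`, `scan`, `build_index` and `lookup` are made up for this example.

```python
from collections import defaultdict

# Toy data mirroring the php|architect items from the question.
docs = {
    "may":  {"title": "May 2009",  "have": False},
    "june": {"title": "June 2009", "have": True},
    "july": {"title": "July 2009", "have": True},
}

def scan(docs, attr, value):
    """Full scan: inspects every document on every query."""
    return sorted(name for name, doc in docs.items() if doc.get(attr) == value)

def build_index(docs, attr):
    """Built once (a real store would also update it on writes):
    maps each attribute value to the names of the documents that have it."""
    index = defaultdict(set)
    for name, doc in docs.items():
        if attr in doc:
            index[doc[attr]].add(name)
    return index

def lookup(index, value):
    """Indexed query: one dictionary lookup, no documents are opened."""
    return sorted(index.get(value, ()))

have_index = build_index(docs, "have")

print(scan(docs, "have", True))    # touches all three documents
print(lookup(have_index, True))    # touches only the index
```

both calls return the same names, but the second one does work proportional to the number of matches rather than the number of documents stored.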

erlang is designed for concurrent, fault-tolerant programs that run across many processors and machines, and so is an ideal fit for distributed systems too.

oedo
Thanks oedo, do you mind explaining in more detail how the indexing works?
Alix Axel
conceptually speaking, an index is a quick lookup into some useful value, and so the implementation depends on the database, and sometimes on the configuration of said database. couchdb's system is detailed at http://couchdb.apache.org/docs/overview.html under 'view indexes'. if you're looking for the actual algorithms, maybe take a look at the db's source code; couchdb is open source after all :)
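
as a rough picture of the view idea: a map function is run over each document and emits key/value rows, and the rows are kept sorted by key so a query can jump straight to the matching range. the Python sketch below is a simplification under stated assumptions: a sorted in-memory list stands in for CouchDB's on-disk B-tree, and `map_have`, `build_view` and `query_view` are hypothetical names, not part of CouchDB's API (real CouchDB map functions are usually written in JavaScript).

```python
import bisect

# Toy documents; "_id" mirrors the CouchDB attribute mentioned in the question.
docs = [
    {"_id": "may",  "title": "May 2009",  "have": False},
    {"_id": "june", "title": "June 2009", "have": True},
    {"_id": "july", "title": "July 2009", "have": True},
]

def map_have(doc):
    """Hypothetical map function: emit (key, value) rows for one document."""
    if "have" in doc:
        yield (doc["have"], doc["_id"])

def build_view(docs, map_fn):
    """Run the map function over every document once and keep the rows sorted by key."""
    rows = sorted(row for doc in docs for row in map_fn(doc))
    keys = [key for key, _ in rows]
    return keys, rows

def query_view(view, key):
    """Binary-search the sorted keys; no document is reopened at query time."""
    keys, rows = view
    lo = bisect.bisect_left(keys, key)
    hi = bisect.bisect_right(keys, key)
    return [value for _, value in rows[lo:hi]]

view = build_view(docs, map_have)
print(query_view(view, True))
```

the expensive part (running the map function over the documents) happens when the view is built, so later queries by key only pay for the search in the sorted rows.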
oedo