views:

258

answers:

2

I'm wondering what is actually stored in a CouchDB database B-tree? The CouchDB: The Definitive Guide tells that a database B-tree is used for append-only operations and that a database is stored in a single B-tree (besides per-view B-trees).

So I guess the data items that are appended to the database file are revisions of documents, not the whole documents:

            +---------|### ...  
            |           |
   +------|###|------+     ... ---+
   |        |        |            |
+------+ +------+ +------+     +------+
| doc1 | | doc2 | | doc1 | ... | doc1 |
| rev1 | | rev1 | | rev2 |     | rev7 |
+------+ +------+ +------+     +------+

Is it true?

If it is true, then how the current revision of a document is determined based on such a B-tree?

Doesn't it mean, that CouchDB needs a separate "view" database for indexing current revisions of documents to preserve O(log n) access? Wouldn't it lead to race conditions while building such an index? (as far as I know, CouchDB uses no write locks).

+1  A: 

CouchDB does not store diffs. When you update a document, it appends the whole new document with a new _rev and the same _id as the old version. The old version is removed during compaction.

titanoboa
Yes, CouchDB doesn't store diffs. My question is how does it store documents internally in order to make both write-only save operations *and* current version retrievals, without locks?
Andrey Vlasovskikh
+1  A: 

The database file on disk is append-only; however the B-tree is conceptually modified in-place. When you update a document,

  1. Its leaf node is written (via append to the DB file)
  2. Its parent node is re-written to reference the new leaf (via append of course)
  3. Repeat step 2 until you update the root node

When the root node is written, that is effectively when the newer revision is "committed." To find a document, you start at the end of the file, get the root node, and work down to your doc id. The latest revision will always be accessible this way.

jhs
It's still unclear to me when the algorithm for determining the winning revision (http://books.couchdb.org/relax/reference/conflict-management) comes into play during the current document revision lookup. If the user is reading the document with the key ID1, then according to the scheme you've described he will get the **latest written** revision (thanks for your point on serializing writes using an Erlang process), **not the winning one**.
Andrey Vlasovskikh
I guess I need to dig into the source code. It's quite observable: 18 KLOC.
Andrey Vlasovskikh
The conflict management algorithm decides which order to store them in (i.e. does this get rev 4 and that get rev 5 or vice versa ). A simple lookup by ID always fetches the latest revision stored. In this example, revision 5 would be the "winner." The application may want to merge the conflict more meaningfully by creating a revision 6 that is the sum of 4 and 5.
jhs