I need to know the factors that need to be taken into consideration when implementing a solution using CouchDB. I understand that CouchDB does not require normalization and that the standard techniques I use in RDBMS development are mostly thrown away.

But what exactly are the costs involved? I understand the benefits perfectly well, but the storage costs make me a bit nervous, as it appears that CouchDB would need an awful lot of duplicated data, some of it going stale and out of date well before it is used. How would one manage stale data?

I know that I could implement some awful relationship model with documents in CouchDB and lower the storage costs, but wouldn't this defeat the objectives of CouchDB and the performance I could gain?

An example I am thinking about is a system for requisitions, ordering and tendering. The system currently has one-to-many relationships, and the "many" side may get updated more frequently than the "one".

Any help would be great, as I am an old-school RDBMS guy brought up on the teachings of C. J. Date, E. F. Codd and R. F. Boyce, so I am struggling at the moment with the radical notion of document storage.

Does CouchDB have anything internal to recognize and reduce duplicate data?
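
To make this concrete, here is a rough sketch of the two document shapes I am weighing for the requisition example (written as Python dicts; all field names are purely illustrative):

    # Option 1: denormalise -- embed the "many" side inside the "one".
    # Reads are a single fetch, but every line-item change rewrites the
    # whole requisition document and leaves an old revision behind.
    requisition_embedded = {
        "_id": "req-0001",
        "type": "requisition",
        "requested_by": "j.smith",
        "lines": [
            {"item": "widget", "qty": 10, "status": "ordered"},
            {"item": "gadget", "qty": 2, "status": "tendered"},
        ],
    }

    # Option 2: reference -- store each line as its own small document,
    # relate it back by ID, and use a view keyed on requisition_id to
    # "join" at query time. Updates then touch only the small document.
    requisition = {
        "_id": "req-0001",
        "type": "requisition",
        "requested_by": "j.smith",
    }
    line_item = {
        "_id": "req-0001:line-0001",
        "type": "line",
        "requisition_id": "req-0001",
        "item": "widget",
        "qty": 10,
        "status": "ordered",
    }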

+1  A: 

Only you know how many copies of how much data you will use, so unfortunately the only good answer will be to build simulated data sets and measure the disk usage.
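
A minimal sketch of such a measurement, assuming a local, unauthenticated CouchDB instance at http://localhost:5984, the Python requests library, and a made-up database name and document shape:

    import requests

    COUCH = "http://localhost:5984"   # assumed local, unauthenticated instance
    DB = "sizing_test"                # throwaway database for the experiment

    requests.put(f"{COUCH}/{DB}")     # create the database

    # Load a batch of representative fake documents.
    for i in range(10000):
        doc = {
            "type": "requisition",
            "number": i,
            "lines": [{"item": "widget", "qty": i % 7}] * 5,
        }
        requests.put(f"{COUCH}/{DB}/req-{i}", json=doc)

    # Ask CouchDB how big the database file is on disk. Older releases
    # report this as "disk_size"; newer ones nest it under "sizes".
    info = requests.get(f"{COUCH}/{DB}").json()
    print(info.get("disk_size") or info.get("sizes", {}).get("file"))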

In addition, similar to a file system, CouchDB requires additional storage for metadata. This cost depends on two factors:

  1. How often you update or create a document
  2. How often you compact

The worst-case instantaneous disk usage will be the total amount of data times two, plus all the old document revisions (#1) existing at compaction time (#2). This is because compaction builds a new database file containing only the current document revisions. Therefore the usage will be two copies of the current data (one in the old file, one in the new file), plus all of the "wasted" old revisions awaiting deletion when compaction completes. After compaction, the old file is deleted, so you will reclaim over half of this worst-case value.
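
As a rough worked example of that worst case (the numbers here are made up):

    # Back-of-the-envelope worst case, with made-up numbers.
    active_data_gb = 10      # current revisions only
    stale_revisions_gb = 4   # old revisions accumulated since the last compaction

    # During compaction the old file (active + stale) and the new file
    # (active only) exist on disk at the same time.
    worst_case_gb = (active_data_gb + stale_revisions_gb) + active_data_gb
    print(worst_case_gb)     # 24 GB at the peak; ~10 GB once the old file is deleted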

Running compaction frequently is a perfectly good way to keep disk usage down; however, it has disk I/O implications.
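
For reference, a minimal sketch of kicking off compaction over the HTTP API and waiting for it to finish (same assumptions as above; a real server may also require admin credentials):

    import time
    import requests

    COUCH = "http://localhost:5984"
    DB = "sizing_test"   # same hypothetical database as above

    # Compaction is started with a POST to /{db}/_compact and runs in the
    # background; the database info document reports whether it is running.
    requests.post(f"{COUCH}/{DB}/_compact",
                  headers={"Content-Type": "application/json"})

    while requests.get(f"{COUCH}/{DB}").json().get("compact_running"):
        time.sleep(1)

    print("compaction finished")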

jhs
I had actually forgotten about the revision history. But as you say, I can compact it down, so I'm not bothered by that.
WeNeedAnswers
You are right. It's still worth keeping in the back of your head, though, because compaction will occasionally read and re-write the entire active data set from disk. That could be a lot of I/O, depending on your data.
jhs
