The answer is filtered replication. I like to do this in two parts:
- Replicate the production database, `example_db`, to my local server as `example_db_full`
- Perform filtered replication from `example_db_full` to `example_db`, where the filter cuts out enough data so builds are fast, but keeps enough data so I can confirm my code works.
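The first step is an ordinary unfiltered replication. As a sketch, the `_replicate` request body might look like this (the production hostname is a placeholder; `continuous` is optional but keeps `example_db_full` up to date):

```json
{
  "source": "https://production.example.com/example_db",
  "target": "example_db_full",
  "continuous": true
}
```

POST that to your local CouchDB's `/_replicate` endpoint and the full copy stays current in the background.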
Which documents to select is application-specific. For now, I am satisfied with a simple random pass/fail at a percentage I can specify. The randomness is consistent (i.e., the same document always passes or always fails).
My technique is to normalize the content checksum in the document's `_rev` field onto the range [0.0, 1.0). Then I simply specify some fraction (e.g. `0.01`), and if the normalized checksum value is <= my fraction, the document passes.
```javascript
function(doc, req) {
  // Always replicate design documents.
  if (/^_design\//.test(doc._id))
    return true;

  if (!req.query.p)
    throw {error: "Must supply a 'p' parameter with the fraction"
                + " of documents to pass [0.0-1.0]"};

  var p = parseFloat(req.query.p);
  if (!(p >= 0.0 && p <= 1.0)) // Also catches NaN
    throw {error: "Must supply a 'p' parameter with the fraction of documents"
                + " to pass [0.0-1.0]"};

  // Consider the first 8 characters of the doc checksum (for now, taken
  // from _rev) as a real number on the range [0.0, 1.0), i.e.
  // ["00000000", "ffffffff").
  var ONE = 4294967295; // parseInt("ffffffff", 16)
  var doc_val = parseInt(doc._rev.match(/^\d+-([0-9a-f]{8})/)[1], 16);
  return doc_val <= (ONE * p);
}
```
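To see the determinism in isolation, here is a standalone sketch of the same pass/fail decision that runs in plain Node with no CouchDB involved; `passes` is a hypothetical helper name, not part of the filter above:

```javascript
// Standalone sketch of the filter's decision: take the first 8 hex
// digits of the _rev checksum, scale onto [0.0, 1.0) by dividing by
// 0xffffffff, and compare against the fraction p.
function passes(rev, p) {
  var ONE = 4294967295; // parseInt("ffffffff", 16)
  var val = parseInt(rev.match(/^\d+-([0-9a-f]{8})/)[1], 16);
  return val <= ONE * p;
}

var rev = "3-9f1a2b3c4d5e6f708192a3b4c5d6e7f8";
// The verdict depends only on the checksum, so it is the same on
// every run: a given document always passes or always fails.
console.log(passes(rev, 0.01));
```

To wire the real filter in, store it in a design document (say `_design/repl` under `filters.random`; both names are placeholders) and reference it in the replication request with `"filter": "repl/random"` and `"query_params": {"p": "0.01"}`, which is how `req.query.p` gets its value.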