Hi All,

I have a CouchDB (v0.10.0) database that is 8.2 GB in size and contains 3890000 documents.

I have the following as the map function of the view:

function(doc) { emit([doc.Status], doc); }

And it takes forever to load (4 hours and still no result).

Here's some extra information that might help describe the situation:

  1. The view is not a temp view. It was defined before the 3890000 documents were inserted.

  2. There isn't anything else on the server. It is an Ubuntu box with nothing but the defaults installed.

  3. The CPU is working hard (sometimes spiking to 100%). Memory usage fluctuates but does not grow.

So my questions are:

  1. What is actually happening in the background?
  2. Is this a "one time" thing, where I have to wait once and it will then work later?

Many thanks,

Chi

A: 

Views are only updated when they are read. On each read, CouchDB processes all the documents that have changed (created, updated, deleted) since the last time the view was read.

So even if your view was defined before inserting the 3890000 documents, the first read has to process all 3890000 documents to build the view.

From http://wiki.apache.org/couchdb/Introduction_to_CouchDB_views

Note that by default views are not created and updated when a document is saved, but rather, when they are accessed. As a result, the first access might take some time depending on the size of your data while CouchDB creates the view. If preferable the views can also be updated when a document is saved using an external script that calls the views when updates have been made. An example can be found here: RegeneratingViewsOnUpdate
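As a rough sketch of "calling the view" to keep the index up to date: any read of the view triggers the update, so a script can simply request it after a batch of writes. The helper name `warmViewUrl`, the database name `mydb`, the design document `app`, and the view name `by_status` are all made up for illustration, and `limit=0` is used on the assumption that CouchDB still refreshes the index before returning zero rows.

```javascript
// Hypothetical helper: builds a URL that reads the view so CouchDB
// brings its index up to date. limit=0 asks for no rows in the response.
function warmViewUrl(base, db, ddoc, view) {
  return `${base}/${db}/_design/${ddoc}/_view/${view}?limit=0`;
}

// Usage (assumes a runtime with a global fetch and a server on localhost:5984):
// fetch(warmViewUrl("http://localhost:5984", "mydb", "app", "by_status"));
```

Running something like this periodically should keep each later read short, since only the documents changed since the previous read need processing.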

Evan
Perfect! So I guess it is building the index at that time. And it will not have to do this again even if I reboot (since the update is already done). Thanks Evan!
Chi Chan
A: 

Also just came across this tip, which might be useful if you're running on Ubuntu:

http://nosql.mypopescu.com/post/1299848121/couchdb-and-ubuntu-configuration-trick-for

Evan
+1  A: 

Don't emit the whole doc; that's unnecessary. You can just run your query with include_docs=true and then access the document as the doc attribute of each row.

When you emit the whole doc, you make the index as large as or larger than your entire database :)
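A minimal sketch of what that could look like, keeping the key from the question and emitting null as the value. The `viewUrl` helper and the names `mydb`, `app`, and `by_status` are hypothetical, chosen only for the example.

```javascript
// Map function: same key as in the question, but the value is null,
// so the index stores only the keys and document ids.
const map = function (doc) {
  emit([doc.Status], null);
};

// Hypothetical helper: builds a view query that asks CouchDB to attach
// the full document to each row via include_docs=true.
function viewUrl(base, db, ddoc, view, key) {
  const params = new URLSearchParams({
    include_docs: "true",
    key: JSON.stringify(key), // view keys are passed as JSON
  });
  return `${base}/${db}/_design/${ddoc}/_view/${view}?${params}`;
}
```

Each row in the response then carries the full document under its doc attribute, while the view index itself holds only the keys.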

mikeal