views:

164

answers:

2

Hello,

I have large data set, which I want to query. The query does not change but the underlying data does. From what I read, I could construct a "view" and query it. Also, I read that Couch DB knows how to update the view when data is changed so I assume querying the view again would be still fast.

My questions are, do I understand CounchDB's views correctly? I don't need any other feature of CouchDB, I don't even need SQL, all I want is fast same query over changing data. Could I use something else? If I would use, say, good old MySQL would it really be slower than CouchDB (read: in the above scenario, how would various DBs approximately perform?).

A: 

I don't think anyone can answer your question given the information you have provided.

Indexes in a relational database are analogous to CouchDB views. In both cases, they store a pre-sorted instance of the data and the database keeps that instance in sync with the canonical data. Both types of database transparently use the index/view to speed up subsequent queries of the form that the index/view was designed for.

Without indexes/views, queries must scan the whole collection of n records of data and they execute in O(n) time. When a query benefits from an indexes/views, it executes in O(log n) time.

But that's speaking very broadly of the performance curve with respect to the volume of data. A given database could have such speedy performance in certain cases that it out-performs another product no matter what. It's hard to make generalizations that brand X is always faster than brand Y. The only way to be sure about a specific case is to try that case in both databases and measure the performance.

Bill Karwin
I knew that indexes are pre-sorted (i.e. O(log N)) but I thought views get automatically populated with newly updated data, so there would be no search at all. In other words, I thought views are very different than indexes, what performance goes... Btw, you say I provided not enough information, could you please be more specific? Like the amount of data? I also thought it is more important that the query stays the same and data changes than anything else....
Lars Kanto
Which product has best performance depends a lot on your specific queries, and what indexes/views you have defined. That's what I mean by not enough information. Each product has its strengths and weaknesses, so you can optimize only after knowing what queries you need to optimize for.
Bill Karwin
Also: you can use indexes to avoid a search if the index includes all the columns you need in your result. This is called a *covering index*. Of course in many RDBMS brands, you can define an index only for columns in one table at a time, so it may be less versatile than a CouchDB view.
Bill Karwin
+1  A: 

Your assessment is completely correct. Enjoy!

The only performance trick worth mentioning is that you may see a boost if you emit() all of the data you need from the view and never use the ?include_docs feature, because include_docs causes CouchDB to go back into the main database and retrieve the original doc that caused that view row. In other words, you can emit() everything you need into your view index (more space but faster), or you can use the reference back to the original document (less space but slower.)

jhs