I'm using what seems to be a common trick for creating a join view:

// a Customer has many Orders; show them together in one view:
function(doc) {
  if (doc.Type == "customer") {
    emit([doc._id, 0], doc);
  } else if (doc.Type == "order") {
    emit([doc.customer_id, 1], doc);
  }
}

I know I can use the following query to get a single customer and all related Orders:

?startkey=["some_customer_id"]&endkey=["some_customer_id", 2]

But now I've tied my query very closely to my view code. Is there a value I can put where I put my "2" to more clearly say, "I want everything tied to this Customer"? I think I've seen

?startkey=["some_customer_id"]&endkey=["some_customer_id", {}]

But I'm not sure that {} is certain to sort after everything else.

Credit to cmlenz for the join method.

Further clarification from the CouchDB wiki page on collation:

The query startkey=["foo"]&endkey=["foo",{}] will match most array keys with "foo" in the first element, such as ["foo","bar"] and ["foo",["bar","baz"]]. However it will not match ["foo",{"an":"object"}]

So {} is late in the sort order, but definitely not last.
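To make the ordering concrete, here is a rough sketch of the type ranking described on the wiki page (null, false, true, numbers, strings, arrays, objects). The function names are made up for illustration. It is a simplification: it compares strings by code unit, whereas CouchDB uses ICU collation, and it compares objects by their key/value pairs in insertion order.

```javascript
// Rank of each JSON type in view collation, lowest first.
function typeRank(value) {
  if (value === null) return 0;
  if (value === false) return 1;
  if (value === true) return 2;
  if (typeof value === "number") return 3;
  if (typeof value === "string") return 4;
  if (Array.isArray(value)) return 5;
  return 6; // plain object
}

// Returns a negative number, zero, or a positive number.
// ASSUMPTION: strings are compared by code unit, not by ICU rules.
// ASSUMPTION: objects are compared as lists of [key, value] pairs.
function collate(a, b) {
  const rankDiff = typeRank(a) - typeRank(b);
  if (rankDiff !== 0) return rankDiff;
  if (typeof a === "number") return a - b;
  if (typeof a === "string") return a < b ? -1 : a > b ? 1 : 0;
  if (Array.isArray(a)) {
    // Element by element; an array that is a prefix of the other sorts first.
    for (let i = 0; i < Math.min(a.length, b.length); i++) {
      const diff = collate(a[i], b[i]);
      if (diff !== 0) return diff;
    }
    return a.length - b.length;
  }
  if (typeRank(a) === 6) return collate(Object.entries(a), Object.entries(b));
  return 0; // null, false, true: same rank means same value
}
```

Under these assumptions, collate(["foo", {"an": "object"}], ["foo", {}]) is positive, so that key falls past an endkey of ["foo", {}], while string and array second elements fall before it.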

A: 

CouchDB is mostly written in Erlang. I don't think there is an upper limit on the size of a compound (composite) key tuple other than system resources (e.g. a key so long that it uses all available memory). The limits of CouchDB scalability are unknown according to the CouchDB site. I would guess that you could keep adding fields to a huge composite primary key and the only thing that would stop you is system resources or hard limits such as maximum integer sizes on the target architecture.

Since CouchDB stores everything using JSON, its largest number values are probably limited by the ECMAScript standard. All numbers in JavaScript are stored as IEEE 754 double-precision floating-point values. I believe a 64-bit double can represent magnitudes from about 5e-324 (the smallest positive value) up to 1.7976931348623157e+308, in either sign.
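For reference, JavaScript exposes these double-precision limits as constants. This short sketch only inspects the JavaScript side; whether CouchDB's Erlang core imposes the same bounds is not shown here.

```javascript
// Limits of the IEEE 754 double that backs every JavaScript number.
const numberLimits = {
  largest: Number.MAX_VALUE, // largest finite magnitude
  smallestPositive: Number.MIN_VALUE, // smallest positive (subnormal) value
  maxSafeInteger: Number.MAX_SAFE_INTEGER, // 2^53 - 1, largest exactly representable run of integers
};
console.log(numberLimits);
```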

Sean A.O. Harney
Perhaps I wasn't clear enough. The ID for that customer doesn't change between the min and max values. CouchDB, however, allows compound keys. It orders first by the first entry (constant here, and equal to "some_customer_id"), then by the second (null for the start key, 2 or {} for the end key), and so on. I'm wondering whether (and why) {} is the maximum possible value for a key in CouchDB's ordering.
James A. Rosen
I think the problem is in my question title -- I'll rename for clarity.
James A. Rosen
Oh, I didn't see you were talking about composite keys. CouchDB seems to have few limitations, and I doubt there is a hard limit on the size of the tuple for a composite key. I believe system resources would be strained by some db operations if you made a table with thousands of fields and hundreds of them were part of the composite index.
Sean A.O. Harney
Hm, I don't seem to be able to say what I mean. I don't mean the _length_ of the composite key. I mean the value that gets sorted at the end. If we were just talking US English letters, I'd mean "Z"; if we were talking numbers, I might mean something like "MAX_INTEGER" or ∞. In UTF-8, I think I mean \u9999. But a CouchDB key item can be _any_ JS object, so I'm not sure what the "max" value is.
James A. Rosen
For numbers JavaScript always uses IEEE double floating points. So the CouchDB limit should be the same as that. Even integers are stored as floating point numbers in JavaScript.
Sean A.O. Harney
@Sean A.O. Harney: James A. Rosen is talking about collation of composite values (e.g. arrays and objects). The following link should make it clearer what he means: http://wiki.apache.org/couchdb/View_collation#Collation_Specification
A: 

It seems like it would be nice to have a feature where endKey could be inclusive instead of exclusive.

Nathan Feger
Actually, "endkey" is inclusive by default. You have to specify "inclusive_end=false" to get exclusive behavior.
A: 

This should do the trick:

?startkey=["some_customer_id"]&endkey=["some_customer_id", "\uFFFF"]

This should include anything whose second element is a string starting with a character less than \uFFFF (i.e. all Unicode characters).

bogphanny
I don't think so. The article you linked to says that all strings come before all arrays, which in turn come before all Hashes. So ["some_customer_id", "\uFFFF"] is 'less than' ["some_customer_id", {}].
James A. Rosen
bogphanny
@bogphanny: This isn't a relational database query. The comma is not an implicit conjunction. All of the keys emitted for this view are two-element arrays, so your query would yield no results.
+2  A: 

I have two thoughts.

Use timestamps

Instead of using a simple 0 and 1 for their collation behavior, use the timestamp at which the record was created (assuming the records have one), a la [doc._id, doc.created_at]. Then you could query your view with a startkey of some sufficiently early date (the epoch would probably work) and an endkey of "now", e.g. the output of date +%s. That key range should always include everything, and it has the added benefit of collating by date, which is probably what you want anyway.
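A minimal sketch of the timestamp variant might look like the following. It assumes every document has a numeric created_at field holding seconds since the epoch, and that order rows are keyed by doc.customer_id as in the original view. The emit function and emittedRows array are stand-ins so the code runs outside CouchDB, and customerTimeRange is a hypothetical helper.

```javascript
// Stand-in for the emit() that CouchDB provides to map functions.
const emittedRows = [];
function emit(key, value) {
  emittedRows.push({ key, value });
}

// ASSUMPTION: created_at is a numeric Unix timestamp in seconds.
function map(doc) {
  if (doc.Type == "customer") {
    emit([doc._id, doc.created_at], doc);
  } else if (doc.Type == "order") {
    emit([doc.customer_id, doc.created_at], doc);
  }
}

// Key range covering everything for one customer from the epoch until now.
function customerTimeRange(customerId, nowSeconds = Math.floor(Date.now() / 1000)) {
  return { startkey: [customerId, 0], endkey: [customerId, nowSeconds] };
}
```

Note that with this key shape the customer row is no longer guaranteed to sort first unless it was created before all of its orders.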

or, just don't worry about it

You could just index by the customer_id and nothing more. This would have the nice advantage of being able to query using just key=<customer_id>. Sure, the records won't be collated when they come back, but is that an issue for your application? Unless you are expecting tons of records back, it would likely be trivial to simply pluck the customer record out of the list once you have the data retrieved by your application.

For example in ruby:

# partition returns two arrays: the records matching the block, then the rest
customers, orders = records.partition { |record| record.type == "customer" }
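The same idea can be sketched in JavaScript, together with the simplified map function that keys every row by the customer ID alone. The emit function and simpleRows array are stand-ins so the code runs outside CouchDB, and splitRecords is a hypothetical client-side helper that expects the documents returned by a key=<customer_id> query.

```javascript
// Stand-in for the emit() that CouchDB provides to map functions.
const simpleRows = [];
function emit(key, value) {
  simpleRows.push({ key, value });
}

// Every row is keyed by the customer ID only; no collation element.
function map(doc) {
  if (doc.Type == "customer") {
    emit(doc._id, doc);
  } else if (doc.Type == "order") {
    emit(doc.customer_id, doc);
  }
}

// Client side: separate the customer document from its orders.
function splitRecords(records) {
  const customer = records.find((record) => record.Type === "customer");
  const orders = records.filter((record) => record.Type === "order");
  return { customer, orders };
}
```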

Anyway, the timestamp approach is probably the more attractive answer for your case.

Jim Garvin
+1  A: 

Rather than trying to find the greatest possible value for the second element in your array key, I would suggest instead trying to find the least possible value greater than the first: ?startkey=["some_customer_id"]&endkey=["some_customer_id\u0000"]&inclusive_end=false.

Note "inclusive_end" guards against the ridiculous case where you actually have a key of the form "some_customer_id\u0000", by not including documents matching the "endkey" in the result.
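As a rough sketch, the query string for this approach could be assembled as follows. The function name customerRangeQuery is made up for illustration; the keys are JSON-encoded and then URL-escaped, which is how view keys are normally passed. It only builds the URL and does not verify how CouchDB's ICU-based collation treats the \u0000 character.

```javascript
// Build the query string for "everything keyed under this customer".
function customerRangeQuery(customerId) {
  const params = new URLSearchParams({
    startkey: JSON.stringify([customerId]),
    // Least key greater than any ["<customerId>", ...] under code unit ordering.
    endkey: JSON.stringify([customerId + "\u0000"]),
    inclusive_end: "false",
  });
  return "?" + params.toString();
}
```

Calling customerRangeQuery("some_customer_id") returns a string that can be appended to the view URL.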