What is best practice when creating document IDs in CouchDB?

I'm no CouchDB expert, but after a little research, this is what I've found.

The simple answer is, use UUIDs unless you have a good reason not to.

The longer answer is, it depends on:

Cost of changing the ID vs. how likely the ID is to change

Low cost of changing and likely to change ID

An example of this might be a blog with a denormalized design, such as jchris' blog (Sofa, code available on GitHub).

Every time another website links to a blog post, this is another reference to the id, so the cost of changing the id increases.

High cost of changing ID and an ID that will never change

An example of this is any DB design that is highly normalized that uses auto-increment IDs. Stackoverflow.com is a good example with its auto-incrementing question IDs that you see in every URL. The cost of changing the ID is extremely high since every foreign key would need to be updated.

How many references, or "foreign keys" (in relational DB language) will there be to the id?

Any "foreign keys" will greatly increase the cost of changing the ID. Having to update other documents is a slow operation and definitely should be avoided.

How likely is the ID to change?

If you don't want to use UUIDs, you probably already have an idea of what ID you want to use.

If that ID is likely to change, make sure the cost of changing it is low. If the cost is not low, pick a different ID.

What is your motivation for wanting to use an easily memorable ID?

Don't say performance.

Benchmarks show that "CouchDB’s view key lookups are almost, but not quite, as fast as direct document lookups". This means that looking up a record through a view instead of by its ID is no big deal. Don't choose friendly IDs just so you can do a direct lookup on a document.
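To make the two kinds of lookup concrete, here is a minimal sketch that only builds the request URLs. The server address, the database name `blog`, the design document `posts`, and the view `by_slug` are all hypothetical; the view is assumed to emit each post's friendly name as its key.

```python
import json
from urllib.parse import quote, urlencode


def doc_url(server, db, doc_id):
    """URL for a direct document lookup by _id."""
    return f"{server}/{quote(db, safe='')}/{quote(doc_id, safe='')}"


def view_lookup_url(server, db, design_doc, view, key):
    """URL for a view lookup by key (the key must be JSON-encoded)."""
    query = urlencode({"key": json.dumps(key), "include_docs": "true"})
    path = f"_design/{quote(design_doc, safe='')}/_view/{quote(view, safe='')}"
    return f"{server}/{quote(db, safe='')}/{path}?{query}"


# Hypothetical names throughout: the document keeps a UUID as its _id,
# and the friendly name is found through a view instead.
server = "http://localhost:5984"
print(doc_url(server, "blog", "6e1295ed6c29495e54cc05947f18c8af"))
print(view_lookup_url(server, "blog", "posts", "by_slug", "my-first-post"))
```

With this arrangement the friendly name can change by updating one field in the document, while the `_id` that other documents reference stays the same.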

Will you be doing many bulk inserts?

If so, it is better to use incremental (ascending) IDs for better performance.

See this post about bulk inserts, where Damien Katz comments:

"If you want to have the fastest possible insert times, you should give the _id's ascending values, so get a UUID and increment it by 1, that way it's always inserting in the same place in the index, and being cache friendly once you are dealing with files larger than RAM. For an easier way to do the same thing, just sequentially number the documents but make it fixed length with padding so that they sort correctly, "0000001" instead of "1" for example."
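Both suggestions in the quote can be sketched in a few lines. This is only an illustration: the function names and the default width of 7 are assumptions, and the UUID variant ignores the extremely unlikely case where incrementing overflows 128 bits.

```python
import uuid


def padded_ids(start, count, width=7):
    """Yield fixed-width, zero-padded sequence numbers such as '0000001'."""
    for n in range(start, start + count):
        yield str(n).zfill(width)


def incrementing_uuid_ids(count, base=None):
    """Yield 32-character hex IDs that start at a UUID and go up by 1."""
    if base is None:
        base = uuid.uuid4()
    for offset in range(count):
        # Fixed-width hex keeps string order identical to numeric order.
        yield format(base.int + offset, "032x")


print(list(padded_ids(1, 3)))
print(list(incrementing_uuid_ids(3)))
```

Because every ID has the same length, plain string sorting matches numeric order, which is what keeps inserts landing at the same end of the index.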

This answer seems predicated on the notion that conflict avoidance is always desirable; however, sometimes conflicts are a natural part of the problem domain, and rather than simply being avoided, they should be proactively detected and resolved. In such cases, a natural ID is an excellent choice. For example, don't use the title of a blog post as an ID on a massively multi-user system, but do use the fully qualified domain name and IP address when modeling DNS address records.
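As a rough sketch of the DNS example, a natural ID could be derived from the record itself. The `|` separator and the normalization rules below are assumptions, not a CouchDB convention.

```python
import ipaddress


def dns_record_id(fqdn, ip):
    """Build a natural document ID from a host name and an IP address."""
    # Normalize so that equivalent spellings map to the same ID.
    name = fqdn.strip().rstrip(".").lower()
    address = ipaddress.ip_address(ip.strip())
    return f"{name}|{address}"


print(dns_record_id("WWW.Example.com.", "192.0.2.10"))
```

Two writers describing the same record derive the same `_id`, so the second write shows up as a conflict (CouchDB rejects an update without a matching `_rev` with 409 Conflict) instead of silently creating a duplicate document.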

2010-10-05 06:21:04

ansaurus
