ansaurus

Question

id autoincrement/sequence emulation with CassandraDB/MongoDB etc

Answer 1

A:

I'm not sure I follow you. What language are you using? Are we talking about UUIDs?

The following is how you generate UUIDs in some languages:

java.util.UUID.randomUUID(); // (Java) variant 2, version 4

import uuid   # (Python)
uuid.uuid1()  # version 1
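To make the size concrete, here is a small Python check that uses only the standard `uuid` module. The helper `describe` and the constant `JAVA_LONG_MAX` are illustrative names, not part of any API. A version 1 or version 4 UUID is a 128-bit value whose version nibble sits in its upper 64 bits, so it is always larger than Java's Long.MAX_VALUE.

```python
import uuid

JAVA_LONG_MAX = 2**63 - 1  # Java's Long.MAX_VALUE

def describe(u: uuid.UUID) -> str:
    # u.int is the UUID viewed as an unsigned 128-bit integer.
    return f"{u} version={u.version} fits_in_long={u.int <= JAVA_LONG_MAX}"

u4 = uuid.uuid4()  # version 4: random bits
u1 = uuid.uuid1()  # version 1: timestamp plus a node ID (often the MAC address)

print(describe(u4))
print(describe(u1))
```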

Schildmeijer 2010-05-05 07:57:38

UUIDs don't work for me: I need to use ALL numbers from 0 to Long.MAX_VALUE (Java), so I need something that works exactly like an SQL sequence.

abovesun 2010-05-05 08:00:27

Answer 2

+1 A:

Autoincrement IDs inherently don't scale well as they need a single source to generate the numbers. This is why shardable/replicatable databases such as MongoDB use longer, GUID-like identifiers for objects. Why do you need LONG values so badly?

You might be able to do it using atomic increments that return the previous value, but I'm not sure. This would be limited to single-server setups.
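A minimal Python sketch of the atomic-increment idea, assuming every request goes through a single process: the counter is read and bumped under one lock, and the old value is handed out. `AtomicCounter` and `next_id` are hypothetical names; a database would need its own atomic increment primitive to give the same guarantee across clients.

```python
import threading

class AtomicCounter:
    """Hypothetical in-process counter: fetch-and-increment under a lock."""

    def __init__(self, start: int = 0) -> None:
        self._value = start
        self._lock = threading.Lock()

    def next_id(self) -> int:
        # Read the current value and bump it in one critical section, so no
        # two callers can ever receive the same old value.
        with self._lock:
            old = self._value
            self._value += 1
            return old

counter = AtomicCounter()
ids: list[int] = []
ids_lock = threading.Lock()

def worker(n: int) -> None:
    for _ in range(n):
        new_id = counter.next_id()
        with ids_lock:
            ids.append(new_id)

threads = [threading.Thread(target=worker, args=(1000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Every number from 0 to 3999 was handed out exactly once.
print(sorted(ids) == list(range(4000)))  # True
```

This only works because all callers share one process and one lock; spreading the counter across machines brings back the single-source problem.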

wump 2010-05-05 08:07:18

Thanks for understanding the source of my problem so well. Please check out the updated question description.

abovesun 2010-05-05 08:15:49

Do they have to be consecutive? What about "choose a random number in 1..MAX(LONG) and insert"? Make sure the field has a unique index. If the insert fails, try again with a different random value. This is atomic and safe, and pretty efficient as long as your DB size doesn't go near 2^31 :)
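The random-ID-plus-retry idea can be sketched in Python as below. `FakeTable` and `DuplicateKeyError` are hypothetical single-threaded stand-ins for a table with a unique index; the scheme is only safe if the real store rejects duplicates atomically, which is exactly the open question for Cassandra.

```python
import random

LONG_MAX = 2**63 - 1  # Java's Long.MAX_VALUE

class DuplicateKeyError(Exception):
    """Hypothetical stand-in for a unique-constraint violation."""

class FakeTable:
    """Hypothetical in-memory table with a unique index on the id."""

    def __init__(self) -> None:
        self._rows: dict[int, object] = {}

    def insert(self, row_id: int, row: object) -> None:
        # A real unique index would do this check-and-insert atomically.
        if row_id in self._rows:
            raise DuplicateKeyError(row_id)
        self._rows[row_id] = row

def insert_with_random_id(table: FakeTable, row: object,
                          rng=random, max_attempts: int = 10) -> int:
    """Pick a random id in 1..LONG_MAX; on a duplicate, retry with a new one."""
    for _ in range(max_attempts):
        candidate = rng.randint(1, LONG_MAX)
        try:
            table.insert(candidate, row)
            return candidate
        except DuplicateKeyError:
            continue  # collision: try a different random value
    raise RuntimeError("gave up after too many collisions")

table = FakeTable()
print(insert_with_random_id(table, {"title": "first post"}))
```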

wump 2010-05-05 08:20:44

Yes, this approach is probably not so bad for me. To be honest, I'm not sure that Cassandra has an analog of the SQL unique constraint; at least I haven't found how to emulate it. But I haven't tried hard yet :)

abovesun 2010-05-05 08:25:35

Another problem with Cassandra is its "eventually consistent" model: http://www.allthingsdistributed.com/2008/12/eventually_consistent.html

abovesun 2010-05-05 08:28:55

CouchDB does have a unique constraint, at least on _id. And yes, an eventually consistent model can cause trouble, so this will only work with one server. The only way to generate autoincrement IDs in a distributed system is a dedicated ID generation server (AKA "single point of failure").

wump 2010-05-05 08:30:44

Cassandra has options to set the consistency level; the strictest is "ALL", meaning "Ensure that the write is written to all <ReplicationFactor> nodes before responding to the client. Any unresponsive nodes will fail the operation." (http://wiki.apache.org/cassandra/API) But this is probably not good for a large cluster because of the performance overhead.

abovesun 2010-05-05 08:48:57

@wump, is it true that generating autoincrement IDs means you have a single point of failure? Can't Oracle RAC generate sequence numbers?

TTT 2010-05-05 11:08:41

TTT: Well, you could distribute autoincrement ID generation over multiple machines by providing backup servers, but they all need to be synchronized with (i.e., wait for) each other. So performance-wise it's not scalable, but it can be made more robust.

wump 2010-05-05 11:55:19

@wump, you only need two to prevent a single point of failure, not tens or hundreds of machines. You can have two synced servers whose only job is generating (and returning) auto IDs.

TTT 2010-05-05 14:41:48

If IDs don't need to be strictly sequential, it is easy to generate them on multiple servers: simply increment by the maximum number of servers rather than by 1. With two servers, one would return even IDs and the other odd IDs.
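A Python sketch of the increment-by-server-count scheme: server k of n starts at k and steps by n, so no coordination is needed and the servers can never hand out the same ID. `interleaved_ids` is a hypothetical helper; note that the server count has to be fixed (or over-provisioned) up front, since changing the step later would break the partitioning.

```python
from itertools import count, islice
from typing import Iterator

def interleaved_ids(server_index: int, num_servers: int) -> Iterator[int]:
    """Yield server_index, server_index + num_servers, server_index + 2 * num_servers, ..."""
    if not 0 <= server_index < num_servers:
        raise ValueError("server_index must be in [0, num_servers)")
    return count(server_index, num_servers)

print(list(islice(interleaved_ids(0, 2), 5)))  # [0, 2, 4, 6, 8]
print(list(islice(interleaved_ids(1, 2), 5)))  # [1, 3, 5, 7, 9]
```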

Tom Clarkson 2010-05-06 02:39:55

Tom: Indeed, that's similar to how it works with UUIDs and other object IDs such as those used by MongoDB. They allocate part of the ID space per server, except that you burn through 96/128 bits much more slowly than through 32.
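The "part of the space per server" idea can also be squeezed into a single Java long. In this Python sketch the high bits hold a server number and the low bits a per-server counter. The 10/53 bit split, `make_id`, and `split_id` are hypothetical choices for illustration; MongoDB's ObjectId uses its own 96-bit layout, not this one.

```python
SERVER_BITS = 10                 # hypothetical: room for up to 1024 servers
COUNTER_BITS = 63 - SERVER_BITS  # 63 bits total keeps IDs non-negative in a Java long

def make_id(server_id: int, local_counter: int) -> int:
    if not 0 <= server_id < (1 << SERVER_BITS):
        raise ValueError("server_id out of range")
    if not 0 <= local_counter < (1 << COUNTER_BITS):
        raise ValueError("local counter exhausted")
    # High bits identify the server; low bits come from its private counter.
    return (server_id << COUNTER_BITS) | local_counter

def split_id(id_: int) -> tuple[int, int]:
    # Recover (server_id, local_counter) from a combined ID.
    return id_ >> COUNTER_BITS, id_ & ((1 << COUNTER_BITS) - 1)

print(split_id(make_id(3, 42)))  # (3, 42)
```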

wump 2010-05-06 06:09:37

Answer 3

+2 A:

I don't know about Cassandra, but with MongoDB you can have an atomic sequence (it won't scale, but it will work the way it should, even in a sharded environment if the query includes the shard key).

It can be done by using the findandmodify command.

Suppose we have a special collection named sequences and we want a sequence for post numbers (named postid). You could use code similar to this:

> db.runCommand( { "findandmodify" : "sequences",
                   "query" : { "name" : "postid"},
                   "update" : { $inc : { "id" : 1 }},
                   "new" : true } );

This command atomically returns the updated (new) document together with a status. The value field contains the returned document if the command completed successfully.
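A hedged Python sketch of consuming that reply, assuming it has already been decoded into a dict (for example by a driver). `next_sequence_value` is a hypothetical helper, and the sample reply in the usage line is illustrative, not captured output. One case worth handling: when no document matches the query and no upsert is requested, `value` comes back null, so the sequence document has to exist before the first call.

```python
from typing import Any, Mapping

def next_sequence_value(reply: Mapping[str, Any], field: str = "id") -> int:
    """Extract the incremented counter from a findandmodify-style reply."""
    if reply.get("ok") != 1:
        raise RuntimeError(f"command failed: {reply.get('errmsg')}")
    doc = reply.get("value")
    if doc is None:
        # No document matched {"name": "postid"}; without an upsert the
        # sequence document must be created beforehand.
        raise LookupError("sequence document not found")
    return int(doc[field])

# Illustrative reply shape only.
print(next_sequence_value({"value": {"name": "postid", "id": 5}, "ok": 1}))  # 5
```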

Hubert Kario 2010-10-12 20:39:57
