Hi, I'm trying to build a small web system (a URL shortener) using the NoSQL database Cassandra, and the problem I'm stuck on is ID auto-generation.

Has anyone already run into this problem?

Thanks.

P.S. UUIDs don't work for me: I need to use ALL the numbers from 0 to Long.MAX_VALUE (Java), so I need something that works exactly like an SQL sequence.

UPDATED:

The reason GUID IDs don't work for me lies in the scope of my application.

My app has a URL-shortening part, and I need to make the URLs as short as possible. So I use the following approach: I take numbers starting from 0 and convert them to base64 strings. As a result I get URLs like mysite.com/QA (where QA is a base64 string).

This was very easy to implement with an SQL DB: I just took the auto-incremented ID, converted it to a URL, and was 100 percent sure that the URL was unique.
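A minimal sketch of the number-to-token conversion described above, in Java. It assumes the URL-safe Base64 alphabet from RFC 4648 (A-Z, a-z, 0-9, '-', '_') and writes digits most significant first; `ShortCode`, `encode` and `decode` are hypothetical names. Under those assumptions "QA" corresponds to 1024, and even Long.MAX_VALUE needs only 11 characters.

```java
// Hypothetical helper: converts a non-negative sequence number to a short URL
// token and back. Assumes the URL-safe Base64 alphabet (RFC 4648, section 5)
// and most-significant-digit-first order, so 0 -> "A" and 1024 -> "QA".
class ShortCode {
    private static final String ALPHABET =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_";

    static String encode(long id) {
        if (id < 0) throw new IllegalArgumentException("id must be >= 0");
        if (id == 0) return "A";
        StringBuilder sb = new StringBuilder();
        while (id > 0) {
            sb.append(ALPHABET.charAt((int) (id % 64))); // lowest 6 bits first
            id /= 64;
        }
        return sb.reverse().toString(); // most significant digit first
    }

    static long decode(String token) {
        long id = 0;
        for (char c : token.toCharArray()) {
            int digit = ALPHABET.indexOf(c);
            if (digit < 0) throw new IllegalArgumentException("bad char: " + c);
            id = id * 64 + digit; // overflow is not checked in this sketch
        }
        return id;
    }

    public static void main(String[] args) {
        System.out.println(encode(0));              // A
        System.out.println(encode(1024));           // QA
        System.out.println(encode(Long.MAX_VALUE)); // 11 characters: "H" followed by ten '_'
        System.out.println(decode("QA"));           // 1024
    }
}
```

This is a plain positional base-64 encoding of the number rather than java.util.Base64 applied to its bytes, which is why small IDs give one- or two-character tokens.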

A: 

I'm not sure I follow you. What language are you using? Are we talking about UUIDs?

The following is how you generate UUIDs in some languages:

java.util.UUID.randomUUID();  // (Java) variant 2, version 4

import uuid                   # (Python)
uuid.uuid1()                  # version 1
Schildmeijer
UUIDs don't work for me: I need to use ALL the numbers from 0 to Long.MAX_VALUE (Java), so I need something that works exactly like an SQL sequence.
abovesun
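To make the length argument concrete, here is a small Java comparison (the class and method names are illustrative only). It encodes a random UUID's 16 bytes and a long's 8 bytes with the JDK's URL-safe Base64 encoder without padding: a UUID always yields 22 characters, while a fixed-width long yields 11.

```java
import java.nio.ByteBuffer;
import java.util.Base64;
import java.util.UUID;

// Illustration only: token length when the key is a 128-bit UUID versus a
// 64-bit long, both encoded with URL-safe Base64 and no '=' padding.
class UuidTokenLength {
    static String uuidToken(UUID uuid) {
        ByteBuffer buf = ByteBuffer.allocate(16); // a UUID is 128 bits
        buf.putLong(uuid.getMostSignificantBits());
        buf.putLong(uuid.getLeastSignificantBits());
        return Base64.getUrlEncoder().withoutPadding().encodeToString(buf.array());
    }

    static String longToken(long id) {
        ByteBuffer buf = ByteBuffer.allocate(8);  // a long is 64 bits
        buf.putLong(id);
        return Base64.getUrlEncoder().withoutPadding().encodeToString(buf.array());
    }

    public static void main(String[] args) {
        System.out.println(uuidToken(UUID.randomUUID()).length()); // 22
        System.out.println(longToken(Long.MAX_VALUE).length());    // 11
    }
}
```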
A:

Autoincrement IDs inherently don't scale well, because they need a single source to generate the numbers. This is why shardable/replicable databases such as MongoDB use longer, GUID-like identifiers for objects. Why do you need LONG values so badly?

You might be able to do it using atomic increments and keeping the old value, but I'm not sure. This would be limited to single-server setups.
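A minimal single-process sketch of the atomic-increment idea, using Java's AtomicLong (the `LocalSequence` class is hypothetical). `getAndIncrement()` returns the value from before the increment, so IDs start at 0 and no two callers get the same one. As the answer says, this only works within one server; the counter also lives only in memory, so a real service would have to persist it across restarts.

```java
import java.util.concurrent.atomic.AtomicLong;

// Single-process sketch of an auto-increment sequence on an atomic counter.
// getAndIncrement() returns the old value, so the first ID is 0.
// Works only inside one JVM; it is not a distributed sequence.
class LocalSequence {
    private final AtomicLong next = new AtomicLong(0);

    long nextId() {
        long id = next.getAndIncrement();
        // after Long.MAX_VALUE the counter wraps to negative values
        if (id < 0) throw new IllegalStateException("sequence exhausted");
        return id;
    }

    public static void main(String[] args) throws InterruptedException {
        LocalSequence seq = new LocalSequence();
        Runnable worker = () -> { for (int i = 0; i < 1000; i++) seq.nextId(); };
        Thread a = new Thread(worker), b = new Thread(worker);
        a.start(); b.start(); a.join(); b.join();
        System.out.println(seq.nextId()); // 2000: every increment counted exactly once
    }
}
```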

wump
Thanks for understanding the source of my problem so well. Please check out the updated question description, thanks.
abovesun
Do they have to be consecutive? What about "choose a random number in 1..MAX(LONG) and insert"? Make sure the field has a unique index. If the insert fails, try again with a different random value. This is atomic and safe, and pretty efficient as long as your DB size doesn't go near 2^31 :)
wump
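A sketch of the random-ID-with-retry suggestion, in Java. The concurrent set is an in-memory stand-in (an assumption, not Cassandra or any database API) for the unique index the comment relies on: `add()` is atomic and returns false when the ID is already taken. `RandomIdAllocator` is a hypothetical name.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ThreadLocalRandom;

// "Pick a random ID, insert, retry on conflict". The concurrent set plays the
// role of a unique index: add() is atomic and returns false on a duplicate.
// A real database would need an equivalent atomic check-and-insert.
class RandomIdAllocator {
    private final Set<Long> taken = ConcurrentHashMap.newKeySet();

    long allocate() {
        while (true) {
            // uniform in [1, Long.MAX_VALUE); the upper bound is exclusive
            long candidate = ThreadLocalRandom.current().nextLong(1, Long.MAX_VALUE);
            if (taken.add(candidate)) {
                return candidate; // "insert" succeeded, the ID is ours
            }
            // collision: this ID is already used, so draw another one
        }
    }

    public static void main(String[] args) {
        RandomIdAllocator alloc = new RandomIdAllocator();
        for (int i = 0; i < 3; i++) System.out.println(alloc.allocate());
    }
}
```

One trade-off for this question: values drawn uniformly below Long.MAX_VALUE are almost always large (about 7 in 8 are at least 2^60), so their base64 tokens are typically 11 characters long, which gives up most of the short-URL benefit of a counter.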
Yes, this approach is probably not so bad for me. To be honest, I'm not sure Cassandra has an analog of the SQL unique constraint; at least I haven't found how to emulate one. But I haven't tried hard yet :)
abovesun
Another problem with Cassandra is its "eventually consistent" model: http://www.allthingsdistributed.com/2008/12/eventually_consistent.html
abovesun
CouchDB does have a unique constraint, at least on _id. And yes, an eventually consistent model can cause trouble, so this will only work with one server. The only way to generate autoincrement IDs in a distributed system is a dedicated ID-generation server (AKA "single point of failure").
wump
Cassandra has options to set the consistency level. The strongest is "ALL", which means "Ensure that the write is written to all <ReplicationFactor> nodes before responding to the client. Any unresponsive nodes will fail the operation." (http://wiki.apache.org/cassandra/API) But this is probably not good for a large cluster because of the performance overhead.
abovesun
@wump, is it true that generating autoincrement IDs means you have a single point of failure? Can't Oracle RAC generate sequence numbers?
TTT
TTT: Well, you could distribute autoincrement ID generation over multiple machines by providing backup servers, but they all need to be synchronized with (i.e., wait for) each other. So performance-wise it's not scalable, but it can be made more robust.
wump
@wump, you only need two to prevent a single point of failure, not tens or hundreds of machines. You can have two synced servers whose only job is generating (and returning) auto IDs.
TTT
If IDs don't need to be strictly sequential, it is easy to generate them on multiple servers: simply increment by the maximum number of servers rather than by 1, so with two servers one would return even IDs and the other odd IDs.
Tom Clarkson
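The odd/even scheme generalizes to N servers as sketched below in Java. Each server is assumed to be configured with its own index, and `StridedSequence` is a hypothetical name; the counter is kept only in memory, and overflow near Long.MAX_VALUE is not handled.

```java
// "Increment by the number of servers": server k of N hands out k, k+N, k+2N, ...
// Servers never collide and need no coordination once their indexes are assigned.
// Sketch only: the counter is in memory and overflow is not handled.
class StridedSequence {
    private final long step;
    private long next;

    StridedSequence(int serverIndex, int serverCount) {
        if (serverIndex < 0 || serverIndex >= serverCount)
            throw new IllegalArgumentException("bad server index");
        this.next = serverIndex;
        this.step = serverCount;
    }

    synchronized long nextId() {
        long id = next;
        next += step;
        return id;
    }

    public static void main(String[] args) {
        StridedSequence even = new StridedSequence(0, 2);
        StridedSequence odd = new StridedSequence(1, 2);
        System.out.println(even.nextId() + " " + even.nextId()); // 0 2
        System.out.println(odd.nextId() + " " + odd.nextId());   // 1 3
    }
}
```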
Tom: Indeed, that's similar to how it works with UUIDs and other object IDs, such as those used by MongoDB: they allocate part of the ID space per server. The difference is that you burn through 96/128 bits much more slowly than through 32.
wump
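Allocating part of the ID space per server can also be done with bit fields inside a single long. This Java sketch assumes an arbitrary split of 10 bits for a server number and 53 bits for a per-server counter; `PartitionedId` and the split are illustrative, not taken from MongoDB or any UUID standard.

```java
// Per-server ID ranges via bit fields in a positive 63-bit long: the high bits
// hold a server number, the low bits a per-server counter.
// The 10/53 split is an arbitrary assumption for illustration.
class PartitionedId {
    static final int SERVER_BITS = 10;                // up to 1024 servers
    static final int COUNTER_BITS = 63 - SERVER_BITS; // 53 bits per server
    static final long COUNTER_MASK = (1L << COUNTER_BITS) - 1;

    static long compose(int server, long counter) {
        if (server < 0 || server >= (1 << SERVER_BITS))
            throw new IllegalArgumentException("server out of range");
        if (counter < 0 || counter > COUNTER_MASK)
            throw new IllegalArgumentException("counter out of range");
        return ((long) server << COUNTER_BITS) | counter;
    }

    static int serverOf(long id)   { return (int) (id >>> COUNTER_BITS); }
    static long counterOf(long id) { return id & COUNTER_MASK; }

    public static void main(String[] args) {
        long id = compose(3, 42);
        System.out.println(serverOf(id) + " " + counterOf(id)); // 3 42
    }
}
```

For a URL shortener the catch is that every ID from server 1 upwards is at least 2^53, so tokens from those servers are at least 9 base64 characters long from the very first ID.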
A:

I don't know about Cassandra, but with MongoDB you can have an atomic sequence (it won't scale, but it will work the way it should, even in a sharded environment if the query includes the shard key).

It can be done by using the findandmodify command.

Let's say we have a special collection named sequences and we want a sequence for post numbers (named postid). You could use code similar to this:

> db.runCommand( { "findandmodify" : "sequences",
                   "query" : { "name" : "postid" },
                   "update" : { $inc : { "id" : 1 } },
                   "new" : true,
                   "upsert" : true } );

This command atomically returns the updated (new) document together with a status; the value field contains the returned document if the command completed successfully. The "upsert" : true option creates the sequence document on the first call, so it does not have to be inserted beforehand.
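For comparison, the same named-sequence pattern can be mimicked in memory in Java. This is an illustrative analog, not MongoDB driver code, and `NamedSequences` is a hypothetical name. ConcurrentHashMap.merge() updates one key atomically and returns the new value, much like $inc together with "new" : true, and it creates a missing entry on first use, which MongoDB does when the upsert option is set.

```java
import java.util.concurrent.ConcurrentHashMap;

// In-memory analog (not MongoDB code) of a "sequences" collection: each
// sequence name maps to its current value. merge() is atomic per key,
// inserts 1 for a new name, and otherwise adds 1 and returns the new value.
class NamedSequences {
    private final ConcurrentHashMap<String, Long> sequences = new ConcurrentHashMap<>();

    long next(String name) {
        return sequences.merge(name, 1L, Long::sum);
    }

    public static void main(String[] args) {
        NamedSequences seqs = new NamedSequences();
        System.out.println(seqs.next("postid")); // 1
        System.out.println(seqs.next("postid")); // 2
        System.out.println(seqs.next("userid")); // 1
    }
}
```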

Hubert Kario