Say I have a key "user", and I need to keep a "user count" for it. I am planning to have a record with key "user" whose value goes from "0" up to "9999+ ;-)" (as many as I'll have).

What problems will I run into if I use Cassandra, HBase, or MySQL for that? Say I have thousands of new updates to this "user" key, where I need to increment the value. Am I in trouble? Will I be locked for writes? Is there any other way of doing this?

Why this is done: there will be a lot of "user"-like keys for various other cases, but the idea is the same. Why keep it this way: because I'll have far more reads than writes, so I can always fetch the "counted value" very fast.
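For concreteness, here is a minimal sketch of the design being asked about, using Python's sqlite3 as a stand-in for MySQL (the counters table, column names, and key are illustrative assumptions). In MySQL/InnoDB the equivalent in-place UPDATE is atomic, but concurrent writers serialize on that single row's lock, which is exactly the contention the question worries about:

    import sqlite3

    # Single-row counter sketch; sqlite3 stands in for MySQL and the
    # schema names are made-up examples.
    db = sqlite3.connect("app.db")
    db.execute("CREATE TABLE IF NOT EXISTS counters (name TEXT PRIMARY KEY, value INTEGER)")
    db.execute("INSERT OR IGNORE INTO counters VALUES ('user', 0)")

    # Every new user does an in-place increment. In MySQL/InnoDB the
    # same UPDATE is atomic, but writers queue on the row lock.
    db.execute("UPDATE counters SET value = value + 1 WHERE name = 'user'")
    db.commit()

    # Reads stay cheap: a single primary-key lookup.
    (count,) = db.execute("SELECT value FROM counters WHERE name = 'user'").fetchone()
    print(count)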

+3  A: 

I would just update the user count as a batch operation every N minutes rather than updating it in real time. If there's only one process updating it, you don't need to worry about contention, by definition.
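A minimal sketch of that single-writer pattern, again with sqlite3 standing in for the real store (the users/counters tables and the interval are assumptions): one process recounts and overwrites the stored total every N minutes, so no other writer ever touches the counter row.

    import sqlite3
    import time

    # Single-writer batch updater: recount and overwrite the stored
    # total every N minutes. Schema names are made-up examples.
    db = sqlite3.connect("app.db")
    db.execute("CREATE TABLE IF NOT EXISTS users (id TEXT PRIMARY KEY)")
    db.execute("CREATE TABLE IF NOT EXISTS counters (name TEXT PRIMARY KEY, value INTEGER)")

    N = 5  # minutes between recounts

    while True:
        (total,) = db.execute("SELECT COUNT(*) FROM users").fetchone()
        # Only this one process writes the counter, so no contention.
        db.execute("INSERT OR REPLACE INTO counters VALUES ('user', ?)", (total,))
        db.commit()
        time.sleep(N * 60)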

Alternatively, Cassandra has contrib/mutex for adding lock support via ZooKeeper.

jbellis
Right, but do I need to count all users every N minutes? That sounds expensive. My other idea is to do it through a cache, where everything lives in the cache and the updates are flushed every N minutes... though I'm not sure whether there is a better way of doing that.
alexeypro
+1  A: 

MongoDB has update-in-place and a special $inc operator for counters. http://blog.mongodb.org/post/171353301/using-mongodb-for-real-time-analytics
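A minimal sketch with the pymongo driver (the database and collection names are assumptions, not from the answer):

    from pymongo import MongoClient

    # Assumes a local mongod; "mydb" and "counters" are made-up names.
    client = MongoClient("localhost", 27017)
    counters = client.mydb.counters

    # $inc bumps the counter atomically in place, creating the
    # document on first use thanks to upsert=True.
    counters.update_one({"_id": "user"}, {"$inc": {"count": 1}}, upsert=True)

    doc = counters.find_one({"_id": "user"})
    print(doc["count"])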

TTT
A: 

HBase has an incrementColumnValue method for a fast, atomic read/write operation.
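From Python, the happybase client exposes the same server-side atomic increment; a minimal sketch, assuming a running HBase Thrift gateway and a hypothetical "counters" table with column family "cf":

    import happybase

    # Table and column names here are made-up examples.
    conn = happybase.Connection("localhost")
    table = conn.table("counters")

    # Atomic server-side increment; returns the new counter value.
    new_value = table.counter_inc(b"user", b"cf:count", value=1)
    print(new_value)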

Dave L.
+1  A: 

MongoDB and HBase have this built in (as do most other databases that guarantee consistency).

One fairly simple trick with Cassandra is to have a dedicated row for the user count, and then insert a unique ID (e.g. a random UUID) as a column name with an empty value every time a user is added. At regular intervals, count the number of columns, add that to a total counter, and remove the columns you've just counted.

At any time, your total user count is therefore [total counter]+[number of columns on your usercount row]. You can get these with essentially two reads, and if you have row cache enabled, it's going to be fast.
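To make the bookkeeping concrete, here is a small in-memory model of that scheme in Python (a plain dict stands in for the Cassandra row and an integer for the total counter; these are not real Cassandra calls):

    import uuid

    usercount_row = {}   # column name -> empty value
    total_counter = 0

    def record_new_user():
        # Each signup inserts a unique column name with an empty value.
        usercount_row[uuid.uuid4()] = b""

    def fold_into_total():
        # Run at regular intervals: count the columns, add them to the
        # total, then remove exactly the columns that were counted.
        global total_counter
        counted = list(usercount_row)
        total_counter += len(counted)
        for name in counted:
            del usercount_row[name]

    def current_user_count():
        # At any time: [total counter] + [columns still on the row].
        return total_counter + len(usercount_row)

    for _ in range(5):
        record_new_user()
    fold_into_total()
    record_new_user()
    print(current_user_count())  # -> 6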

Janne Jalkanen
+1 Nice and clean.