
Is there a way to get the row count (key count) of a single column family in Cassandra? get_count only returns the number of columns in a row.

For instance, suppose I have a column family containing users, with each user in its own row, and I want to get the number of users. How could I do it?

A: 

I have been getting the counts by converting the data into a hash in PHP and counting its entries.

Philip Schlump
That clearly doesn't scale: at some point the hash won't fit (usefully) into PHP's RAM anymore. Cassandra is for scalable stuff.
MarkR
I know - that is what concerns me. I am still in the development stage and looking for a better solution to problems like this.
Philip Schlump
+2  A: 

If you are using an order-preserving partitioner, you can do this with get_range_slice or get_key_range.

If you are not, you will need to store your user ids in a special row.
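For the order-preserving case, a count can be built by walking the key range in pages and counting keys as they arrive, instead of loading everything into one hash. The following is a minimal sketch, not the Thrift API: `fetch_keys` is a hypothetical stand-in for a range call such as get_range_slice, assumed to return up to `count` keys in key order starting at `start` (inclusive). A small in-memory fake, `make_fake_fetch`, is included so the sketch runs on its own.

```python
import bisect
from typing import Callable, List

# Hypothetical stand-in for a range query such as get_range_slice:
# returns up to `count` keys in key order, starting at `start` (inclusive).
# An empty start key means "from the beginning".
FetchKeys = Callable[[str, int], List[str]]


def count_keys(fetch_keys: FetchKeys, page_size: int = 100) -> int:
    """Count every key by walking the key range one page at a time."""
    if page_size < 2:
        # With page_size 1 every later page would only repeat the start key.
        raise ValueError("page_size must be at least 2")
    total = 0
    start = ""
    first_page = True
    while True:
        keys = fetch_keys(start, page_size)
        # The start key is inclusive, so every page after the first
        # repeats the last key of the previous page; skip it.
        new_keys = keys if first_page else keys[1:]
        total += len(new_keys)
        if len(keys) < page_size:
            return total
        start = keys[-1]
        first_page = False


def make_fake_fetch(all_keys: List[str]) -> FetchKeys:
    """Hypothetical in-memory fake used only to exercise count_keys."""
    ordered = sorted(all_keys)

    def fetch(start: str, count: int) -> List[str]:
        i = bisect.bisect_left(ordered, start)
        return ordered[i:i + count]

    return fetch


if __name__ == "__main__":
    fake = make_fake_fetch([f"user{i:04d}" for i in range(250)])
    print(count_keys(fake, page_size=100))  # 250
```

Because the start key is inclusive, each page after the first repeats one key, which is why the loop drops it. The cost is still proportional to the number of rows, since every key is read once.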

jbellis
What if I have millions of rows? get_range_slice feels kind of heavy for a simple count operation. Am I right?
Henri Liljeroos
You are right, counting raw objects is relatively expensive in distributed systems compared to what you are used to on a system that can just keep an index in local memory.
jbellis
A: 

Did you find any way to count the rows in a column family without building a hash in PHP?

I'm not using PHP but I ended up using a special row for storing ids as jbellis suggested. Seems to work fine.
Henri Liljeroos
http://wiki.apache.org/cassandra/API06 says that is not terribly scalable: "The method is not O(1). It takes all the columns from disk to calculate the answer. The only benefit of the method is that you do not need to pull all the columns over Thrift interface to count them."
dfrankow
"that is not terribly scalable" => "get_count is not terribly scalable"
dfrankow
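The special-row approach can be sketched as follows. Everything here is hypothetical and in memory: `FakeColumnFamily` stands in for a real column family, its `get_count` only mimics counting the columns of one row, and the row key `all_users` is an illustrative name. Each user id is written as a column name in a separate index row, so counting users becomes counting that row's columns.

```python
from typing import Dict


class FakeColumnFamily:
    """Hypothetical in-memory stand-in: row key -> {column name: value}."""

    def __init__(self) -> None:
        self.rows: Dict[str, Dict[str, str]] = {}

    def insert(self, row_key: str, column: str, value: str) -> None:
        self.rows.setdefault(row_key, {})[column] = value

    def remove_column(self, row_key: str, column: str) -> None:
        self.rows.get(row_key, {}).pop(column, None)

    def remove_row(self, row_key: str) -> None:
        self.rows.pop(row_key, None)

    def get_count(self, row_key: str) -> int:
        # Mimics counting the columns of a single row.
        return len(self.rows.get(row_key, {}))


INDEX_ROW = "all_users"  # hypothetical key of the special index row


def add_user(users: FakeColumnFamily, index: FakeColumnFamily,
             user_id: str, name: str) -> None:
    users.insert(user_id, "name", name)
    # Record the id as a column name in the index row; re-adding the
    # same id overwrites that column instead of adding a new one.
    index.insert(INDEX_ROW, user_id, "")


def delete_user(users: FakeColumnFamily, index: FakeColumnFamily,
                user_id: str) -> None:
    users.remove_row(user_id)
    index.remove_column(INDEX_ROW, user_id)


def count_users(index: FakeColumnFamily) -> int:
    return index.get_count(INDEX_ROW)
```

Keeping the index row in its own column family means a range scan over the users column family does not pick it up, and writing the same id twice does not double-count because column names are unique within a row. As dfrankow points out, get_count still reads every column, so this is not an O(1) count.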
+1  A: 

I'm new to Cassandra, but I have messed around a lot with Google's App Engine. If no other solution presents itself, you may consider keeping a separate counter in a platform that supports atomic increment operations like memcached. I know that Cassandra is working on atomic counter increment/decrement functionality, but it's not yet ready for prime time.

I can only post one hyperlink because I'm new, so for progress on counter support see the link in my comment below.

Note that this thread suggests ZooKeeper, memcached, and Redis as possible solutions. My personal preference would be memcached.

http://www.mail-archive.com/[email protected]/msg03965.html
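The separate-counter idea can be sketched as below. `AtomicCounter` is a hypothetical in-process stand-in for an external store's atomic increment (such as memcached's incr/decr); it does not talk to memcached, ZooKeeper, or Redis. The point is that every write that adds or removes a user also bumps the counter, so reading the count is a single lookup.

```python
import threading


class AtomicCounter:
    """Hypothetical stand-in for an external counter with atomic incr/decr."""

    def __init__(self) -> None:
        self._value = 0
        self._lock = threading.Lock()

    def incr(self, delta: int = 1) -> int:
        with self._lock:
            self._value += delta
            return self._value

    def decr(self, delta: int = 1) -> int:
        with self._lock:
            # memcached's decr stops at 0 instead of going negative;
            # this sketch mirrors that behaviour.
            self._value = max(0, self._value - delta)
            return self._value

    def get(self) -> int:
        with self._lock:
            return self._value


def register_users(counter: AtomicCounter, n: int) -> None:
    # In a real system each iteration would also write a user row.
    for _ in range(n):
        counter.incr()


if __name__ == "__main__":
    user_count = AtomicCounter()
    workers = [threading.Thread(target=register_users, args=(user_count, 1000))
               for _ in range(4)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    print(user_count.get())  # 4000
```

One trade-off: the counter lives outside Cassandra, so if a row write succeeds but the increment does not (or memcached evicts the key), the two can drift apart and may need an occasional recount.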

Ben Burns
https://issues.apache.org/jira/browse/CASSANDRA-1072 for progress on Cassandra counter support.
Ben Burns