We are looking at using Cassandra to store a stream of information coming from various sources.

One issue we are facing is the best way to query between two dates.

For example, we will need to retrieve all objects created between datetime dt1 and datetime dt2.

We are currently considering using the created Unix timestamp as the key pointing to the actual object, then using get_key_range to retrieve the objects in that range.

Obviously this wouldn't work if two items have the same timestamp.

Is this the best way to handle datetime queries in NoSQL stores in general?

+6  A: 

Cassandra rows can be very large, so consider modeling the data as columns in a row rather than as rows in a column family (CF); then you can use the column slice operations, which are faster than row slices. If there are no "natural" keys associated with the data, you can use daily or hourly row keys like "2010/02/08 13:00".

Otherwise, yes, using range queries (get_key_range is deprecated in 0.5; use get_range_slice) is your best option.
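
To make the columns-in-a-row layout concrete, here is a rough in-memory sketch in Python. It does not call any Cassandra client; the class `TimeSeriesStore`, its methods, and the sequence-number tie-breaker are all hypothetical names made up for illustration. Each hourly row key holds columns kept sorted by timestamp, and a date-range query walks the hourly buckets between the two datetimes and slices each one.

```python
import bisect
from collections import defaultdict
from datetime import datetime, timedelta

# Hourly row key, in the style suggested above: "2010/02/08 13:00"
BUCKET_FORMAT = "%Y/%m/%d %H:00"


class TimeSeriesStore:
    """Hypothetical in-memory stand-in for one column family.

    Maps row key (hour bucket) -> columns sorted by (timestamp, seq).
    """

    def __init__(self):
        self._rows = defaultdict(list)
        self._seq = 0  # tie-breaker so equal timestamps do not collide

    def insert(self, ts, value):
        """Store value as a column in the row for ts's hour bucket."""
        self._seq += 1
        row_key = ts.strftime(BUCKET_FORMAT)
        bisect.insort(self._rows[row_key], (ts, self._seq, value))

    def query(self, start, end):
        """Return values with start <= timestamp <= end, in time order."""
        results = []
        bucket = start.replace(minute=0, second=0, microsecond=0)
        while bucket <= end:
            columns = self._rows.get(bucket.strftime(BUCKET_FORMAT), [])
            # Column slice: skip straight to the first column >= start.
            first = bisect.bisect_left(columns, (start,))
            for ts, _seq, value in columns[first:]:
                if ts > end:
                    break
                results.append(value)
            bucket += timedelta(hours=1)
        return results
```

Calling `store.insert(datetime(2010, 2, 8, 13, 30), obj)` places the object in the "2010/02/08 13:00" row, and `store.query(dt1, dt2)` reads only the rows for the hours that overlap the range. The sequence number here stands in for whatever unique suffix you append to the column name so that two items with the same timestamp can coexist.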

jbellis
How large is very large? On slide 41 of the presentation at http://www.slideshare.net/jbellis/cassandra-open-source-bigtable-dynamo you say "Millions of columns per row" for 0.5. Is columns in a row still the way to go for really big time series?
Adam Hollidge
Yes, columns are the way to go.
z8000