Hi,
I'm thinking about using Cassandra for a large data project. The data will be sourced from a traditional data warehouse, and Cassandra will host it in a format my application can read directly.
I don't quite understand how I will prune the data from Cassandra.
For example, I want to count the number of visits a particular IP address has made to a website in the past 24 hours. I plan on generating this count every hour, and I'd like to keep two weeks of history per IP address. My column structure looks like:
127.0.0.1: {
    visitorsLast24Hours: {
        1279554672: 30,
        1279553072: 24,
        etc...
    }
}
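To make the layout concrete, here is a rough sketch of the same structure as a plain Python dict. It is only a stand-in for illustration, not Cassandra or phpcassa code, and `record_hourly_count` is a made-up helper name.

```python
# Plain-dict stand-in for the layout above; this is not Cassandra or phpcassa code.
# record_hourly_count is a hypothetical helper used only for illustration.
visits = {}

def record_hourly_count(store, ip, sampled_at, count):
    """Store the trailing-24-hour visit count for `ip`, keyed by Unix timestamp."""
    per_ip = store.setdefault(ip, {})
    samples = per_ip.setdefault("visitorsLast24Hours", {})
    samples[sampled_at] = count

# The two sample values shown in the structure above.
record_hourly_count(visits, "127.0.0.1", 1279553072, 24)
record_hourly_count(visits, "127.0.0.1", 1279554672, 30)
```

Each hourly run adds one timestamp entry under the IP address, so two weeks of history is roughly 336 entries per IP.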
How do I remove individual timestamp entries from the visitorsLast24Hours column?
So far, the best solution I've come up with is to:
- Get the column I want to work with
- Prune the values I no longer want to keep
- Delete the column from the database
- Re-insert the new pruned column
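The four steps above can be sketched roughly as follows. This is Python working on a plain dict that stands in for the stored data, not real Cassandra or phpcassa calls. The names `RETENTION_SECONDS`, `prune_old_entries`, and `rewrite_pruned` are made up for illustration, and the two-week cutoff is taken from the retention I described earlier.

```python
# Sketch of the read / prune / delete / re-insert cycle on a plain dict.
# The dict stands in for the stored data; none of this is a Cassandra or phpcassa API.
RETENTION_SECONDS = 14 * 24 * 60 * 60  # assumed retention: two weeks

def prune_old_entries(samples, now, retention=RETENTION_SECONDS):
    """Return only the entries whose timestamp is within `retention` seconds of `now`."""
    cutoff = now - retention
    return {ts: count for ts, count in samples.items() if ts >= cutoff}

def rewrite_pruned(store, ip, now):
    samples = store[ip]["visitorsLast24Hours"]   # 1. get the column
    kept = prune_old_entries(samples, now)       # 2. prune unwanted values
    del store[ip]["visitorsLast24Hours"]         # 3. delete the column
    store[ip]["visitorsLast24Hours"] = kept      # 4. re-insert the pruned column
    return kept

now = 1279554672
example_store = {
    "127.0.0.1": {
        "visitorsLast24Hours": {
            now: 30,
            1279553072: 24,
            now - RETENTION_SECONDS - 1: 12,  # just past two weeks old
        }
    }
}
kept = rewrite_pruned(example_store, "127.0.0.1", now)
```

Even in this toy form, every pruning pass reads and rewrites the whole set of entries for an IP address, which is the part that worries me.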
This seems like a poor way to work with the database, and given how Cassandra stores data, I assume my data size will balloon.
Is there a more efficient way of doing it?
I'm currently working with phpcassa as my interface to Cassandra.
Thanks!