I'm building an application that requires storage and querying of a very large amount of timestamped data. Specifically, this is analytics-type data, where a single "set" can contain up to 100 separate data points tied together by a shared timestamp. The data is static: it will never change once stored, but it may be queried multiple times before it is expunged from the database (we currently store only 12 weeks of data).

At the moment we're storing this data in MySQL, spread across several tables of around 100 million rows each (we add roughly 3 million rows daily). This is fine when storing and querying the data by timestamp alone, but it becomes painful when performing SELECTs with sorts, and especially when trying to remove old, stale data from the tables. A DELETE such as WHERE timestamp < $12weeksago often takes several hours to complete.

We would also like to keep some of the data beyond the 12-week period, which adds a second condition to the DELETE's WHERE clause against another column. This slows things down further.
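In sketch form, the purge ends up looking something like this (table and column names here are placeholders, not our real schema):

    -- placeholder names, for illustration only
    DELETE FROM metrics
    WHERE ts < NOW() - INTERVAL 12 WEEK
      AND keep_beyond_12w = 0;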

What storage engine would you recommend for this use case, bearing in mind the high volume of reads and writes and our querying needs?

A: 

It all depends on the kind of SELECTs you do. If a NoSQL database can handle them, then that might be your best bet.

However, I think you can probably improve performance a lot by partitioning the tables. Since the data is already time-based, you can create a partition per day, week, or month; that way a query only has to touch the partitions relevant to its time range. And deleting old data becomes a near-instant DROP PARTITION (or dropping a whole per-period table) instead of a slow DELETE. A sketch follows below.
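As a rough sketch of what that could look like in MySQL, assuming a hypothetical table named metrics with a ts DATETIME column and weekly range partitions (all names here are made up, not taken from the question):

    -- hypothetical schema, partitioned by week on the timestamp column
    CREATE TABLE metrics (
        ts    DATETIME NOT NULL,
        value DOUBLE   NOT NULL
        -- ... the remaining data point columns ...
    )
    PARTITION BY RANGE (TO_DAYS(ts)) (
        PARTITION p2023w01 VALUES LESS THAN (TO_DAYS('2023-01-08')),
        PARTITION p2023w02 VALUES LESS THAN (TO_DAYS('2023-01-15')),
        PARTITION p2023w03 VALUES LESS THAN (TO_DAYS('2023-01-22')),
        PARTITION pmax     VALUES LESS THAN MAXVALUE
    );

    -- queries filtered on ts only scan the matching partitions
    SELECT * FROM metrics
    WHERE ts >= '2023-01-08' AND ts < '2023-01-15';

    -- expiring a week of data is a metadata operation, near-instant
    ALTER TABLE metrics DROP PARTITION p2023w01;

One caveat: DROP PARTITION discards every row in that partition, so any rows you want to keep beyond the 12-week window would need to be copied into an archive table first.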

WoLpH
A: 

Try Redis or MongoDB. They were both designed for this sort of use case.

duluthian