I've been using SQL Server to store historical time series data for a couple hundred thousand objects, observed about 100 times per day. I'm finding that queries (give me all values for object XYZ between time t1 and time t2) are too slow (for my needs, slow is more than a second). I'm indexing by timestamp and object ID.

I've entertained the thought of using something like a key-value store such as MongoDB instead, but I'm not sure if this is an "appropriate" use of this sort of thing, and I couldn't find any mentions of using such a database for time series data. Ideally, I'd be able to do the following queries:

  • retrieve all the data for object XYZ between time t1 and time t2
  • do the above, but return one data point per day (first, last, closest to time t...)
  • retrieve all data for all objects for a particular timestamp

The data should be ordered, and ideally it should be fast to write new data as well as update existing data.

It seems like my desire to query by object ID as well as by timestamp might necessitate having two copies of the database indexed in different ways to get optimal performance. Does anyone have experience building a system like this, with a key-value store, or HDF5, or something else? Or is this totally doable in SQL Server and I'm just not doing it right?

+1  A: 

This is why databases specific to time series data exist - relational databases simply aren't fast enough for large time series.

I've used Fame quite a lot at investment banks. It's very fast, but I imagine very expensive. However, if your application requires the speed, it might be worth looking into.

Bruce Blackshaw
+2  A: 

It sounds like MongoDB would be a very good fit. Updates and inserts are super fast, so you might want to create a document for every event, such as:

{
   object: XYZ,
   ts : new Date()
}

Then you can index the ts field and queries will also be fast. (By the way, you can create multiple indexes on a single collection, so you can index by object as well as by ts.)
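
Since the first query filters on both object and ts, a compound index on those two fields is one option worth trying alongside the plain ts index. This is only a sketch: the collection name data is taken from the queries below, the variable names are made up for illustration, and createIndex is the current mongo shell helper (older shells call it ensureIndex).

```javascript
// Index specifications as plain objects (names are illustrative only).
// Field order matters in a compound index: object first, then ts,
// so that "one object, time range" queries can use it.
const byObjectThenTime = { object: 1, ts: 1 };

// Single-field index for "all objects at a given timestamp" queries.
const byTime = { ts: 1 };

// In the mongo shell, assuming the collection is called "data":
//   db.data.createIndex(byObjectThenTime);
//   db.data.createIndex(byTime);
```

Whether both indexes are worth the extra write cost depends on your data, so it is worth measuring with your own workload.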

How to do your three queries:

retrieve all the data for object XYZ between time t1 and time t2

db.data.find({object : XYZ, ts : {$gt : t1, $lt : t2}})

do the above, but return one data point per day (first, last, closest to time t...)

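// first point of the day (bounded on both sides so an empty day returns nothing)
db.data.find({object : XYZ, ts : {$gte : new Date(/* start of day */), $lt : new Date(/* end of day */)}}).sort({ts : 1}).limit(1)
// last point of the day
db.data.find({object : XYZ, ts : {$gte : new Date(/* start of day */), $lt : new Date(/* end of day */)}}).sort({ts : -1}).limit(1)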
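
The start-of-day and end-of-day placeholders above could be computed along these lines. This is a sketch that assumes days are delimited in UTC; dayBounds and dayFilter are hypothetical helper names, not part of MongoDB.

```javascript
// Hypothetical helper: returns the UTC midnight that starts the day
// containing `date`, and the UTC midnight that starts the next day.
function dayBounds(date) {
  const start = new Date(Date.UTC(
    date.getUTCFullYear(), date.getUTCMonth(), date.getUTCDate()));
  // UTC days are always 24 hours long, so adding a fixed offset is safe here.
  const end = new Date(start.getTime() + 24 * 60 * 60 * 1000);
  return { start, end };
}

// Hypothetical helper: builds a filter for one object within a single UTC day.
// The range is half-open: start inclusive, end exclusive.
function dayFilter(objectId, date) {
  const { start, end } = dayBounds(date);
  return { object: objectId, ts: { $gte: start, $lt: end } };
}

// In the mongo shell, assuming the collection is called "data":
//   db.data.find(dayFilter("XYZ", someDate)).sort({ts: 1}).limit(1)   // first
//   db.data.find(dayFilter("XYZ", someDate)).sort({ts: -1}).limit(1)  // last
```

If your days should follow a local time zone instead, the boundary calculation would need to change accordingly.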

For closest to some time, you'd probably need a custom JavaScript function, but it's doable.
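
One possible alternative to a custom server-side function is to fetch the nearest point on each side of t with two ordinary queries and pick the closer one on the client. This is a sketch of that idea, not something built into MongoDB; pickClosest is a hypothetical helper name, and ties go to the earlier point.

```javascript
// Hypothetical helper: given the nearest document at or before `t` and the
// nearest document at or after `t` (either may be missing), return whichever
// has the timestamp closer to `t`. On a tie the earlier document wins.
function pickClosest(before, after, t) {
  if (!before) return after;
  if (!after) return before;
  const distBefore = t.getTime() - before.ts.getTime();
  const distAfter = after.ts.getTime() - t.getTime();
  return distBefore <= distAfter ? before : after;
}

// In the mongo shell, the two candidates could be fetched like this
// (assuming the collection is called "data"):
//   var before = db.data.find({object: "XYZ", ts: {$lte: t}}).sort({ts: -1}).limit(1).toArray()[0];
//   var after  = db.data.find({object: "XYZ", ts: {$gte: t}}).sort({ts: 1}).limit(1).toArray()[0];
//   var closest = pickClosest(before, after, t);
```

Both lookups are range queries with a sort on ts, so they should be able to use the same index as the other per-object queries.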

retrieve all data for all objects for a particular timestamp

db.data.find({ts : timestamp})

Feel free to ask on the user list if you have any questions; someone else might be able to think of an easier way of getting closest-to-a-time events.

kristina