I'm setting up a simple SQLite database to hold sensor readings. The tables will look something like this:

sensors  
 - id (pk) 
 - name  
 - description
 - units  

sensor_readings  
 - id (pk)  
 - sensor_id (fk to sensors)  
 - value (actual sensor value stored here)
 - time (date/time the sensor sample was taken)

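For reference, here is roughly what I mean in SQLite DDL (the column types are just placeholders, not final):

CREATE TABLE sensors (
    id          INTEGER PRIMARY KEY,
    name        TEXT NOT NULL,
    description TEXT,
    units       TEXT
);

CREATE TABLE sensor_readings (
    id        INTEGER PRIMARY KEY,
    sensor_id INTEGER NOT NULL REFERENCES sensors(id),
    value     REAL NOT NULL,
    time      INTEGER NOT NULL    -- e.g. Unix epoch seconds
);
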
The application will be capturing about 100,000 sensor readings per month from about 30 different sensors, and I'd like to keep all sensor readings in the DB as long as possible.

Most queries will be of the form

SELECT * FROM sensor_readings WHERE sensor_id = x AND time > y AND time < z

This query will usually return about 100-1000 results.

So the question is: how big can the sensor_readings table get before the above query becomes too time-consuming (more than a couple of seconds on a standard PC)?

I know that one fix might be to create a separate sensor_readings table for each sensor, but I'd like to avoid this if it is unnecessary. Are there any other ways to optimize this DB schema?

+1  A: 

Are you setting indexes properly? Beyond that and reading http://web.utk.edu/~jplyon/sqlite/SQLite_optimization_FAQ.html, the only answer is "you'll have to measure it yourself" - especially since this depends heavily on the hardware, on whether you're using an in-memory database or one on disk, and on whether you wrap inserts in transactions.
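
For example, wrapping a batch of inserts in a single transaction (column names taken from the question, values made up) roughly means you pay for one disk sync per batch instead of one per row:

BEGIN TRANSACTION;
INSERT INTO sensor_readings (sensor_id, value, time) VALUES (1, 20.5, 1262304000);
INSERT INTO sensor_readings (sensor_id, value, time) VALUES (2, 19.8, 1262304000);
-- ... the rest of the batch ...
COMMIT;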

That said, I've hit noticeable delays after a couple of tens of thousands of rows, but that was completely unoptimized. From what I've read, people with hundreds of thousands of rows and proper indexes have no problems at all.

Roel
+2  A: 

If you're going to be using time in your queries, it's worth adding an index on it. That would be the only optimization I would suggest based on your information.
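
Something along these lines, for instance (the index names are just examples); an index on time alone helps, and a composite index covering both columns in your WHERE clause is another option worth considering:

CREATE INDEX idx_readings_time ON sensor_readings (time);

-- or, matching the query's WHERE clause more closely:
CREATE INDEX idx_readings_sensor_time ON sensor_readings (sensor_id, time);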

100,000 insertions per month works out to about 2.3 per minute, so another index won't be too onerous and it will speed up your queries. I'm assuming that's 100,000 insertions across all 30 sensors, not 100,000 per sensor, but even if I'm mistaken, 70 insertions per minute should still be fine.

If performance does become an issue, you have the option of offloading older data to a historical table (say, sensor_readings_old) and running your normal queries only against the current table (sensor_readings).

That way you still have all the data available without affecting the normal queries. If you really need the older data, you can still get at it, knowing that those queries may take a while longer.
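
A rough sketch of that offload, using the table names above (the cutoff value is a placeholder, and the one-off CREATE copies only the column layout, not indexes or constraints):

-- one-off: create an empty historical table with the same columns
CREATE TABLE sensor_readings_old AS SELECT * FROM sensor_readings WHERE 0;

-- periodically: move everything older than some cutoff
BEGIN TRANSACTION;
INSERT INTO sensor_readings_old SELECT * FROM sensor_readings WHERE time < :cutoff;
DELETE FROM sensor_readings WHERE time < :cutoff;
COMMIT;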

paxdiablo
A: 

SQLite now supports R-tree indexes (http://www.sqlite.org/rtree.html), which are ideal if you intend to run a lot of time-range queries.
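
A minimal sketch of how that might look (the table name and the join are my own; note that the standard rtree module stores its coordinates as 32-bit floats, so second-level precision on large Unix timestamps can be lost):

CREATE VIRTUAL TABLE readings_rtree USING rtree(id, time_min, time_max);

-- populate with one entry per reading (a point in time becomes a zero-length interval)
INSERT INTO readings_rtree SELECT id, time, time FROM sensor_readings;

-- time-range query: find matching ids in the R-tree, then join back for the details
SELECT r.*
FROM readings_rtree t
JOIN sensor_readings r ON r.id = t.id
WHERE t.time_min >= 1262304000 AND t.time_max <= 1264982400
  AND r.sensor_id = 5;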

Tom

gobansaor