I have a web application with a MySQL database containing a device_status table that looks something like this...

deviceid | ... various status cols ... | created 

This table gets inserted into many times a day: 2000+ rows per device per day, and we estimate having 100+ devices by the end of the year.

Basically this table gets a record when just about anything happens on the device.
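
For reference, a rough sketch of the table (status_code and detail below are placeholders standing in for the various status columns):

    CREATE TABLE device_status (
        id          BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
        deviceid    INT UNSIGNED    NOT NULL,
        status_code TINYINT         NOT NULL,      -- placeholder status column
        detail      VARCHAR(255)    NULL,          -- placeholder status column
        created     DATETIME        NOT NULL,
        PRIMARY KEY (id),
        KEY idx_device_created (deviceid, created) -- for per-device, time-ranged queries
    ) ENGINE=InnoDB;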

My question is: how should I deal with a table that is going to grow very large, very quickly?

  1. Should I just relax and hope the database will be fine in a few months when this table has over 10 million rows, and then in a year when it has 100 million rows? This is the simplest option, but it seems like a table that large would have terrible performance.

  2. Should I just archive older data after some time period (a month, a week) and then have the web app query the live table for recent reports, and query both the live and archive tables for reports covering a larger time span?

  3. Should I have an hourly and/or daily aggregate table that sums up the various statuses for a device? If I do this, what's the best way to trigger the aggregation? Cron? A DB trigger? Also, I would probably still need to archive. (A rough sketch of the archiving and aggregation queries follows this list.)
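
For example, a sketch of both ideas in MySQL, assuming the placeholder schema above plus a hypothetical device_status_archive table (same structure as device_status) and a hypothetical device_status_hourly rollup table:

    -- Option 2: move rows older than 30 days into the archive table.
    -- Fix one cutoff so the INSERT and DELETE see exactly the same rows.
    SET @cutoff = NOW() - INTERVAL 30 DAY;

    INSERT INTO device_status_archive
    SELECT * FROM device_status WHERE created < @cutoff;

    DELETE FROM device_status WHERE created < @cutoff;

    -- Option 3: roll the previous hour up into an hourly summary table
    -- device_status_hourly (deviceid, hour_start, event_count).
    SET @hr_start = DATE_FORMAT(NOW() - INTERVAL 1 HOUR, '%Y-%m-%d %H:00:00');
    SET @hr_end   = DATE_FORMAT(NOW(), '%Y-%m-%d %H:00:00');

    INSERT INTO device_status_hourly (deviceid, hour_start, event_count)
    SELECT deviceid, @hr_start, COUNT(*)
    FROM device_status
    WHERE created >= @hr_start AND created < @hr_end
    GROUP BY deviceid;

Either batch could be driven by cron calling the mysql client, or scheduled inside MySQL itself with the event scheduler; which fits best depends on what already exists in the deployment.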

There must be a more elegant solution to handling this type of data.

+1  A: 

I had a similar issue tracking the number of views for advertisers on my site. Initially I was inserting a new row for each view, and as you predict here, that quickly led to the table growing unreasonably large, to the point that it was indeed causing performance issues which ultimately led to my hosting company shutting down the site for a few hours until I had addressed the problem.

The solution I went with is similar to your option #3. Instead of inserting a new record when a new view occurs, I update the existing record for the timeframe in question. In my case, I went with daily records for each ad; what timeframe to use for your app would depend entirely on the specifics of your data and your needs.

Unless you specifically need to track each individual occurrence over the last hour, storing every event and aggregating later may be overkill. Instead of bothering with a cron job to perform regular aggregation, you could simply check for an existing entry with matching attributes: if you find one, update a count field on the matching row instead of inserting a new row.
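
As a sketch of that idea in MySQL terms, using a hypothetical ad_views_daily summary table keyed on (ad_id, view_date): the "insert if missing, otherwise bump the counter" step can even be done in a single statement with INSERT ... ON DUPLICATE KEY UPDATE.

    -- Hypothetical daily summary table: one row per ad per day.
    CREATE TABLE ad_views_daily (
        ad_id     INT UNSIGNED NOT NULL,
        view_date DATE         NOT NULL,
        views     INT UNSIGNED NOT NULL DEFAULT 0,
        PRIMARY KEY (ad_id, view_date)
    ) ENGINE=InnoDB;

    -- On each view: create today's row if it does not exist yet,
    -- otherwise increment the existing counter.
    INSERT INTO ad_views_daily (ad_id, view_date, views)
    VALUES (42, CURDATE(), 1)
    ON DUPLICATE KEY UPDATE views = views + 1;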

JGB146
Thanks for your answer. I was wondering: how has your performance been, since you are doing a lot of updates? I know that inserts are typically much faster.
delux247
Performance has been good thus far. It's only been in place for about a week now.
JGB146