I have a database containing records collected every 0.1 seconds, and I need to time-average the data from a given day into 20-minute intervals, i.e. return a day's worth of data averaged to every 20 minutes, which is 24*3 = 72 values.
Currently I do a separate AVG call to the database for each 20-minute period within the day, which is 24*3 calls. My connection to the database seems a little slow (it is remote), and it takes ~5 minutes to do all the averages. Would it be faster to do a single query that accesses the entire day's worth of data and then averages it into 20-minute intervals? If it helps to answer the question, I have to do some arithmetic on the data before averaging, namely multiplying several table columns.

+1  A: 

In general, reducing the number of queries is a good idea. Aggregate and do whatever arithmetic/filtering/grouping you can in the query (i.e. in the database), and then do any 'iterative' computations in your application code (e.g. in PHP).
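For example, a single aggregate query can do the per-row arithmetic, restrict the rows to the day of interest, and send only the result over the slow link. A rough sketch (the table name, column names, and date literal are placeholders for the asker's schema, using the same SQL Server-style syntax as the next answer, which also shows how to extend the grouping to 20-minute intervals):

select avg(col1 * col2)
from   YourTable
where  datecolumn >= '2010-06-01'
and    datecolumn <  '2010-06-02'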

vicatcu
+1  A: 

You can calculate the number of minutes since midnight like:

datepart(hh,datecolumn)*60 + datepart(mi,datecolumn)

If you divide that by 20 (integer division), you get the number of the 20-minute interval. For example, 00:10 would fall in interval 0, 00:30 in interval 1, 15:30 in interval 46, and so on. With this formula, you can group on 20-minute intervals like:

select
    (datepart(hh,datecolumn)*60 + datepart(mi,datecolumn)) / 20 as IntervalNr
,   avg(value)
from      YourTable
group by  (datepart(hh,datecolumn)*60 + datepart(mi,datecolumn)) / 20

You can do math inside the avg call, like:

avg(col1 * col2 - col3 / col4)
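Putting the two snippets together, with a WHERE clause added to restrict the rows to the day in question (the date literal and column names below are placeholders), a sketch of the full query could look like:

select
    (datepart(hh,datecolumn)*60 + datepart(mi,datecolumn)) / 20 as IntervalNr
,   avg(col1 * col2 - col3 / col4) as AvgValue
from      YourTable
where     datecolumn >= '2010-06-01'
and       datecolumn <  '2010-06-02'
group by  (datepart(hh,datecolumn)*60 + datepart(mi,datecolumn)) / 20
order by  IntervalNr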
Andomar
A: 

To be sure whether it would be faster, you should measure it.

That said, it should be faster: since your connection to the database is slow, the number of round trips has a large impact on the total execution time, and a single query reduces that number considerably.

treaschf
A: 

How about a stored procedure on your database server? If your database engine doesn't support stored procedures, how about a scheduled script that does the math and populates a separate 'averages' table on the database server? Then the remote client only has to read the pre-computed averages once a day.
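A minimal sketch of that idea, assuming a SQL Server-style dialect as in Andomar's answer and an assumed daily_averages table (table names, column names, and the date range are placeholders); a stored procedure or nightly job would run something like:

insert into daily_averages (IntervalNr, AvgValue)
select
    (datepart(hh,datecolumn)*60 + datepart(mi,datecolumn)) / 20
,   avg(col1 * col2)
from      YourTable
where     datecolumn >= '2010-06-01'
and       datecolumn <  '2010-06-02'
group by  (datepart(hh,datecolumn)*60 + datepart(mi,datecolumn)) / 20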

Dave.Sol
A: 

Doing the computation in a single query would be slightly faster. Think of the overhead of multiple requests: setting up the connection, parsing the query, loading the stored procedure, etc.

But also make sure that you have suitable indexes, which can result in a huge performance increase. Some operations on huge databases can last from minutes to hours.
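For example (a sketch; the table and column names follow the placeholders used in the other answers), an index on the timestamp column lets the database locate one day's rows without scanning the whole table:

create index IX_YourTable_datecolumn on YourTable (datecolumn)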

Scoregraphic
A: 

If you are sending a lot of data and the connection is the bottleneck, how and when you group and send the data doesn't matter. There is no good way to send 100 MB every 10 minutes over a 56k modem. Figure out the size of your data and your bandwidth, and be sure you can even send it.

That said:

First, be certain the network is the bottleneck. If so, try to work with a smaller data set if possible, and test different scenarios. In general, one large record set will use less bandwidth than two record sets of half the size.
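To put rough numbers on it for this case: records every 0.1 seconds are 10 rows per second, or 24*60*60*10 = 864,000 rows per day, while the averaged result is only 24*3 = 72 rows, so aggregating on the database side shrinks the transfer by roughly four orders of magnitude.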

Brad
A: 

If possible, add columns to your table and compute and store the column product and the interval index (see Andomar's post) every time you insert data into the database.
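One way to do that, assuming a SQL Server-style dialect as in Andomar's answer (on other engines a trigger or application-side code would be needed), is persisted computed columns; a sketch with placeholder column names:

alter table YourTable add
    ColProduct as (col1 * col2 - col3 / col4) persisted
,   IntervalNr as ((datepart(hh,datecolumn)*60 + datepart(mi,datecolumn)) / 20) persisted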

tpowe