tags:

views:

922

answers:

5

I have a database which gets updated with 200-1000 new rows per day. Now, I'd like to have an SQL-statement which returns the data day-by-day, hour-by-hour so I can give a rough estimate for the current trend, i.e. how many rows will be added to the database today, just by taking a quick look at those historical graphs.

So, say that I would like to have 10 graphs printed out for the last 10 days, with the data summed up for every hour, like:

Day9:21,24,15,18,...,30,28,25 : tot 348 (number of rows per hour for day 9 and the total) Day8:32,37,38,43,...,45,55,65 : tot 442 (number of rows per hour for day 8 and the total) ... ... Day0:18,25,28,X,Y... : tot 'S' (stats for today so far. What will S be?)

How would the SQL-statement look like to collect the data in this day-by-day, hour-for-hour fashion?

Instead of looking at the graps visually in order to give a rough estimate of today's total 'S', even better would be to compute a prediction of 'S'. But that would be a completely other problem I guess... Any tips for how to do that or hints where I can get more information would be much appreciated!

Thanks, /Tommy

+2  A: 

Hum, depending on your database engine, you'll get different results, but with PostgreSQL, I would do something like that :

SELECT date_trunc('hour', table.date), count(table.id)
FROM table
GROUP BY date_trunc('hour', table.date)
ORDER BY date_trunc('hour', table.date)

The date_trunc function truncates a timestamp field up to a certain point. That query would return you hour by hour, the number of queries, you would just have to do the sums in your software.

If you really mean to have a SQL query returning exactly what you want, I think you'll have to make a function returning a sql set with the proper data, but I think it's easier to do it in your code.

mat
A: 

Ok, I'm on MySQL and it seems that date_trunc is not available there :-(

I've tried to write something to replace date_truncg but it does not work. Any hints for what is wrong would be much appreciated!

Logically I think the following statement is equivalent with date_trunc('hour', datecreation): SELECT DATE_ADD(DATE_FORMAT(datecreation, '%Y-%m-%d 00:00:00'), EXTRACT(HOUR FROM datecreation)) AS MYHOUR but obviously it is not since it does not work... How should it be done?

date_format can be used in a pinch.
derobert
A: 

MySQL has a bunch of date/time functions... you might be looking for HOUR(date) as an equivalent to date_trunc('hour', date) in PostGreSQL.

So, if you wanted by Day and by Hour...

SELECT Day(theDate), Hour(theDate), COUNT(1)
FROM theTable
WHERE ....
GROUP BY Day(theDate), Hour(theHour)
ORDER BY Day(theDate), Hour(theHour)

It would give you rows like this:

Day,Hour,Count
1,0,102
1,1,133
...
10,22,47
10,23,384
bobwienholt
A: 

Great thanks Bob! Works like a charm :-) /Tommy

A: 

I had a similar situation, using Oracle. With a table named reporting_data, I wanted a query that could tell me how many records had been inserted per hour, and how many had been inserted in 10 minute increments.

Per hour was easy:

    SELECT TO_CHAR(TRUNC(r.creation_date, 'HH'), 'DD-MON-YYYY HH24:MI:SS'),
         COUNT (*)
    FROM reporting_data r
   WHERE r.creation_date > TO_DATE ('27-OCT-2008', 'dd - mon - yyyy')
     AND r.creation_date < TO_DATE ('28-OCT-2008', 'dd - mon - yyyy')
GROUP BY TO_CHAR (TRUNC (r.creation_date, 'HH'), 'DD-MON-YYYY HH24:MI:SS')
ORDER BY TO_CHAR (TRUNC (r.creation_date, 'HH'), 'DD-MON-YYYY HH24:MI:SS') ASC

That query would return counts all of the records between Oct 27 and Oct 28, broken down per hour, based on the creation_date column.

Breaking it down in 10 minute increments, instead of hourly increments, was a bit harder, but with some manipulation it was doable.

SELECT   SUBSTR(TO_CHAR(r.creation_date, 'DD-MON-YYYY HH24:MI:SS'), 1, 16) || '0:00',
         COUNT (*)
    FROM reporting_data r
   WHERE r.creation_date > TO_DATE ('27-OCT-2008', 'DD-MON-YYYY')
     AND r.creation_date < TO_DATE ('28-OCT-2008', 'DD-MON-YYYY')
GROUP BY SUBSTR (TO_CHAR (r.creation_date, 'DD-MON-YYYY HH24:MI:SS'), 1, 16) || '0:00'

There's a lot of string manipulation going on there, so it might not be the most performant way of doing it. On a table of over 25,000,000 rows, it took about a minute to execute. (Then again, just doing a SELECT COUNT(*) on the same table took about 30 seconds, too, so there may have been other issues aside from the query.)

sernaferna