



I am going to be graphing netflow data stored in a MySQL database, and I need an efficient way to get the relevant data points. The records are stored with the date as an int for seconds since epoch. I would like to be able to do something like:

Select SUM(bytes) from table where stime > x and stime < Y
group by (10 second intervals)

Is there any way to do this? Or would it be faster to handle it locally in Python, even for a 500K row table?

Thanks! Chance

EDIT: My mistake, the time is stored as an unsigned double instead of an INT. I'm currently using GROUP BY (FLOOR(stime / I)), where I is the desired interval.
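
For the local-in-Python alternative raised in the question, a minimal sketch of the same FLOOR(stime / I) bucketing might look like the following. The function name `sum_bytes_by_interval` and the `(stime, byte_count)` row layout are assumptions for illustration, not the actual table schema.

```python
import math
from collections import defaultdict


def sum_bytes_by_interval(rows, interval):
    """Sum bytes per bucket, keyed by the bucket's start time in epoch seconds.

    rows is an iterable of (stime, byte_count) pairs (hypothetical layout);
    interval is the bucket width in seconds.
    """
    totals = defaultdict(int)
    for stime, byte_count in rows:
        # Same arithmetic as GROUP BY FLOOR(stime / interval) in SQL.
        bucket_start = math.floor(stime / interval) * interval
        totals[bucket_start] += byte_count
    return dict(totals)


rows = [(1500.2, 100), (1503.9, 50), (1510.0, 25), (1519.99, 75)]
print(sum_bytes_by_interval(rows, 10))  # {1500: 150, 1510: 100}
```

Doing the grouping in MySQL returns only one row per bucket, whereas this approach has to fetch every matching row first, so which is faster likely depends on how much data crosses the connection.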


Have you tried the following? Just divide the time column by 10 and round the result down.

SELECT    SUM(bytes) 
FROM      table 
WHERE     stime > x 
AND       stime < Y
GROUP BY  ROUND(stime/10, -1)

I don't know whether the ROUND() function and grouping by function calls work in MySQL though; the above is T-SQL.
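
It may help to see what ROUND(stime/10, -1) computes. A negative second argument rounds to the left of the decimal point, so the expression rounds stime/10 to the nearest multiple of 10 rather than rounding it down. The sketch below mimics that in Python for non-negative values only; both helper names are made up for illustration.

```python
import math


def round_to_nearest_ten(value):
    """Mimic ROUND(value, -1) for non-negative values (half rounds up)."""
    return math.floor(value / 10 + 0.5) * 10


def floor_bucket(stime, interval=10):
    """Bucket index obtained by rounding stime / interval down."""
    return math.floor(stime / interval)


for stime in (1449, 1450, 1500, 1549, 1550):
    print(stime, round_to_nearest_ten(stime / 10), floor_bucket(stime))
```

Here 1450 through 1549 all map to 150, which is a 100-second bucket, whereas the floor-based index changes every 10 seconds.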

Maximilian Mayerl
ROUND is giving me very variable intervals: over a ten-minute period I'm getting some intervals as small as 7 seconds and some as large as 1 minute...

You may be able to do this using integer division. Not sure of the performance.

Let I be your desired interval in seconds.

-- INTERVAL is a reserved word in MySQL, so the alias is quoted with backticks.
SELECT SUM(bytes), ((stime - X) DIV I) AS `interval`
FROM table
WHERE (stime > X) AND (stime < Y)
GROUP BY `interval`

Example: let X = 1500 and I = 10
stime = 1503 -> (1503 - 1500) DIV 10 = 0 
stime = 1507 -> (1507 - 1500) DIV 10 = 0
stime = 1514 -> (1514 - 1500) DIV 10 = 1
stime = 1523 -> (1523 - 1500) DIV 10 = 2
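
The worked example can be checked with a few lines of Python, where `//` plays the role of DIV for these non-negative integers. The function name `bucket_index` is an assumption for illustration.

```python
def bucket_index(stime, start, interval):
    """Zero-based bucket number of stime, counted from start."""
    return (stime - start) // interval


X, I = 1500, 10
for stime in (1503, 1507, 1514, 1523):
    print(stime, "->", bucket_index(stime, X, I))
```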
Lawrence Barsanti

I used suggestions from both answers and a coworker. End result is as follows:

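-- SUM(bytes) totals each 10-second bucket; the bucket's start time labels the row.
SELECT FROM_UNIXTIME(FLOOR(stime / 10) * 10) AS bucket_start, SUM(bytes) AS total_bytes
FROM argusTable_2009_10_22
WHERE stime > (UNIX_TIMESTAMP() - 600)
GROUP BY FLOOR(stime / 10)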

I tried the rounding solution as well, but the results were inconsistent.



FLOOR in GROUP BY sometimes fails. It sometimes groups different times as one value, for example when you divide the value by 3, but does not do the same when you divide by 4, even though the difference between the two values is far bigger than 3 or 4, so they should fall into two different groups. It is better to cast the result to unsigned after FLOOR, which works like:


The problem:

Sometimes GROUP BY FLOOR(UNIX_TIMESTAMP(time_field)/3) gives fewer groups than GROUP BY FLOOR(UNIX_TIMESTAMP(time_field)/4), which mathematically shouldn't be possible.

It is mathematically very well possible. Say the values are "3" and "4", then divided by 3 both give 1, while divided by 4 they give 0 and 1. So grouping by /4 will give more groups in this case.
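
This counterexample is easy to confirm. A quick Python check, with a made-up helper name, counts the distinct FLOOR(value / divisor) results:

```python
import math


def count_groups(values, divisor):
    """Number of distinct FLOOR(value / divisor) results."""
    return len({math.floor(v / divisor) for v in values})


values = [3, 4]
print(count_groups(values, 3))  # 1 group: both values floor to 1
print(count_groups(values, 4))  # 2 groups: 0 and 1
```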

Hi, I did this some time ago, so I created some functions (with SQL Server, but I assume it's nearly the same):

First I created a scalar function that returns the ID of a date depending on an interval and a date part (minute, hour, day, month, year):

    CREATE FUNCTION [dbo].[getIDDate]
    (
    @date datetime,
    @part nvarchar(10),
    @intervalle int
    )
    RETURNS int
    AS
    BEGIN
    -- Declare the return variable here
    DECLARE @res int
    DECLARE @date_base datetime
    -- Style 103 parses dd/mm/yyyy, so this is 1 January 1970
    SET @date_base = convert(datetime,'01/01/1970',103)

    -- Integer division gives the number of whole intervals since the base date
    set @res = case @part 
                WHEN 'minute' THEN datediff(minute,@date_base,@date)/@intervalle
                WHEN 'hour' THEN datediff(hour,@date_base,@date)/@intervalle
                WHEN 'day' THEN datediff(day,@date_base,@date)/@intervalle
                WHEN 'month' THEN datediff(month,@date_base,@date)/@intervalle
                WHEN 'year' THEN datediff(year,@date_base,@date)/@intervalle
                ELSE datediff(minute,@date_base,@date)/@intervalle END

    -- Return the result of the function
    RETURN @res
    END


Then I created a table function that returns all the IDs within a date range:

CREATE FUNCTION [dbo].[GetTableDate] 
(
    -- Add the parameters for the function here
    @start_date datetime, 
    @end_date datetime,
    @interval int,
    @unite varchar(10)
)
RETURNS @res TABLE (StartDate datetime,TxtStartDate nvarchar(50),EndDate datetime,TxtEndDate nvarchar(50),IdDate int)
AS
BEGIN
    declare @current_date datetime 
    declare @end_date_courante datetime
    declare @txt_start_date nvarchar(50)
    declare @txt_end_date nvarchar(50)
    -- Truncate the start date to the chosen unit
    set @current_date = case @unite 
                WHEN 'minute' THEN dateadd(minute, datediff(minute,0,@start_date),0)
                WHEN 'hour' THEN dateadd(hour, datediff(hour,0,@start_date),0)
                WHEN 'day' THEN dateadd(day, datediff(day,0,@start_date),0)
                WHEN 'month' THEN dateadd(month, datediff(month,0,@start_date),0)
                WHEN 'year' THEN dateadd(year, datediff(year,0,@start_date),0)
                ELSE dateadd(minute, datediff(minute,0,@start_date),0) END

    while @current_date < @end_date
    BEGIN
        -- End of the current interval: start plus @interval units
        set @end_date_courante = 
            case @unite 
                WHEN 'minute' THEN dateadd(minute, datediff(minute,0,dateadd(minute,@interval,@current_date)),0)
                WHEN 'hour' THEN dateadd(hour, datediff(hour,0,dateadd(hour,@interval,@current_date)),0)
                WHEN 'day' THEN dateadd(day, datediff(day,0,dateadd(day,@interval,@current_date)),0)
                WHEN 'month' THEN dateadd(month, datediff(month,0,dateadd(month,@interval,@current_date)),0)
                WHEN 'year' THEN dateadd(year, datediff(year,0,dateadd(year,@interval,@current_date)),0)
                ELSE dateadd(minute, datediff(minute,0,dateadd(minute,@interval,@current_date)),0) END
        -- Display labels for the interval bounds
        SET @txt_start_date = case @unite 
                WHEN 'minute' THEN CONVERT(VARCHAR(20), @current_date, 100)
                WHEN 'hour' THEN CONVERT(VARCHAR(20), @current_date, 100)
                WHEN 'day' THEN REPLACE(CONVERT(VARCHAR(11), @current_date, 106), ' ', '-')
                WHEN 'month' THEN REPLACE(RIGHT(CONVERT(VARCHAR(11), @current_date, 106), 8), ' ', '-')
                WHEN 'year' THEN CONVERT(VARCHAR(20), datepart(year,@current_date))
                ELSE CONVERT(VARCHAR(20), @current_date, 100) END
        SET @txt_end_date = case @unite 
                WHEN 'minute' THEN CONVERT(VARCHAR(20), @end_date_courante, 100)
                WHEN 'hour' THEN CONVERT(VARCHAR(20), @end_date_courante, 100)
                WHEN 'day' THEN REPLACE(CONVERT(VARCHAR(11), @end_date_courante, 106), ' ', '-')
                WHEN 'month' THEN REPLACE(RIGHT(CONVERT(VARCHAR(11), @end_date_courante, 106), 8), ' ', '-')
                WHEN 'year' THEN CONVERT(VARCHAR(20), datepart(year,@end_date_courante))
                ELSE CONVERT(VARCHAR(20), @end_date_courante, 100) END
        -- Assumed column list and values: one row per interval, with IdDate
        -- assumed to come from dbo.getIDDate so it matches the join key
        INSERT INTO @res (StartDate, TxtStartDate, EndDate, TxtEndDate, IdDate)
        VALUES (@current_date, @txt_start_date, @end_date_courante, @txt_end_date,
                dbo.getIDDate(@current_date, @unite, @interval))
        set @current_date = @end_date_courante
    END
    RETURN
END


So if I want to count all the users added in each 33-minute interval:

-- USER is a reserved word in T-SQL, so the table name is bracketed.
SELECT count(id_user), timeTable.StartDate
FROM [user]
INNER JOIN dbo.[GetTableDate]('1970-01-01', getdate(), 33, 'minute') AS timeTable
ON dbo.getIDDate([user].creation_date, 'minute', 33) = timeTable.IdDate
GROUP BY dbo.getIDDate([user].creation_date, 'minute', 33), timeTable.StartDate
ORDER BY timeTable.StartDate
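
The idea behind the two functions, restricted to minute intervals, can be sketched in Python as follows. This is only an illustration under that assumption; `interval_id` and `interval_table` are hypothetical names, not ports of the T-SQL above.

```python
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)


def interval_id(moment, minutes):
    """Number of whole `minutes`-wide intervals between the epoch and moment."""
    elapsed_minutes = int((moment - EPOCH).total_seconds() // 60)
    return elapsed_minutes // minutes


def interval_table(start, end, minutes):
    """List of (interval_start, interval_end, id) rows covering [start, end)."""
    rows = []
    current = start.replace(second=0, microsecond=0)  # truncate to the minute
    step = timedelta(minutes=minutes)
    while current < end:
        rows.append((current, current + step, interval_id(current, minutes)))
        current += step
    return rows


for row in interval_table(datetime(1970, 1, 1), datetime(1970, 1, 1, 2, 0), 33):
    print(row)
```

Each record's timestamp is then mapped through `interval_id` and matched against the id column of the generated table, which is what the join above does.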


remi bourgarel