views:

30

answers:

3

Hi there, I actually don't even know how to call this :P, but...

I have one table, let's call it "uploads"

id    owner    date
-----------------------------
0     foo      20100101120000
1     bar      20100101120300
2     foo      20100101120400
3     bar      20100101120600
..    ..       ..
6     foo      20100101120800

Now, when I'ld do something like:

SELECT id FROM uploads ORDER BY date DESC

This would result in:

id    owner    date
-----------------------------
6     foo      20100101120800
..    ..       ..
3     bar      20100101120600
2     foo      20100101120400
1     bar      20100101120300
0     foo      20100101120000

Question: Nice, but, I want to go even further. Because now, when you would build a timeline (and I did :P), you are 'spammed' by messages saying foo and bar uploaded something. I'ld like to group them and return the first result with a time-limit of '500' at the date-field.

What kind of SQL-command do I need that would result in:

id    owner    date
-----------------------------
6     foo      20100101120800
3     bar      20100101120600
0     foo      20100101120000

Then, after that, I can perform a call for each record to get the associative records in a timeframe of 5 minutes (this is an exmaple for id=6):

SELECT id FROM uploads WHERE date>=20100101120800-500 ORDER BY date DESC

Does anyone now how I should do the first step? (so limiting/grouping the results)

(btw. I know that when I want to use this, I should convert every date (YmdHis=60) to Unix-time (=100), but I don't need the 5 minutes to be exactly 5 minutes, they may be a minute less sometimes...)

+1  A: 

Standard SQL doesn't deal with intervals very well. You are going to need to do a self-join of the table to compare dates of different tuples. That way, you can easily find all pairs of tuples of which the dates are no more than 500 apart. However, you really want to cluster the dates in sets no more than 500 apart - and that can't be expressed in SQL at all, as far as I know.

What you can do is something quite similar: split the total time interval into fixed 500-unit ranges, and then cluster all tuples in the table based on the interval they're in. For that, you first need a table or query result with the start times of the intervals; this can be created using a SQL query on your table and a function that either "rounds off" a timestamp to the starting time in its interval, or computes its interval sequence number. Then as a second step you can join the table with that result to group its timestamps according to their corresponding start time. I can't give the SQL because it's DBMS-dependent, and I certainly can't tell you if this is the best way of accomplishing what you want in your situation.

reinierpost
+1  A: 

Use an inline view? e.g. something like

SELECT u1.* 
FROM uploads u1,
(SELECT date 
    FROM uploads u2
    WHERE u2.owner='foo') datum_points
WHERE u1.date BETWEEN datum_points.date
    AND DATE_ADD(datum_points.date INTERVAL 5 MINUTES)

should return all the posts made within 5 minutes of 'foo' making a post.

symcbean
+1  A: 

I'm not quite clear on the result you are trying to get, even with your examples. Perhaps something with rounding and group by.

SELECT max(id) max_id,owner, (ROUND(date/500)*500) date_interval, max(date) date
FROM uploads GROUP BY date_interval,owner

You may want to use FLOOR or CEILING instead of ROUND, depending on what you want.

Brent Baisley
This is exactly what I meant. I'm now using "SELECT id FROM uploads GROUP BY ROUND(UNIX_TIMESTAMP(date)/3600), owner". I'm using 3600 since I think 1 hour is better than 5 minutes after I tried both. After Reinierpost mentioned "interval", I started googling to "MySQL Group By Interval" and I found this: http://forum.percona.com/index.php/t/748/, which was very helpful!Anyway, very much thanks to Reinierpost, Symcbean and Brent!
Tim van Elsloo