ansaurus

Question

Answer 1

+1 A:

Something like this came to mind:

select @rownum:=@rownum+1 rownum, entries.* 
from (select @rownum:=0) r, entries 
where uid = ? and rownum % 150 = 0

I don't have MySQL at hand, but maybe this will help.

Michal Sznajder 2008-08-06 17:06:44

Answer 2

A:

@Michal

For whatever reason, your example only works when the WHERE condition on @rownum uses a less-than operator. I think that when the WHERE clause filters out a row, @rownum doesn't get incremented, so it can never match anything else.

If the original table has an auto-incremented id column, and rows were inserted in chronological order, then this should work:

select timefield from entries
where uid = ? and id % 150 = 0 order by timefield;

Of course that doesn't work if there is no correlation between the id and the timefield, unless you don't actually care about getting evenly spaced timefields, just 20 random ones.

Ryan Ahearn 2008-08-06 18:01:32

Answer 3

A:

@Ryan

The rows are not in chronological order. The timefield is mutable - think of it as "last_update".

Michiel de Mare 2008-08-06 18:41:07

Answer 4

A:

Do you really care about the individual data points? Or would statistical aggregate functions on the day number suffice to tell you what you want to know?

Scott Noyes 2008-08-27 16:14:36

Answer 5

A:

select
    timefield
from
    entries
where
    rand() = .01 -- will return 1% of rows; adjust as needed.

-- not a MySQL expert, so I'm not sure how rand() operates in this environment.

jms 2008-08-27 16:37:10

that should be "rand() < .01"

nickf 2008-10-01 02:08:29

Answer 6

+2 A:

Michal Sznajder almost had it, but you can't use column aliases in a WHERE clause in SQL. So you have to wrap it as a derived table. I tried this and it returns 20 rows:

SELECT * FROM (
    SELECT @rownum:=@rownum+1 AS rownum, e.*
    FROM (SELECT @rownum := 0) r, entries e) AS e2
WHERE uid = ? AND rownum % 150 = 0;

Bill Karwin 2008-10-01 01:49:27
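
A minimal sketch of the same every-Nth-row idea, run from Python against an in-memory SQLite database rather than MySQL, so it can be tried without a server. It assumes the rows should be numbered within a single user and in timefield order, which the thread does not confirm. The helper names, the sample data, and the use of ROW_NUMBER() (SQLite 3.25 or later) in place of the @rownum variable are all illustrative assumptions, not part of the answer above.

```python
import sqlite3


def sample_every_nth(conn, uid, step):
    """Return every `step`-th timefield for one user, ordered by time.

    Hypothetical helper: ROW_NUMBER() stands in for the @rownum variable,
    and rows are filtered by uid *before* they are numbered.
    """
    sql = """
        SELECT timefield FROM (
            SELECT timefield,
                   ROW_NUMBER() OVER (ORDER BY timefield) AS rownum
            FROM entries
            WHERE uid = ?
        ) AS numbered
        WHERE rownum % ? = 0
        ORDER BY timefield
    """
    return [row[0] for row in conn.execute(sql, (uid, step))]


def build_sample_db():
    """Create an in-memory table with made-up data: 3000 rows for uid 1."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE entries (id INTEGER PRIMARY KEY, uid INTEGER, timefield INTEGER)"
    )
    # timefield is a plain integer here to keep the example simple.
    rows = [(1, t) for t in range(1, 3001)] + [(2, t) for t in range(1, 101)]
    conn.executemany("INSERT INTO entries (uid, timefield) VALUES (?, ?)", rows)
    return conn


if __name__ == "__main__":
    conn = build_sample_db()
    sample = sample_every_nth(conn, uid=1, step=150)
    print(len(sample), sample[:3])  # prints: 20 [150, 300, 450]
```

Numbering inside the uid filter means the step applies to that user's rows only, so 3000 rows with a step of 150 give 20 evenly spaced samples regardless of how many rows other users have.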

Answer 7

+1 A:

As far as visualization goes, I know this is not the periodic sampling you are talking about, but I would look at all the rows for a user, choose an interval bucket, count the rows within each bucket, and show the result on a bar graph or similar. This would show a real "distribution", since many occurrences within a time frame may be significant.

SELECT DATEADD(day, DATEDIFF(day, 0, timefield), 0) AS bucket -- choose an appropriate granularity (days used here)
     ,COUNT(*)
FROM entries
WHERE uid = ?
GROUP BY DATEADD(day, DATEDIFF(day, 0, timefield), 0)
ORDER BY DATEADD(day, DATEDIFF(day, 0, timefield), 0)

Or, if you don't like having to repeat yourself, or if you are playing with different buckets and want to analyze across many users in 3-D (the measure on the z axis against uid and bucket on the x and y axes):

SELECT uid
    ,bucket
    ,COUNT(*) AS measure
FROM (
    SELECT uid
        ,DATEADD(day, DATEDIFF(day, 0, timefield), 0) AS bucket
    FROM entries
) AS buckets
GROUP BY uid
    ,bucket
ORDER BY uid
    ,bucket

If I wanted to plot in 3-D, I would probably determine a way to order users according to some meaningful overall metric for the user.

Cade Roux 2008-10-01 02:07:33

Can you do "GROUP BY bucket ORDER BY bucket"? That seems as though it would be much more efficient (not having to recalculate that column each time).

nickf 2008-10-01 03:12:24

No, you cannot. However, the optimizer does not actually recalculate those expressions, because it knows that the functions are deterministic.

Cade Roux 2008-10-01 04:31:44
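
For anyone who wants to try the bucketing idea without a database server, here is a small sketch in Python against an in-memory SQLite database. It assumes timefield is stored as ISO-8601 text and uses SQLite's date() function to truncate to the day, in place of the DATEADD/DATEDIFF expression above. The helper names and sample rows are made up for illustration. SQLite happens to accept the column alias in GROUP BY and ORDER BY, which, per the comment above, the dialect of the original query does not.

```python
import sqlite3


def daily_counts(conn, uid):
    """Count one user's entries per calendar day (the "bucket")."""
    sql = """
        SELECT date(timefield) AS bucket, COUNT(*) AS measure
        FROM entries
        WHERE uid = ?
        GROUP BY bucket
        ORDER BY bucket
    """
    return conn.execute(sql, (uid,)).fetchall()


def build_bucket_db():
    """Create an in-memory table with a few made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE entries (uid INTEGER, timefield TEXT)")
    rows = [
        (1, "2008-10-01 02:07:33"),
        (1, "2008-10-01 03:12:24"),
        (1, "2008-10-02 04:31:44"),
        (2, "2008-10-01 09:00:00"),
    ]
    conn.executemany("INSERT INTO entries VALUES (?, ?)", rows)
    return conn


if __name__ == "__main__":
    print(daily_counts(build_bucket_db(), uid=1))
    # prints: [('2008-10-01', 2), ('2008-10-02', 1)]
```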


SQL: Distribution of table in time
