We have a table that contains website page views, like:
time | page_id
----------|-----------------------------
1256645862| pageA
1256645889| pageB
1256647199| pageA
1256647198| pageA
1256647300| pageB
1257863235| pageA
1257863236| pageC
In our production table, there is currently about 40K rows. We want to generate, for each day, the count of unique pages viewed in the last 30 days, 60 days, and 90 days. So, in the result set, we can look-up a day, and see how many unique pages were accessed within the 60-day period preceding that day.
We were able to get a query to work in MSSQL:
SELECT DISTINCT
CONVERT(VARCHAR,P.NDATE,101) AS 'DATE',
(SELECT COUNT(DISTINCT SUB.PAGE_ID) FROM (SELECT PAGE_ID FROM perflog WHERE NDATE BETWEEN DATEADD(D,-29,P.NDATE) AND P.NDATE) AS SUB) AS '30D',
(SELECT COUNT(DISTINCT SUB.PAGE_ID) FROM (SELECT PAGE_ID FROM perflog WHERE NDATE BETWEEN DATEADD(D,-59,P.NDATE) AND P.NDATE) AS SUB) AS '60D',
(SELECT COUNT(DISTINCT SUB.PAGE_ID) FROM (SELECT PAGE_ID FROM perflog WHERE NDATE BETWEEN DATEADD(D,-89,P.NDATE) AND P.NDATE) AS SUB) AS '90D'
FROM PERFLOG P
ORDER BY 'DATE'
NOTE: because MSSQL doesn't have the FROM_UNIXTIME function, we added the NDATE column for testing, which is just the converted time
. NDATE does not exist in the production table.
Converting this query to MySQL gives us the "Unknown colum P.time" error:
SELECT DISTINCT
FROM_UNIXTIME(P.time,'%Y-%m-%d') AS 'DATE',
(SELECT COUNT(DISTINCT SUB.PAGE_ID) FROM (SELECT PAGE_ID FROM perflog WHERE FROM_UNIXTIME(time,'%Y-%m-%d') BETWEEN DATE_SUB(FROM_UNIXTIME(P.time,'%Y-%m-%d'), INTERVAL 30 DAY) AND FROM_UNIXTIME(P.time,'%Y-%m-%d')) AS SUB) AS '30D',
(SELECT COUNT(DISTINCT SUB.PAGE_ID) FROM (SELECT PAGE_ID FROM perflog WHERE FROM_UNIXTIME(time,'%Y-%m-%d') BETWEEN DATE_SUB(FROM_UNIXTIME(P.time,'%Y-%m-%d'), INTERVAL 60 DAY) AND FROM_UNIXTIME(P.time,'%Y-%m-%d')) AS SUB) AS '60D',
(SELECT COUNT(DISTINCT SUB.PAGE_ID) FROM (SELECT PAGE_ID FROM perflog WHERE FROM_UNIXTIME(time,'%Y-%m-%d') BETWEEN DATE_SUB(FROM_UNIXTIME(P.time,'%Y-%m-%d'), INTERVAL 90 DAY) AND FROM_UNIXTIME(P.time,'%Y-%m-%d')) AS SUB) AS '90D'
FROM PERFLOG P
ORDER BY 'DATE'
I understand this is because we cannot have a correlated subquery that references a table in the outer FROM clause. But, unfortunately, we are at a loss on how to convert this query to work in MySQL. For now, we simply return all DISTINCT rows from the table and post-process it in PHP. Takes about 2-3 seconds for 40K rows. I'm worried about the performance when we have 100's of 1000's of rows.
Is it possible to do in MySQL? If so, can we expect it to perform better than our PHP post-processed solution.
UPDATE: Here's the query for creating the table:
CREATE TABLE `perflog` (
`user_id` VARBINARY( 40 ) NOT NULL ,
`elapsed` float UNSIGNED NOT NULL ,
`page_id` VARCHAR( 255 ) NOT NULL ,
`time` INT( 10 ) UNSIGNED NOT NULL ,
`ip` VARBINARY( 40 ) NOT NULL ,
`agent` VARCHAR( 255 ) NOT NULL ,
PRIMARY KEY ( `user_id` , `page_id` , `time` , `ip`, `agent` )
) ENGINE MyISAM
Our production table has ~40K rows thus far.