I need to write queries to find new users and regular users.

New users are those whose uuid appeared in table2 in the last 24 hours (counted back from the time the query is run) and never appeared before that.

Regular users are those whose uuid appeared in table2 in the last day and also appeared at least once in the last 3 days.

In addition, only records with id > 10 and ip != 2 are to be considered.

table1 is a temporary table containing dates. I can't figure out how to achieve this with joins. Please help.


table2

    +----+---------------------+------+------+
    | id | ts                  | uuid | ip   |
    +----+---------------------+------+------+
    |  1 | 2010-01-10 00:00:00 | uid1 |    5 |
    |  2 | 2010-01-10 00:00:00 | uid2 |   14 |
    |  3 | 2010-01-10 00:00:00 | uid3 |   11 |
    |  4 | 2010-01-11 00:00:00 | uid4 |   16 |
    |  5 | 2010-01-11 00:00:00 | uid5 |    4 |
    |  6 | 2010-01-13 00:00:00 | uid6 |    2 |
    |  7 | 2010-01-10 00:00:00 | uid1 |    1 |
    |  8 | 2010-01-11 00:00:00 | uid2 |   10 |
    |  9 | 2010-01-12 00:00:00 | uid1 |    1 |
    | 10 | 2010-01-13 00:00:00 | uid4 |    1 |
    | 11 | 2010-01-09 21:00:00 | uid1 |    1 |
    | 12 | 2010-01-09 21:30:00 | uid1 |    2 |
    | 13 | 2010-01-10 05:00:00 | uid2 |    3 |
    | 14 | 2010-01-10 12:00:00 | uid1 |    1 |
    | 15 | 2010-01-10 12:00:00 | uid3 |    1 |
    | 16 | 2010-01-10 21:00:01 | uid1 |    7 |
    | 17 | 2010-01-11 01:00:00 | uid2 |   14 |
    | 18 | 2010-01-11 05:00:00 | uid2 |   11 |
    | 19 | 2010-01-11 17:59:00 | uid4 |   13 |
    | 20 | 2010-01-11 06:00:00 | uid5 |   12 |
    | 21 | 2010-01-11 18:01:00 | uid1 |   14 |
    | 22 | 2010-01-12 23:05:00 | uid4 |   17 |
    | 23 | 2010-01-13 12:01:23 | uid6 |   13 |
    +----+---------------------+------+------+
    23 rows in set (0.00 sec)

table1

    +------------+
    | ts         |
    +------------+
    | 2010-01-10 |
    | 2010-01-11 |
    | 2010-01-12 |
    | 2010-01-13 |
    +------------+
    4 rows in set (0.00 sec)

Expected output for new users when the query is run at 18:00:

    +------------+-------+
    | ts         | users |
    +------------+-------+
    | 2010-01-10 |     3 |
    | 2010-01-11 |     2 |
    | 2010-01-12 |     0 |
    | 2010-01-13 |     1 |
    +------------+-------+
    4 rows in set (0.00 sec)

MySQL table dump

DROP TABLE IF EXISTS `table1`;
/*!40101 SET @saved_cs_client     = @@character_set_client */;
/*!40101 SET character_set_client = utf8 */;
CREATE TABLE `table1` (
  `ts` date NOT NULL
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
/*!40101 SET character_set_client = @saved_cs_client */;

INSERT INTO `table1` VALUES ('2010-01-10'),('2010-01-11'),('2010-01-12'),('2010-01-13');

DROP TABLE IF EXISTS `table2`;
CREATE TABLE `table2` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `ts` datetime DEFAULT NULL,
  `uuid` varchar(20) DEFAULT NULL,
  `ip` int(11) DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=MyISAM AUTO_INCREMENT=24 DEFAULT CHARSET=latin1;
/*!40101 SET character_set_client = @saved_cs_client */;

INSERT INTO `table2` VALUES (1,'2010-01-10 00:00:00','uid1',5),(2,'2010-01-10 00:00:00','uid2',14),(3,'2010-01-10 00:00:00','uid3',11),(4,'2010-01-11 00:00:00','uid4',16),(5,'2010-01-11 00:00:00','uid5',4),(6,'2010-01-13 00:00:00','uid6',2),(7,'2010-01-10 00:00:00','uid1',1),(8,'2010-01-11 00:00:00','uid2',10),(9,'2010-01-12 00:00:00','uid1',1),(10,'2010-01-13 00:00:00','uid4',1),(11,'2010-01-09 21:00:00','uid1',1),(12,'2010-01-09 21:30:00','uid1',2),(13,'2010-01-10 05:00:00','uid2',3),(14,'2010-01-10 12:00:00','uid1',1),(15,'2010-01-10 12:00:00','uid3',1),(16,'2010-01-10 21:00:01','uid1',7),(17,'2010-01-11 01:00:00','uid2',14),(18,'2010-01-11 05:00:00','uid2',11),(19,'2010-01-11 17:59:00','uid4',13),(20,'2010-01-11 06:00:00','uid5',12),(21,'2010-01-11 18:01:00','uid1',14),(22,'2010-01-12 23:05:00','uid4',17),(23,'2010-01-13 12:01:23','uid6',13);
+2  A: 

You can join the table to itself to search for entries for the same user that are more than a day old. When there is no such older match, the fields of the left-joined table will be NULL.

For example:

select
  YEAR(cur.ts) as year
, MONTH(cur.ts) as month
, DAY(cur.ts) as day
  -- no older row for this uuid means the user is new
, case when old.uuid is null then 1 else 0 end as IsNewUser
, count(distinct cur.uuid) as Users
from       table2 cur
left join  table2 old
on         cur.uuid = old.uuid
           and old.ip <> 2
           and old.id > 10
           -- subtracting datetimes directly does not yield days in MySQL,
           -- so compare against an explicit one-day interval instead
           and old.ts < cur.ts - interval 1 day
where      cur.ip <> 2
           and cur.id > 10
group by   year, month, day, IsNewUser
order by   year, month, day, IsNewUser
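
To get one row per date in table1 instead of grouping by the dates found in table2, one possible adaptation is sketched below. This is only a sketch under assumptions taken from the question's expected output: the query is treated as if it ran at 18:00 on each date in table1, so the "last 24 hours" window runs from 18:00 on the previous day up to and including 18:00 on that date. The aliases `d`, `cur` and `old` are illustrative.

```sql
-- Sketch: new users per seed date, assuming an 18:00 cutoff on each date.
SELECT d.ts,
       -- count a uuid only when no qualifying row exists before the window
       COUNT(DISTINCT CASE WHEN old.id IS NULL THEN cur.uuid END) AS users
FROM table1 d
-- rows inside the 24-hour window that ends at 18:00 on the seed date
LEFT JOIN table2 cur
       ON cur.ts >  d.ts - INTERVAL 6 HOUR    -- 18:00 on the previous day
      AND cur.ts <= d.ts + INTERVAL 18 HOUR   -- 18:00 on the seed date
      AND cur.id > 10
      AND cur.ip <> 2
-- any qualifying row for the same uuid from before the window started
LEFT JOIN table2 old
       ON old.uuid = cur.uuid
      AND old.ts <= d.ts - INTERVAL 6 HOUR
      AND old.id > 10
      AND old.ip <> 2
GROUP BY d.ts
ORDER BY d.ts;
```

Because table1 drives the join, a date with no activity still appears with a count of 0. Worked through by hand against the sample data in the question, this should reproduce the expected counts of 3, 2, 0 and 1.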
Andomar
Does this work on your setup? On mine, even if I change `timestamp` to `ts` (to match the original), I still get "ERROR 1054 (42S22): Unknown column 'old.uuid' in 'field list'".
T.J. Crowder
@T.J. Crowder: I typed it from memory and had switched table1 and table2. Edited with a better version, though the result is still different from the example answer in the question.
Andomar
Yes, this works on my machine, but I can't figure out how to adapt it to my requirements :(
Amit
+1  A: 

I'm not all that familiar with MySQL, but here's how I'd do it in Oracle:

SELECT uuid, 'NEW' as user_type FROM
  (SELECT uuid, MAX(ts) as MAX_TS, MIN(ts) as MIN_TS
     FROM TABLE2
     WHERE ID > 10 AND
           IP <> 2
     GROUP BY uuid
     -- Oracle does not accept select-list aliases in HAVING, so repeat the aggregates
     HAVING MAX(ts) > SYSTIMESTAMP - INTERVAL '1' DAY AND
            MAX(ts) = MIN(ts)) nu
UNION ALL
  -- qualify uuid: both n and t expose a column with that name
  SELECT DISTINCT n.uuid, 'REGULAR' as user_type FROM
    (SELECT uuid, MAX(ts) as MAX_TS
       FROM TABLE2
       WHERE ID > 10 AND
             IP <> 2
       GROUP BY uuid) n
     INNER JOIN (SELECT *
                   FROM TABLE2
                   WHERE ID > 10 AND
                          IP <> 2) t
       ON (t.uuid = n.uuid)
     WHERE n.MAX_TS > SYSTIMESTAMP - INTERVAL '1' DAY AND
           t.ts < SYSTIMESTAMP - INTERVAL '1' DAY AND
           t.ts > SYSTIMESTAMP - INTERVAL '3' DAY;

I can't really see a use for TABLE1 here. Is it required that you use it?

I don't know whether MySQL supports SYSTIMESTAMP or the INTERVAL construct. Hopefully, though, this will give you some ideas.
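
For what it's worth, a rough MySQL rendering of the same idea might look like the sketch below. MySQL has no SYSTIMESTAMP, so `NOW()` stands in for it, and intervals are written as `INTERVAL 1 DAY` without quotes. Two things here are assumptions rather than a straight translation: "new" is tested with `MIN(ts)` inside the last day (so a new user with several recent rows still counts), and "regular" is read as "seen in the last day and also between one and three days ago".

```sql
-- Sketch only: MySQL version of the Oracle query, relative to the current time.

-- New users: the first qualifying row falls inside the last 24 hours.
SELECT uuid, 'NEW' AS user_type
FROM table2
WHERE id > 10 AND ip <> 2
GROUP BY uuid
HAVING MIN(ts) > NOW() - INTERVAL 1 DAY

UNION ALL

-- Regular users: a row in the last 24 hours plus an earlier row
-- that is between one and three days old.
SELECT DISTINCT cur.uuid, 'REGULAR' AS user_type
FROM table2 cur
JOIN table2 old
  ON old.uuid = cur.uuid
 AND old.id > 10
 AND old.ip <> 2
 AND old.ts <= NOW() - INTERVAL 1 DAY
 AND old.ts >  NOW() - INTERVAL 3 DAY
WHERE cur.id > 10
  AND cur.ip <> 2
  AND cur.ts > NOW() - INTERVAL 1 DAY;
```

Since everything is measured from `NOW()`, this returns nothing on the 2010 sample data unless `NOW()` is replaced with a fixed datetime for testing.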

Bob Jarvis
In table1 there may be days where there is no entry at all, so table1 is used as a seed for dates, and the count of users is computed on all those dates that are in table2.
Amit