ansaurus

Question

Mysql GROUP BY and COUNT for multiple WHERE clauses

Answer 1

If I understand correctly, you want to count OK vs. not-OK hostnames, using each hostname's status as of its last activity date, and then group those counts by core. Right?

SELECT h1.core, MAX(h1.active_date) AS latest_active_date,
  SUM(CASE WHEN h1.status IN ('OK', 'Repaired') THEN 1 ELSE 0 END) AS OK_host_count,
  SUM(CASE WHEN h1.status IN ('OK', 'Repaired') THEN 0 ELSE 1 END) AS broken_host_count
FROM `hpa` h1 LEFT OUTER JOIN `hpa` h2
  ON (h1.hostname = h2.hostname AND h1.active_date < h2.active_date)
WHERE h2.hostname IS NULL  -- no newer row exists, so h1 is the latest row for its hostname
GROUP BY h1.core
ORDER BY h1.core;

This is a variation of the "greatest-n-per-group" problem that I see a lot in SQL questions on StackOverflow.

First we want to choose only the rows that have the latest activity date per hostname. We can do that with an outer join to rows that have the same hostname and a greater active_date. Where no such match is found (h2.hostname IS NULL), the h1 row is already the latest row for its hostname.

Then group by core and count the rows by status.
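To make the technique concrete, here is a minimal sketch that runs the query against a few made-up rows. It uses Python's built-in sqlite3 module as a stand-in for MySQL (an assumption, purely so the example runs without a server), with the select-list comma in place and every column qualified by h1. The hostnames, cores, statuses, and small epoch values are hypothetical.

```python
import sqlite3

# Hypothetical sample rows: (hostname, core, status, active_date as epoch seconds).
# Only the column names come from the answer; the values are made up.
ROWS = [
    ("web1", "core-a", "Broken", 100),
    ("web1", "core-a", "Repaired", 200),  # latest row for web1: counted as OK
    ("web2", "core-a", "OK", 150),
    ("web2", "core-a", "Down", 250),      # latest row for web2: counted as broken
    ("db1", "core-b", "OK", 300),         # only row for db1: counted as OK
]

# Greatest-n-per-group: an h1 row survives only if no h2 row for the same
# hostname has a later active_date.
LATEST_COUNTS_SQL = """
SELECT h1.core AS core, MAX(h1.active_date) AS latest_active_date,
  SUM(CASE WHEN h1.status IN ('OK', 'Repaired') THEN 1 ELSE 0 END) AS OK_host_count,
  SUM(CASE WHEN h1.status IN ('OK', 'Repaired') THEN 0 ELSE 1 END) AS broken_host_count
FROM hpa h1 LEFT OUTER JOIN hpa h2
  ON (h1.hostname = h2.hostname AND h1.active_date < h2.active_date)
WHERE h2.hostname IS NULL
GROUP BY h1.core
ORDER BY h1.core
"""

# In-memory SQLite table standing in for the MySQL `hpa` table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE hpa (hostname TEXT, core TEXT, status TEXT, active_date INTEGER)"
)
conn.executemany("INSERT INTO hpa VALUES (?, ?, ?, ?)", ROWS)

result = conn.execute(LATEST_COUNTS_SQL).fetchall()
print(result)  # [('core-a', 250, 1, 1), ('core-b', 300, 1, 0)]
```

Only web1's 200 row, web2's 250 row, and db1's 300 row survive the join, so core-a gets one OK host and one broken host.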

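That's the solution as of today (assuming no row has an active_date in the future). To get the result as of N days ago, you have to restrict both tables: h1, so that only rows on or before that date are counted, and h2, so that a row newer than that date doesn't disqualify the row that was the latest as of that date. For example, for 1 day ago: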

SELECT h1.core, MAX(h1.active_date) AS latest_active_date,
  SUM(CASE WHEN h1.status IN ('OK', 'Repaired') THEN 1 ELSE 0 END) AS OK_host_count,
  SUM(CASE WHEN h1.status IN ('OK', 'Repaired') THEN 0 ELSE 1 END) AS broken_host_count
FROM `hpa` h1 LEFT OUTER JOIN `hpa` h2
  ON (h1.hostname = h2.hostname AND h1.active_date < h2.active_date
  AND h2.active_date <= CURDATE() - INTERVAL 1 DAY)  -- ignore rows newer than the cutoff
WHERE h1.active_date <= CURDATE() - INTERVAL 1 DAY AND h2.hostname IS NULL
GROUP BY h1.core
ORDER BY h1.core;
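The sketch below illustrates why the cutoff has to appear on both sides of the join. It again uses Python's sqlite3 as a stand-in for MySQL, and it replaces `CURDATE() - INTERVAL 1 DAY` with a hypothetical bound parameter `:cutoff` so the result is deterministic; the rows are made up. It compares the query with both tables restricted against a variant that restricts only h1.

```python
import sqlite3

# Hypothetical sample rows: (hostname, core, status, active_date as epoch seconds).
ROWS = [
    ("web1", "core-a", "Broken", 100),
    ("web1", "core-a", "Repaired", 200),
    ("web2", "core-a", "OK", 150),
    ("web2", "core-a", "Down", 250),  # after the cutoff used below
    ("db1", "core-b", "OK", 300),     # after the cutoff used below
]

QUERY_HEAD = """
SELECT h1.core AS core, MAX(h1.active_date) AS latest_active_date,
  SUM(CASE WHEN h1.status IN ('OK', 'Repaired') THEN 1 ELSE 0 END) AS OK_host_count,
  SUM(CASE WHEN h1.status IN ('OK', 'Repaired') THEN 0 ELSE 1 END) AS broken_host_count
FROM hpa h1 LEFT OUTER JOIN hpa h2
"""

# Cutoff applied to both tables: h1 limits which rows are counted, h2 limits
# which rows are allowed to disqualify h1 as the latest row.
BOTH_RESTRICTED = QUERY_HEAD + """
  ON (h1.hostname = h2.hostname AND h1.active_date < h2.active_date
  AND h2.active_date <= :cutoff)
WHERE h1.active_date <= :cutoff AND h2.hostname IS NULL
GROUP BY h1.core ORDER BY h1.core
"""

# Pitfall: cutoff applied to h1 only, so a row after the cutoff still
# disqualifies the host's last row before the cutoff.
ONLY_H1_RESTRICTED = QUERY_HEAD + """
  ON (h1.hostname = h2.hostname AND h1.active_date < h2.active_date)
WHERE h1.active_date <= :cutoff AND h2.hostname IS NULL
GROUP BY h1.core ORDER BY h1.core
"""

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE hpa (hostname TEXT, core TEXT, status TEXT, active_date INTEGER)"
)
conn.executemany("INSERT INTO hpa VALUES (?, ?, ?, ?)", ROWS)

params = {"cutoff": 220}  # hypothetical cutoff in place of CURDATE() - INTERVAL 1 DAY
correct = conn.execute(BOTH_RESTRICTED, params).fetchall()
pitfall = conn.execute(ONLY_H1_RESTRICTED, params).fetchall()
print(correct)  # [('core-a', 200, 2, 0)]  web1 and web2 both counted
print(pitfall)  # [('core-a', 200, 1, 0)]  web2 lost: its 250 row hid its 150 row
```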

Regarding the ratio between OK and broken hostnames, I'd recommend just calculating that in your PHP code. SQL doesn't allow you to reference column aliases in other select-list expressions, so you'd have to wrap the above as a subquery and that's more complex than it's worth in this case.
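To compare the two options, here is a minimal sketch (SQLite via Python's sqlite3 as a stand-in for MySQL, and Python playing the role of the PHP code). It computes the ratio in application code and, for comparison, by wrapping the counting query as a derived table so the outer SELECT can use the aliases. Defining the ratio as OK hosts divided by all hosts in the core is an assumption, as are the sample rows.

```python
import sqlite3

# Hypothetical sample rows: (hostname, core, status, active_date as epoch seconds).
ROWS = [
    ("web1", "core-a", "Repaired", 200),
    ("web2", "core-a", "Down", 250),
    ("db1", "core-b", "OK", 300),
]

# Per-core counts over each hostname's latest row (no ORDER BY, so it can be
# reused as a derived table below).
COUNTS_SQL = """
SELECT h1.core AS core,
  SUM(CASE WHEN h1.status IN ('OK', 'Repaired') THEN 1 ELSE 0 END) AS OK_host_count,
  SUM(CASE WHEN h1.status IN ('OK', 'Repaired') THEN 0 ELSE 1 END) AS broken_host_count
FROM hpa h1 LEFT OUTER JOIN hpa h2
  ON (h1.hostname = h2.hostname AND h1.active_date < h2.active_date)
WHERE h2.hostname IS NULL
GROUP BY h1.core
"""

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE hpa (hostname TEXT, core TEXT, status TEXT, active_date INTEGER)"
)
conn.executemany("INSERT INTO hpa VALUES (?, ?, ?, ?)", ROWS)

# Option 1 (what the answer recommends): fetch the counts and compute the
# ratio in application code.
app_side = {
    core: ok / (ok + broken)
    for core, ok, broken in conn.execute(COUNTS_SQL)
}

# Option 2: wrap the query as a derived table so the outer SELECT can refer
# to the aliases OK_host_count and broken_host_count.
RATIO_SQL = (
    "SELECT core, 1.0 * OK_host_count / (OK_host_count + broken_host_count)"
    " AS ok_fraction FROM (" + COUNTS_SQL + ") AS counts ORDER BY core"
)
in_sql = dict(conn.execute(RATIO_SQL).fetchall())

print(app_side == in_sql)  # True
```

Both give the same numbers. The derived-table version needs an extra level of nesting just to reuse the aliases, which is the extra complexity the answer suggests avoiding.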

I forgot you said you're using a UNIX timestamp. Do something like this:

SELECT h1.core, MAX(h1.active_date) AS latest_active_date,
  SUM(CASE WHEN h1.status IN ('OK', 'Repaired') THEN 1 ELSE 0 END) AS OK_host_count,
  SUM(CASE WHEN h1.status IN ('OK', 'Repaired') THEN 0 ELSE 1 END) AS broken_host_count
FROM `hpa` h1 LEFT OUTER JOIN `hpa` h2
  ON (h1.hostname = h2.hostname AND h1.active_date < h2.active_date
  AND h2.active_date <= UNIX_TIMESTAMP() - 86400)  -- 86400 seconds = 1 day
WHERE h1.active_date <= UNIX_TIMESTAMP() - 86400 AND h2.hostname IS NULL
GROUP BY h1.core
ORDER BY h1.core;
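The same epoch arithmetic generalizes to N days: the cutoff is the current epoch time minus N * 86400 seconds. The sketch below uses a hypothetical helper `counts_as_of` that computes that cutoff in application code and binds it as a parameter. SQLite (via Python's sqlite3) stands in for MySQL, `int(time.time())` plays the role of `UNIX_TIMESTAMP()`, and a fixed hypothetical `NOW` plus made-up rows keep the example deterministic.

```python
import sqlite3
import time

SECONDS_PER_DAY = 86400  # the 86400 in UNIX_TIMESTAMP() - 86400

# Hypothetical fixed "current time" so the example is deterministic.
NOW = 1_256_000_000

# Hypothetical rows with active_date stored as an integer epoch, as in the question.
ROWS = [
    ("web1", "core-a", "Broken", NOW - 3 * SECONDS_PER_DAY),
    ("web1", "core-a", "Repaired", NOW - SECONDS_PER_DAY // 2),  # 12 hours ago
    ("db1", "core-b", "OK", NOW - 2 * SECONDS_PER_DAY),
]

AS_OF_SQL = """
SELECT h1.core AS core, MAX(h1.active_date) AS latest_active_date,
  SUM(CASE WHEN h1.status IN ('OK', 'Repaired') THEN 1 ELSE 0 END) AS OK_host_count,
  SUM(CASE WHEN h1.status IN ('OK', 'Repaired') THEN 0 ELSE 1 END) AS broken_host_count
FROM hpa h1 LEFT OUTER JOIN hpa h2
  ON (h1.hostname = h2.hostname AND h1.active_date < h2.active_date
  AND h2.active_date <= :cutoff)
WHERE h1.active_date <= :cutoff AND h2.hostname IS NULL
GROUP BY h1.core ORDER BY h1.core
"""


def counts_as_of(conn, days_ago, now=None):
    """Per-core counts as of `days_ago` days before `now` (epoch seconds)."""
    if now is None:
        now = int(time.time())  # plays the role of UNIX_TIMESTAMP()
    cutoff = now - days_ago * SECONDS_PER_DAY
    return conn.execute(AS_OF_SQL, {"cutoff": cutoff}).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE hpa (hostname TEXT, core TEXT, status TEXT, active_date INTEGER)"
)
conn.executemany("INSERT INTO hpa VALUES (?, ?, ?, ?)", ROWS)

yesterday = counts_as_of(conn, 1, now=NOW)  # web1 still broken one day ago
today = counts_as_of(conn, 0, now=NOW)      # web1 counted as OK after the repair
print(yesterday)
print(today)
```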

Bill Karwin 2009-10-27 18:48:14

Thank you, Bill! I can't test this immediately, though, as I'm done for the day. The first part I get. I'll have to study the second for a while, I think. :)

Daren Schwenke 2009-10-27 19:03:50

It's actually an int storing epoch time, not DATETIME. Make a difference?

Daren Schwenke 2009-10-27 19:06:43

Ok, it changes how you calculate the offset, but not the general logic. I'll add an example.

Bill Karwin 2009-10-27 19:07:49

Thank you. Can't break the habit of storing dates as epoch, but I think I have good reason until they add microsecond accuracy to datetime. Easy to do with epoch.

Daren Schwenke 2009-10-27 19:12:59

I'm not sure what you mean by that, because `UNIX_TIMESTAMP()` measures time in seconds, and it's an integer. So where do the microseconds come in? Anyway, this is orthogonal to your original question.

Bill Karwin 2009-10-27 22:59:11

Yeah. This code doesn't need microseconds. But for my concurrent-user code that does, all I have to do is go from INT(11) to INT(15).

Daren Schwenke 2009-10-28 16:11:32

INT(11) vs. INT(15) has nothing to do with the range of values. They're both still 32-bit integers. See http://stackoverflow.com/questions/1632403/what-is-the-difference-when-being-applied-to-my-code-between-int10-and-int12/1632567

Bill Karwin 2009-10-28 17:52:50
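The range point can be checked with a little arithmetic. This sketch assumes MySQL's documented sizes (INT is a 4-byte signed integer whatever display width is declared, BIGINT is 8 bytes) and uses an arbitrary instant, the date of this thread, as sample input. It shows that an epoch in seconds fits in INT, while the same instant in microseconds does not fit in INT(11) or INT(15) and would need a BIGINT.

```python
from datetime import datetime, timezone

# MySQL INT is a 4-byte signed integer no matter what display width is
# declared, so INT(11) and INT(15) both have this range.
INT_MIN, INT_MAX = -(2**31), 2**31 - 1
# BIGINT is MySQL's 8-byte signed integer type.
BIGINT_MIN, BIGINT_MAX = -(2**63), 2**63 - 1


def fits(value, low, high):
    """True if value is within the inclusive range [low, high]."""
    return low <= value <= high


# Arbitrary sample instant (the date of this thread), in UTC.
instant = datetime(2009, 10, 28, tzinfo=timezone.utc)
epoch_seconds = int(instant.timestamp())   # the kind of value UNIX_TIMESTAMP() returns
epoch_micros = epoch_seconds * 1_000_000   # the same instant in microseconds

print(fits(epoch_seconds, INT_MIN, INT_MAX))        # True
print(fits(epoch_micros, INT_MIN, INT_MAX))         # False: too big for INT(11) or INT(15)
print(fits(epoch_micros, BIGINT_MIN, BIGINT_MAX))   # True
# The largest epoch second an INT column can hold:
print(datetime.fromtimestamp(INT_MAX, tz=timezone.utc))  # 2038-01-19 03:14:07+00:00
```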

Hmm. Never knew that. Thanks.

Daren Schwenke 2009-10-28 18:47:11

Yeah, it's a very common misunderstanding among MySQL users. It's a totally natural assumption though, since it looks so similar to CHAR(11) vs. CHAR(15). That was a poor design decision by MySQL.

Bill Karwin 2009-10-28 20:08:59
