Simplified Table structure:

CREATE TABLE IF NOT EXISTS `hpa` (
  `id` bigint(15) NOT NULL auto_increment,
  `core` varchar(50) NOT NULL,
  `hostname` varchar(50) NOT NULL,
  `status` varchar(255) NOT NULL,
  `entered_date` int(11) NOT NULL,
  `active_date` int(11) NOT NULL,
  PRIMARY KEY  (`id`),
  KEY `hostname` (`hostname`),
  KEY `status` (`status`),
  KEY `entered_date` (`entered_date`),
  KEY `core` (`core`),
  KEY `active_date` (`active_date`)
)

For this, I have the following SQL query which simply totals up all records with the defined status.

SELECT core,COUNT(hostname) AS hostname_count, MAX(active_date) AS last_active
          FROM `hpa`
          WHERE 
          status != 'OK' AND status != 'Repaired'
          GROUP BY core
          ORDER BY core

This query has been simplified to remove the INNER JOINS to unrelated data and extra columns that shouldn't affect the question.

MAX(active_date) is the same for all records of a particular day, and should always select the most recent day, or allow an offset from NOW(). (it's a UNIXTIME field)

I want both the count of: (status != 'OK' AND status != 'Repaired')

AND the inverse... count of: (status = 'OK' OR status = 'Repaired')

AND the first answer divided by the second, for 'percentage_dead' (Probably just as fast to do in post processing)

FOR the most recent day or an offset ( - 86400 for yesterday, etc..)

The table contains about 500k records and grows by about 5,000 a day, so a single SQL query, as opposed to looping, would be really nice.

I imagine some creative IFs could do this. Your expertise is appreciated.

EDIT: I'm open to using a different SQL query for either today's data, or data from an offset.

EDIT: Query works, is fast enough, but I currently can't let the users sort on the percentage column (the one derived from bad and good counts). This is not a show stopper, but I allow them to sort on everything else. The ORDER BY of this:

SELECT h1.core, MAX(h1.entered_date) AS last_active, 
SUM(CASE WHEN h1.status IN ('OK', 'Repaired') THEN 1 ELSE 0 END) AS good_host_count,  
SUM(CASE WHEN h1.status IN ('OK', 'Repaired') THEN 0 ELSE 1 END) AS bad_host_count 
FROM `hpa` h1 
LEFT OUTER JOIN `hpa` h2 ON (h1.hostname = h2.hostname AND h1.active_date < h2.active_date) 
WHERE h2.hostname IS NULL 
GROUP BY h1.core 
ORDER BY ( bad_host_count / ( bad_host_count + good_host_count ) ) DESC,h1.core

Gives me: #1247 - Reference 'bad_host_count' not supported (reference to group function)

EDIT: Solved for a different section. The following works and allows me to ORDER BY percentage_dead

SELECT c.core, c.last_active, 
SUM(CASE WHEN d.dead = 1 THEN 0 ELSE 1 END) AS good_host_count,  
SUM(CASE WHEN d.dead = 1 THEN 1 ELSE 0 END) AS bad_host_count,
( SUM(CASE WHEN d.dead = 1 THEN 1 ELSE 0 END) * 100/
( (SUM(CASE WHEN d.dead = 1 THEN 0 ELSE 1 END) )+(SUM(CASE WHEN d.dead = 1 THEN 1 ELSE 0 END) ) ) ) AS percentage_dead
FROM `agent_cores` c 
LEFT JOIN `dead_agents` d ON c.core = d.core
WHERE d.active = 1
GROUP BY c.core
ORDER BY percentage_dead
A:

If I understand, you want to get a count of the status of OK vs. not OK hostnames, on the date of the last activity. Right? And then that should be grouped by core.

SELECT h1.core, MAX(h1.active_date),
  SUM(CASE WHEN h1.status IN ('OK', 'Repaired') THEN 1 ELSE 0 END) AS OK_host_count,
  SUM(CASE WHEN h1.status IN ('OK', 'Repaired') THEN 0 ELSE 1 END) AS broken_host_count
FROM `hpa` h1 LEFT OUTER JOIN `hpa` h2 
  ON (h1.hostname = h2.hostname AND h1.active_date < h2.active_date)
WHERE h2.hostname IS NULL
GROUP BY h1.core
ORDER BY h1.core;

This is a variation of the "greatest-n-per-group" problem that I see a lot in SQL questions on StackOverflow.

First, we want to choose only the rows that have the latest activity date per hostname. We can do that with an outer join that looks for rows with the same hostname and a greater active_date. Where no such match exists (h2.hostname IS NULL), h1 is already the latest row for that hostname.

Then group by core and count the rows by status.
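
Here is a minimal sketch of those two steps that you can run without a MySQL server. It is only an illustration: it uses Python's built-in sqlite3 module with an in-memory table, the sample rows are made up, and the table is cut down to the four columns the query touches. The SQL itself uses nothing SQLite-specific, so it should carry over to MySQL as written.

```python
import sqlite3

# Hypothetical sample rows: (core, hostname, status, active_date)
ROWS = [
    ("core-a", "host1", "Dead",     100),
    ("core-a", "host1", "OK",       200),  # latest row for host1
    ("core-a", "host2", "Dead",     200),  # latest row for host2
    ("core-b", "host3", "Repaired", 200),  # latest row for host3
    ("core-b", "host4", "OK",       100),
    ("core-b", "host4", "Timeout",  200),  # latest row for host4
    ("core-b", "host5", "Dead",     200),  # latest row for host5
]

# Anti-join keeps only rows with no later row for the same hostname,
# then the CASE sums count good and bad hosts per core.
QUERY = """
SELECT h1.core,
       MAX(h1.active_date) AS last_active,
       SUM(CASE WHEN h1.status IN ('OK', 'Repaired') THEN 1 ELSE 0 END) AS ok_host_count,
       SUM(CASE WHEN h1.status IN ('OK', 'Repaired') THEN 0 ELSE 1 END) AS broken_host_count
FROM hpa h1
LEFT OUTER JOIN hpa h2
  ON h1.hostname = h2.hostname AND h1.active_date < h2.active_date
WHERE h2.hostname IS NULL
GROUP BY h1.core
ORDER BY h1.core
"""


def latest_status_counts(rows):
    """Load rows into an in-memory table and run the query above."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE hpa (core TEXT, hostname TEXT, status TEXT, active_date INTEGER)"
    )
    conn.executemany("INSERT INTO hpa VALUES (?, ?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


print(latest_status_counts(ROWS))
# → [('core-a', 200, 1, 1), ('core-b', 200, 1, 2)]
```

Note that every column is qualified with h1, because both sides of the self-join have the same column names.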

That's the solution for today's date (assuming no row has an active_date in the future). To restrict the result to rows N days ago, you have to restrict both tables.

SELECT h1.core, MAX(h1.active_date),
  SUM(CASE WHEN h1.status IN ('OK', 'Repaired') THEN 1 ELSE 0 END) AS OK_host_count,
  SUM(CASE WHEN h1.status IN ('OK', 'Repaired') THEN 0 ELSE 1 END) AS broken_host_count
FROM `hpa` h1 LEFT OUTER JOIN `hpa` h2 
  ON (h1.hostname = h2.hostname AND h1.active_date < h2.active_date
  AND h2.active_date <= CURDATE() - INTERVAL 1 DAY)
WHERE h1.active_date <= CURDATE() - INTERVAL 1 DAY AND h2.hostname IS NULL
GROUP BY h1.core
ORDER BY h1.core;

Regarding the ratio between OK and broken hostnames, I'd recommend just calculating that in your PHP code. SQL doesn't allow you to reference column aliases in other select-list expressions, so you'd have to wrap the above as a subquery and that's more complex than it's worth in this case.
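
As a rough sketch of that post-processing step, here it is in Python standing in for your PHP. The function name and the row shape (core, OK count, broken count) are my own assumptions for illustration, not anything from your code. It also sorts by the percentage, since an application-side sort sidesteps the alias restriction entirely.

```python
def add_percentage_dead(rows):
    """Append percentage_dead to each (core, ok_count, broken_count) row
    and return the rows sorted by percentage descending, then by core."""
    out = []
    for core, ok, broken in rows:
        total = ok + broken
        # Guard against a core with no hosts at all to avoid dividing by zero.
        pct = 100.0 * broken / total if total else 0.0
        out.append((core, ok, broken, pct))
    out.sort(key=lambda r: (-r[3], r[0]))
    return out


print(add_percentage_dead([("a", 3, 1), ("b", 1, 1), ("c", 0, 0)]))
# → [('b', 1, 1, 50.0), ('a', 3, 1, 25.0), ('c', 0, 0, 0.0)]
```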


I forgot you said you're using a UNIX timestamp. Do something like this:

SELECT h1.core, MAX(h1.active_date),
  SUM(CASE WHEN h1.status IN ('OK', 'Repaired') THEN 1 ELSE 0 END) AS OK_host_count,
  SUM(CASE WHEN h1.status IN ('OK', 'Repaired') THEN 0 ELSE 1 END) AS broken_host_count
FROM `hpa` h1 LEFT OUTER JOIN `hpa` h2 
  ON (h1.hostname = h2.hostname AND h1.active_date < h2.active_date
  AND h2.active_date <= UNIX_TIMESTAMP() - 86400)
WHERE h1.active_date <= UNIX_TIMESTAMP() - 86400 AND h2.hostname IS NULL
GROUP BY h1.core
ORDER BY h1.core;
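
If it helps to see why both sides of the join need the cutoff, here is a small runnable sketch using Python's sqlite3 module. The assumptions are mine: SQLite has no UNIX_TIMESTAMP(), so the cutoff is passed in as a named parameter (in MySQL you would put UNIX_TIMESTAMP() - 86400 there), "now" is a fixed made-up value so the result is reproducible, and the sample rows are invented.

```python
import sqlite3

DAY = 86400
NOW = 1_000_000  # hypothetical fixed "now", in epoch seconds

# Hypothetical sample rows: (core, hostname, status, active_date)
ROWS = [
    ("core-a", "host1", "OK",   NOW - DAY),
    ("core-a", "host1", "Dead", NOW),
    ("core-a", "host2", "OK",   NOW - DAY),
    ("core-a", "host2", "Dead", NOW),
]

# :cutoff appears twice: once to limit h1, once to limit h2.
AS_OF_QUERY = """
SELECT h1.core,
       MAX(h1.active_date) AS last_active,
       SUM(CASE WHEN h1.status IN ('OK', 'Repaired') THEN 1 ELSE 0 END) AS ok_host_count,
       SUM(CASE WHEN h1.status IN ('OK', 'Repaired') THEN 0 ELSE 1 END) AS broken_host_count
FROM hpa h1
LEFT OUTER JOIN hpa h2
  ON h1.hostname = h2.hostname
 AND h1.active_date < h2.active_date
 AND h2.active_date <= :cutoff
WHERE h1.active_date <= :cutoff
  AND h2.hostname IS NULL
GROUP BY h1.core
ORDER BY h1.core
"""


def status_counts_as_of(rows, cutoff):
    """Counts per core using each host's latest row at or before cutoff."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE hpa (core TEXT, hostname TEXT, status TEXT, active_date INTEGER)"
    )
    conn.executemany("INSERT INTO hpa VALUES (?, ?, ?, ?)", rows)
    result = conn.execute(AS_OF_QUERY, {"cutoff": cutoff}).fetchall()
    conn.close()
    return result


print(status_counts_as_of(ROWS, NOW - DAY))  # → [('core-a', 913600, 2, 0)]
print(status_counts_as_of(ROWS, NOW))        # → [('core-a', 1000000, 0, 2)]
```

If you dropped the cutoff from the h2 side, yesterday's rows would each match today's later row for the same hostname, the IS NULL test would reject them, and the query for yesterday would come back empty.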
Bill Karwin
Thank you Bill! Can't test this immediately though as I'm done for the day. First part I get. I'll have to study the second for a while I think. :)
Daren Schwenke
It's actually an int storing epoch time, not DATETIME. Make a difference?
Daren Schwenke
Ok, it changes how you calculate the offset, but not the general logic. I'll add an example.
Bill Karwin
Thank you. Can't break the habit of storing dates as epoch, but I think I have good reason until they add microsecond accuracy to datetime. Easy to do with epoch.
Daren Schwenke
I'm not sure what you mean by that, because `UNIX_TIMESTAMP()` measures time in seconds, and it's an integer. So where do the microseconds come in? Anyway, this is orthogonal to your original question.
Bill Karwin
Yeah. This code doesn't need microseconds. But for my concurrent-user code that does, all I have to do is go from int 11 to int 15.
Daren Schwenke
INT(11) vs. INT(15) has nothing to do with the range of values. They're both still 32-bit integers. See http://stackoverflow.com/questions/1632403/what-is-the-difference-when-being-applied-to-my-code-between-int10-and-int12/1632567
Bill Karwin
Hmm. Never knew that. Thanks.
Daren Schwenke
Yeah, it's a very common misunderstanding among MySQL users. It's a totally natural assumption though, since it looks so similar to CHAR(11) vs. CHAR(15). That was a poor design decision by MySQL.
Bill Karwin
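
To put numbers on the INT(11) vs. INT(15) point, here is a quick arithmetic check in Python. The ranges are those of MySQL's signed 4-byte INT and 8-byte BIGINT; the sample epoch value and the helper names are just illustrative assumptions.

```python
# Signed ranges of MySQL's INT (4 bytes) and BIGINT (8 bytes).
# The number in parentheses, as in INT(11), is a display width only.
INT_SIGNED_MAX = 2**31 - 1      # 2147483647
BIGINT_SIGNED_MAX = 2**63 - 1   # 9223372036854775807


def fits_int(value):
    """True if value fits in a signed 32-bit INT column."""
    return -2**31 <= value <= INT_SIGNED_MAX


def fits_bigint(value):
    """True if value fits in a signed 64-bit BIGINT column."""
    return -2**63 <= value <= BIGINT_SIGNED_MAX


epoch_seconds = 1_700_000_000              # hypothetical epoch time in seconds
epoch_micros = epoch_seconds * 1_000_000   # same instant in microseconds

print(fits_int(epoch_seconds))    # → True
print(fits_int(epoch_micros))     # → False, whatever the display width
print(fits_bigint(epoch_micros))  # → True
```

So storing microsecond epoch values means changing the column type to BIGINT, not widening the number in parentheses.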