Simplified table structure (MySQL):
CREATE TABLE IF NOT EXISTS `hpa` (
`id` bigint(15) NOT NULL auto_increment,
`core` varchar(50) NOT NULL,
`hostname` varchar(50) NOT NULL,
`status` varchar(255) NOT NULL,
`entered_date` int(11) NOT NULL,
`active_date` int(11) NOT NULL,
PRIMARY KEY (`id`),
KEY `hostname` (`hostname`),
KEY `status` (`status`),
KEY `entered_date` (`entered_date`),
KEY `core` (`core`),
KEY `active_date` (`active_date`)
);
Against this table, I have the following SQL query, which counts, per core, the records whose status is neither 'OK' nor 'Repaired':
SELECT core, COUNT(hostname) AS hostname_count, MAX(active_date) AS last_active
FROM `hpa`
WHERE status != 'OK' AND status != 'Repaired'
GROUP BY core
ORDER BY core;
I have simplified this query by removing the INNER JOINs to unrelated data and the extra columns that shouldn't affect the question.
active_date is the same for all records of a particular day (it holds a UNIX timestamp in an int column). The query should always select the most recent day via MAX(active_date), or allow an offset from NOW().
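A minimal sketch of choosing the day inside the same statement, assuming (as stated above) that all rows of one day share a single active_date value. It uses Python's built-in sqlite3 module as a stand-in for MySQL, and it measures the offset back from the latest active_date rather than from NOW(); the timestamps and host rows are made up for illustration.

```python
import sqlite3

DAY = 86400  # seconds in a day; active_date holds UNIX timestamps

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE hpa (
        id INTEGER PRIMARY KEY,
        core TEXT NOT NULL,
        hostname TEXT NOT NULL,
        status TEXT NOT NULL,
        entered_date INTEGER NOT NULL,
        active_date INTEGER NOT NULL)"""
)

# Hypothetical sample data: one row per host per day.
TODAY = 1_700_000_000
conn.executemany(
    "INSERT INTO hpa (core, hostname, status, entered_date, active_date) "
    "VALUES (?, ?, ?, ?, ?)",
    [
        ("core1", "hostA", "OK", TODAY - DAY, TODAY - DAY),
        ("core1", "hostA", "Down", TODAY, TODAY),
        ("core1", "hostB", "OK", TODAY, TODAY),
    ],
)


def rows_for_day(conn, offset=0):
    """Rows whose active_date equals the latest active_date minus `offset` seconds."""
    return conn.execute(
        """SELECT hostname, status
           FROM hpa
           WHERE active_date = (SELECT MAX(active_date) FROM hpa) - ?
           ORDER BY hostname""",
        (offset,),
    ).fetchall()


print(rows_for_day(conn))       # most recent day
print(rows_for_day(conn, DAY))  # the day before
```

The same WHERE clause should carry over to MySQL unchanged; with the existing KEY on active_date, the MAX() subquery ought to be cheap.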
I want both the count of rows matching (status != 'OK' AND status != 'Repaired')
AND the inverse, the count of rows matching (status = 'OK' OR status = 'Repaired'),
AND the first count divided by the total of both, as 'percentage_dead' (probably just as fast to do in post-processing),
for the most recent day or for an offset (- 86400 for yesterday, etc.).
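One way to get all three numbers in a single pass is conditional aggregation: a CASE inside SUM() turns each row into a 0 or 1 for the bucket being counted. This is a minimal sketch using sqlite3 as a stand-in for MySQL, with the table trimmed to the columns the query uses and made-up rows. Here percentage_dead is bad * 100 / (bad + good), which equals bad * 100 / COUNT(*) because every row lands in exactly one bucket.

```python
import sqlite3

DAY = 86400

SUMMARY_SQL = """
SELECT core,
       SUM(CASE WHEN status IN ('OK', 'Repaired') THEN 0 ELSE 1 END) AS bad_count,
       SUM(CASE WHEN status IN ('OK', 'Repaired') THEN 1 ELSE 0 END) AS good_count,
       -- 100.0 forces real division in SQLite; MySQL's / already returns a decimal
       100.0 * SUM(CASE WHEN status IN ('OK', 'Repaired') THEN 0 ELSE 1 END)
             / COUNT(*) AS percentage_dead
FROM hpa
WHERE active_date = (SELECT MAX(active_date) FROM hpa) - ?
GROUP BY core
ORDER BY core
"""

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE hpa (core TEXT, hostname TEXT, status TEXT, active_date INTEGER)"
)
TODAY = 1_700_000_000  # hypothetical UNIX timestamp
conn.executemany(
    "INSERT INTO hpa VALUES (?, ?, ?, ?)",
    [
        ("core1", "hostA", "OK", TODAY - DAY),
        ("core1", "hostA", "Down", TODAY),
        ("core1", "hostB", "OK", TODAY),
        ("core1", "hostC", "Repaired", TODAY),
        ("core2", "hostD", "Down", TODAY),
        ("core2", "hostE", "Unreachable", TODAY),
    ],
)


def summary(conn, offset=0):
    """Per-core bad/good counts and percentage_dead for one day."""
    return conn.execute(SUMMARY_SQL, (offset,)).fetchall()


for row in summary(conn):
    print(row)
```

Passing DAY (86400) as the offset returns the same columns for the previous day.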
The table contains about 500k records and grows by about 5,000 a day, so a single SQL query, as opposed to looping, would be really nice.
I imagine some creative IFs could do this. Your expertise is appreciated.
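On the "creative IFs" idea: in both MySQL and SQLite a comparison such as status IN ('OK', 'Repaired') evaluates to 1 or 0 (status is NOT NULL here, so NULL never arises), so it can be summed directly. MySQL's SUM(IF(status IN ('OK', 'Repaired'), 1, 0)) is an equivalent, more explicit spelling. A small sketch with sqlite3 and made-up rows:

```python
import sqlite3

COUNTS_SQL = """
SELECT core,
       SUM(status NOT IN ('OK', 'Repaired')) AS bad_count,   -- each test yields 1 or 0
       SUM(status IN ('OK', 'Repaired'))     AS good_count
FROM hpa
GROUP BY core
ORDER BY core
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE hpa (core TEXT NOT NULL, status TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO hpa VALUES (?, ?)",
    [("core1", "OK"), ("core1", "Down"), ("core1", "Repaired"), ("core2", "Down")],
)


def counts(conn):
    """Per-core (bad_count, good_count) using summed boolean tests."""
    return conn.execute(COUNTS_SQL).fetchall()


print(counts(conn))
```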
EDIT: I'm open to using a different SQL query for today's data and for data from an offset.
EDIT: The query below works and is fast enough, but I currently can't let users sort on the percentage column (the one derived from the bad and good counts). This is not a showstopper, but I let them sort on everything else. The ORDER BY in this query:
SELECT h1.core, MAX(h1.entered_date) AS last_active,
SUM(CASE WHEN h1.status IN ('OK', 'Repaired') THEN 1 ELSE 0 END) AS good_host_count,
SUM(CASE WHEN h1.status IN ('OK', 'Repaired') THEN 0 ELSE 1 END) AS bad_host_count
FROM `hpa` h1
LEFT OUTER JOIN `hpa` h2 ON (h1.hostname = h2.hostname AND h1.active_date < h2.active_date)
WHERE h2.hostname IS NULL
GROUP BY h1.core
ORDER BY ( bad_host_count / ( bad_host_count + good_host_count ) ) DESC,h1.core
Gives me: #1247 - Reference 'bad_host_count' not supported (reference to group function)
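MySQL rejects that ORDER BY because the expression refers to aliases of aggregate (group) functions. One common workaround, sketched here assuming the rest of the query stays as it is, is to wrap the grouped query in a derived table so the percentage becomes an ordinary column of the outer SELECT. The sketch runs on sqlite3 with made-up rows, and the alias t is arbitrary. It keeps the LEFT OUTER JOIN / IS NULL anti-join, which retains only the row with the greatest active_date for each hostname.

```python
import sqlite3

DAY = 86400

RANKED_SQL = """
SELECT t.core, t.last_active, t.good_host_count, t.bad_host_count,
       100.0 * t.bad_host_count / (t.bad_host_count + t.good_host_count) AS percentage_dead
FROM (
    SELECT h1.core,
           MAX(h1.entered_date) AS last_active,
           SUM(CASE WHEN h1.status IN ('OK', 'Repaired') THEN 1 ELSE 0 END) AS good_host_count,
           SUM(CASE WHEN h1.status IN ('OK', 'Repaired') THEN 0 ELSE 1 END) AS bad_host_count
    FROM hpa h1
    -- anti-join: drop h1 rows for which the same host has a later active_date
    LEFT OUTER JOIN hpa h2
        ON h1.hostname = h2.hostname AND h1.active_date < h2.active_date
    WHERE h2.hostname IS NULL
    GROUP BY h1.core
) AS t
ORDER BY percentage_dead DESC, t.core
"""

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE hpa (core TEXT, hostname TEXT, status TEXT, "
    "entered_date INTEGER, active_date INTEGER)"
)
TODAY = 1_700_000_000  # hypothetical UNIX timestamp
rows = [
    ("core1", "hostA", "OK", TODAY - DAY),  # superseded by hostA's later row
    ("core1", "hostA", "Down", TODAY),
    ("core1", "hostB", "OK", TODAY),
    ("core2", "hostC", "Down", TODAY),
    ("core2", "hostD", "Repaired", TODAY),
    ("core2", "hostE", "Down", TODAY),
    ("core3", "hostF", "OK", TODAY),
]
conn.executemany(
    "INSERT INTO hpa VALUES (?, ?, ?, ?, ?)",
    [(core, host, status, ts, ts) for core, host, status, ts in rows],
)


def ranked(conn):
    """Cores ordered by percentage_dead, highest first."""
    return conn.execute(RANKED_SQL).fetchall()


for row in ranked(conn):
    print(row)
```

Because the outer ORDER BY only sees a plain column of t, MySQL should no longer treat it as a reference to a group function.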
EDIT: Solved for a different section. The following query works and lets me ORDER BY percentage_dead:
SELECT c.core, c.last_active,
SUM(CASE WHEN d.dead = 1 THEN 0 ELSE 1 END) AS good_host_count,
SUM(CASE WHEN d.dead = 1 THEN 1 ELSE 0 END) AS bad_host_count,
(SUM(CASE WHEN d.dead = 1 THEN 1 ELSE 0 END) * 100 /
  (SUM(CASE WHEN d.dead = 1 THEN 0 ELSE 1 END) + SUM(CASE WHEN d.dead = 1 THEN 1 ELSE 0 END))) AS percentage_dead
FROM `agent_cores` c
LEFT JOIN `dead_agents` d ON c.core = d.core
WHERE d.active = 1
GROUP BY c.core
ORDER BY percentage_dead
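This query works where the earlier one failed because ORDER BY names the percentage_dead alias on its own, and that alias's expression spells out the SUM() calls instead of referring to other aliases. Since every joined row falls into exactly one CASE branch, the denominator good + bad equals COUNT(*), so the expression can be shortened. A sketch on sqlite3; the agent_cores and dead_agents columns below are hypothetical, reconstructed from the query rather than taken from the real schemas.

```python
import sqlite3

REPORT_SQL = """
SELECT c.core, c.last_active,
       SUM(CASE WHEN d.dead = 1 THEN 0 ELSE 1 END) AS good_host_count,
       SUM(CASE WHEN d.dead = 1 THEN 1 ELSE 0 END) AS bad_host_count,
       -- good + bad = COUNT(*): each row is counted in exactly one bucket
       SUM(CASE WHEN d.dead = 1 THEN 1 ELSE 0 END) * 100.0 / COUNT(*) AS percentage_dead
FROM agent_cores c
LEFT JOIN dead_agents d ON c.core = d.core
WHERE d.active = 1
GROUP BY c.core
ORDER BY percentage_dead
"""

conn = sqlite3.connect(":memory:")
# Hypothetical schemas: only the columns the query touches.
conn.execute("CREATE TABLE agent_cores (core TEXT PRIMARY KEY, last_active INTEGER)")
conn.execute("CREATE TABLE dead_agents (core TEXT, dead INTEGER, active INTEGER)")
conn.executemany("INSERT INTO agent_cores VALUES (?, ?)", [("core1", 100), ("core2", 200)])
conn.executemany(
    "INSERT INTO dead_agents VALUES (?, ?, ?)",
    [
        ("core1", 1, 1), ("core1", 0, 1), ("core1", 0, 1),
        ("core1", 1, 0),  # inactive: filtered out by WHERE d.active = 1
        ("core2", 1, 1), ("core2", 0, 1),
    ],
)


def report(conn):
    """Per-core counts and percentage_dead, lowest percentage first."""
    return conn.execute(REPORT_SQL).fetchall()


for row in report(conn):
    print(row)
```

Note that WHERE d.active = 1 discards the NULL rows the LEFT JOIN produces for cores with no dead_agents rows, so the join behaves like an inner join here; if such cores should still appear, the condition would need to move into the ON clause.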