My table looks like this:

`MyDB`.`Details` (
  `id` bigint(20) NOT NULL,
  `run_id` int(11) NOT NULL,
  `element_name` varchar(255) NOT NULL,
  `value` text,
  `line_order` int(11) default NULL,
  `column_order` int(11) default NULL
);

I have the following SELECT statement in a stored procedure

SELECT
  RULE
  ,TITLE
  ,SUM(IF(t.PASSED='Y',1,0)) AS PASS
  ,SUM(IF(t.PASSED='N',1,0)) AS FAIL
FROM
(
SELECT
  a.line_order
  ,MAX(CASE WHEN a.element_name = 'PASSED' THEN a.`value` END) AS PASSED
  ,MAX(CASE WHEN a.element_name = 'RULE' THEN a.`value` END) AS RULE
  ,MAX(CASE WHEN a.element_name = 'TITLE' THEN a.`value` END) AS TITLE
FROM
  Details a
WHERE
  run_id = runId
GROUP BY line_order
) t
GROUP BY RULE, TITLE;

*runId is an input parameter to the stored procedure.

This query takes about 14 seconds to run. The table has 214,856 rows, and the particular run_id I am filtering on has 162,204 records. It's not running on a particularly high-powered machine, but I feel like I could be doing this more efficiently. My main goal is to summarize by RULE and TITLE and show PASS and FAIL count columns.

The table create script:

CREATE TABLE  `MyDB`.`Details` (
  `id` bigint(20) NOT NULL,
  `run_id` int(11) NOT NULL,
  `element_name` varchar(255) NOT NULL,
  `value` text,
  `line_order` int(11) default NULL,
  `column_order` int(11) default NULL,
  KEY `report_id` (`run_id`),
  KEY `element_name` (`element_name`),
  CONSTRAINT `Details_ibfk_1` FOREIGN KEY (`run_id`) REFERENCES `RunHistory` (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

The EXPLAIN output (the query as rewritten by the optimizer, followed by the plan):

select `t`.`RULE` AS `RULE`,`t`.`TITLE` AS `TITLE`,sum(if((`t`.`PASSED` = _utf8'Y'),1,0)) AS `PASS`,sum(if((`t`.`PASSED` = _utf8'N'),1,0)) AS `FAIL` from (select `TAA`.`Details`.`line_order` AS `line_order`,max((case when (`TAA`.`Details`.`element_name` = _utf8'PASSED') then `TAA`.`Details`.`value` end)) AS `PASSED`,max((case when (`TAA`.`Details`.`element_name` = _utf8'RULE') then `TAA`.`Details`.`value` end)) AS `RULE`,max((case when (`TAA`.`Details`.`element_name` = _utf8'TITLE') then `TAA`.`Details`.`value` end)) AS `TITLE` from `TAA`.`Details` where (`TAA`.`Details`.`run_id` = 66) group by `TAA`.`Details`.`line_order`) `t` group by `t`.`RULE`,`t`.`TITLE`
id | select_type | table      | type | possible_keys | key       | key_len | ref | rows   | Extra
 1 | PRIMARY     | <derived2> | ALL  |               |           |         |     |   3068 | Using temporary; Using filesort
 2 | DERIVED     | Details    | ref  | report_id     | report_id | 4       |     | 107563 | Using where; Using temporary; Using filesort
+2  A: 

Well, the biggest gain would come from normalizing your data away from the attribute-value approach so that you can avoid pivoting it. Can you do that?
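
For illustration, here is a minimal sketch of what a normalized layout could look like (the table and column names are my own invention, not from your schema):

CREATE TABLE `ResultLines` (
  `run_id` int(11) NOT NULL,
  `line_order` int(11) NOT NULL,
  `rule` varchar(255),
  `title` varchar(255),
  `passed` char(1),
  PRIMARY KEY (`run_id`, `line_order`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

With one row per line, your summary becomes a single pass with no pivoting at all:

SELECT
  `rule`
  ,`title`
  ,SUM(IF(`passed`='Y',1,0)) AS PASS
  ,SUM(IF(`passed`='N',1,0)) AS FAIL
FROM ResultLines
WHERE run_id = runId
GROUP BY `rule`, `title`;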

Can you also do SHOW INDEXES IN Details?

EDIT: It seems newtover's index suggestion worked for you. Could you just satisfy my curiosity and check two things:

Here is a rewrite of the query that turns the aggregate pivot into joins; could you check how it compares to the original one?

SELECT
  RULE
  ,TITLE
  ,SUM(IF(t.PASSED='Y',1,0)) AS PASS
  ,SUM(IF(t.PASSED='N',1,0)) AS FAIL
FROM
(
SELECT
  a.line_order,
  a.value AS PASSED,
  b.value AS RULE,
  c.value AS TITLE
FROM
  -- each self-join pins down one element per (run_id, line_order)
  Details a INNER JOIN
  Details b ON a.line_order = b.line_order AND a.run_id = runId AND a.element_name = 'PASSED' INNER JOIN
  Details c ON b.line_order = c.line_order AND b.run_id = runId AND c.run_id = runId
           AND b.element_name = 'RULE' AND c.element_name = 'TITLE'
) t
GROUP BY RULE, TITLE;

A few notes regarding the join:

  • I assumed that for a TITLE there will be at least one RULE and one PASSED value; if not, turn the INNER JOINs into LEFT JOINs.
  • I also moved the WHERE conditions into the join conditions. This can sometimes hint MySQL to use indexes it normally would not, but I am not sure about that; if you have the inclination, you can experiment with moving the conditions between ON and WHERE.

Finally, could you also check what happens to the speed of your query if you add an index on (`run_id`, `line_order`, `element_name`, `value`)? I am not sure it will improve things (it will increase the size of the index, but it will cut out access to the individual rows), so it would be interesting to see the numbers (check the plan to see if the index is really used).
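
For reference, the index I mean would be created like this (the key name is arbitrary, and since `value` is a TEXT column MySQL requires a prefix length on it; 100 characters is just a guess):

ALTER TABLE `Details`
  ADD KEY `run_line_element_value` (`run_id`, `line_order`, `element_name`, `value`(100));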

One more note on the original query: it might be possible to do the aggregation in a single step. Would you like to investigate that further?

Unreason
@Unreason: Unfortunately I can't. There are over 900 different 'elements' that could be reported on, none are required, and they can be in any order.
manyxcxi
@Unreason: Based on the data processed, there are sometimes no TITLE and no RULE, so it simply gives me null,null,[n passes],[n fails] as output, which is a perfectly acceptable use case. I will try your query and report back on how it works out. Thank you very much for your help as well!
manyxcxi
@manyxcxi: OK. When turning it into LEFT JOINs you should then move some of the conditions out of the ON clauses (as written, it will return bizarre results if simply switched to LEFT JOINs; if you need help, let me know). Still, purely for the speed comparison, I think you can run the INNER JOIN version to evaluate whether getting it to work even makes sense.
Unreason
+3  A: 

Wow. Here are a couple of hints:

  1. The first problem here is the use of TEXT fields. When you use TEXT or BLOB, all temporary tables are created on disk instead of in memory. Try using varchar(N) instead, keeping N as small as possible. It is strange that MySQL allows GROUP BY on TEXT fields at all.
  2. I would consider creating a composite index on (`run_id`, `line_order`, `element_name`); a sketch follows below. Even just as an experiment, such an index may significantly affect performance in other parts of your system.
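
For instance (the key name is my own choice):

ALTER TABLE `Details`
  ADD KEY `run_line_element` (`run_id`, `line_order`, `element_name`);

To see whether point 1 is hurting you, you can also compare these counters before and after running your query:

SHOW STATUS LIKE 'Created_tmp%';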
newtover
@newtover: I'll try creating an index and reducing TEXT to varchar(N) if I can find a suitable N. The problem is that the value field can hold messages that are pretty long, a few characters, or a single int: it's sort of a catch-all. I know that using TEXT is wasteful, but it may be my only option IF I stick with this schema.
manyxcxi
@newtover: SELECT MAX(LENGTH(`value`)) FROM Details gives me 482, so I think varchar is not going to be in the cards for me...
manyxcxi
@newtover: The index managed to cut my query time from 14 seconds to 0.3045 seconds! I played around with adding value into the key as well, but the same query then ran in about 7 seconds. Thank you for pointing out my glaring mistake.
manyxcxi
@manyxcxi: I am glad I managed to help. Just wanted to say that even varchar(1000) is much better than TEXT in your case.
newtover
@newtover: I was under the impression (I cannot remember where I read it) that if you were going to have a varchar over 255 characters in length, it was better to use TEXT. I am admittedly not a MySQL DBA; I am usually just the guy developing against it. But if, in your opinion, varchar(1000) is still better than TEXT, I'll certainly give that a whirl and try to figure out a rational max length.
manyxcxi
@manyxcxi: A varchar can be up to 2^16 bytes long. It stores the length in the first byte if the maximum is less than 256 bytes, and in two bytes otherwise. When the character set of the field is utf8, the length in bytes is the character length multiplied by 3 (thus, your `element_name` column actually takes up to 765 bytes). TEXT fields store a reference to a block on disk (which results in item 1 of my answer). The rule of thumb: use VARCHAR instead of TEXT when the field is used in conditions (or extract the minimum needed into a separate field: a prefix, a hash value, etc.).
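The conversion itself is a one-liner; 1000 is just a comfortable ceiling above the 482 you measured:

ALTER TABLE `Details` MODIFY `value` varchar(1000);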
newtover