Dear Everyone, I have a table,
| PAGELETS | CREATE TABLE `PAGELETS` (
`page_key` int(32) unsigned NOT NULL,
`pagelet_serial` int(32) unsigned NOT NULL,
`pagelet_shingle` int(32) unsigned NOT NULL
) ENGINE=MyISAM DEFAULT CHARSET=utf8
I would like to:
1) Find all the pagelet_shingles where quantity > 1 ( occurs more than once)
2) out of these only output those that have different page_key
This is the query that produces the a semi-correct answer:
SELECT * FROM PAGELETS WHERE pagelet_shingle IN( SELECT pagelet_shingle FROM PAGELETS GROUP BY pagelet_shingle HAVING COUNT(DISTINCT page_key) > 1) ORDER BY pagelet_shingle;
Unfortunately, on a small dataset it takes about 18 seconds;
I have another query,
SELECT dt1.* FROM
(SELECT * FROM PAGELETS
GROUP BY page_key, pagelet_shingle HAVING COUNT(*) = 1)
dt1 JOIN
(SELECT * FROM PAGELETS GROUP BY pagelet_shingle HAVING COUNT(*) > 1)
dt2 USING (pagelet_shingle) ORDER BY pagelet_shingle
given by an expert which is not technically correct (something to do with you can't SELECT * .. GROUP ) but produces results that are A LOT faster, with the case where
SELECT * FROM PAGELETS WHERE pagelet_shingle=57
+----------+----------------+-----------------+
| page_key | pagelet_serial | pagelet_shingle |
+----------+----------------+-----------------+
| 1 | 99 | 57 |
| 1 | 99 | 57 |
| 2 | 228 | 57 |
| 2 | 228 | 57 |
+----------+----------------+-----------------+
The semi-correct query produces
+----------+----------------+-----------------+
| page_key | pagelet_serial | pagelet_shingle |
+----------+----------------+-----------------+
| 1 | 99 | 57 |
| 1 | 99 | 57 |
| 2 | 228 | 57 |
| 2 | 228 | 57 |
+----------+----------------+-----------------+
While the incorrect query doesn't have pagelet_shingle =57 in its resultset
My desired result is to have
+----------+----------------+-----------------+
| page_key | pagelet_serial | pagelet_shingle |
+----------+----------------+-----------------+
| 1 | 99 | 57 |
| 2 | 228 | 57 |
+----------+----------------+-----------------+
Each occuring once only.
a pagelet_shingle occuring twice in the same pagelet_serial will be omitted.
So I would like to ask the following: 1) Is there a way to to speed up the csemi orrect query to reach the speed of the incorrect one 2) or is there a way to fix the incorrect one to produce the result of the correct one ( I don't care about strictness )