ansaurus

Question

Answer 1

A:

Sounds like SELECT DISTINCT p.* ... would be your choice.

P.S. And I would really recommend the second one! make everything slow (like you just noticed) and should only be used where necessary.

Marcel J. 2009-09-01 09:30:45

Actually, why would you recommend the second one if it's slow? I don't want to use the second one because this will be applied to a dataset over 20 times its current size.

SELECT DISTINCT * FROM PAGELETS
 WHERE pagelet_shingle IN (
       SELECT pagelet_shingle FROM PAGELETS
        GROUP BY pagelet_shingle
       HAVING COUNT(DISTINCT page_key) > 1)
 ORDER BY pagelet_shingle;

This solves it, but is there any way to speed it up using an index? I don't know which columns I should index here. I tried indexing key(page_shingle, page_key), but it was equally slow.

2009-09-01 11:03:33

Whoops, it was a little bit too early for me. Of course I meant the first one.

Marcel J. 2009-09-01 19:26:07

Answer 2

A:

Doesn't this query solve your issue?

SELECT dt1.*
  FROM (SELECT DISTINCT * FROM PAGELETS
         GROUP BY page_key, pagelet_shingle
        HAVING COUNT(*) = 1) dt1
  JOIN (SELECT * FROM PAGELETS
         GROUP BY pagelet_shingle
        HAVING COUNT(*) > 1) dt2 USING (pagelet_shingle)
 GROUP BY pagelet_shingle

pixeline 2009-09-01 09:36:12

Nope - not in MySQL (Ref: ENGINE=MyISAM)

goddva 2009-09-01 09:48:48

(1,64,8)(1,64,9)(1,64,10)(1,64,11)(1,64,12)(1,64,13)(1,64,14)(1,64,15)(1,64,16)(1,41,20)(1,41,21)(1,41,22)(1,99,48)(1,99,49)(1,99,50)(1,99,51)(1,99,52)(1,99,53)(1,99,54)(1,99,58)(1,99,59)(1,99,60)(1,99,61)

Actually, not really. I can't see anything that has different page_key values, and the problem with page_key = 57 still exists (it's not found in the result set because it occurs more than once).

2009-09-01 11:09:17

Answer 3

A:

What is

SELECT * FROM PAGELETS GROUP BY pagelet_serial, pagelet_shingle HAVING COUNT(*) > 0

giving you?

goddva 2009-09-01 09:38:30

| page_key | pagelet_serial | pagelet_shingle |
+----------+----------------+-----------------+
|        1 |             56 |               1 |
|        1 |             56 |               2 |
|        1 |             56 |               3 |
|        2 |            186 |               8 |
|        1 |             64 |               8 |
|        1 |             64 |               9 |
|        2 |            186 |               9 |
|        1 |             64 |              10 |
|        2 |            186 |              10 |

2009-09-01 10:55:02

Not what I really want:

(1,56,1)(1,56,2)(1,56,3)(2,186,8)(1,64,8)(1,64,9)(2,186,9)(1,64,10)(2,186,10)(1,64,11)(2,186,11)(1,64,12)(2,186,12)(1,64,13)(2,186,13)(1,64,14)(2,186,14)(1,64,15)(2,186,15)(1,64,16)(2,186,16)(1,41,20)(2,203,20)(1,41,21)(2,203,21)(2,203,22)(1,41,22)(1,21,27)(1,21,28)(1,21,29)(1,21,30)(1,21,31)(1,21,32)(1,21,33)(1,21,34)(1,21,35)(1,21,36)(1,21,37)(1,21,38)(1,21,39)(1,21,40)(1,21,41)(1,21,42)(1,21,43)(1,21,44)(2,228,48)(1,99,48)(2,228,49)(1,99,49)(2,228,50)(1,99,50)(2,228,51)(1,99,51)(2,228,52)(1,99,52)

2009-09-01 10:58:58

Answer 4

A:

Use GROUP BY and HAVING, e.g.:

  SELECT *
    FROM `pagelets`
GROUP BY `pagelet_shingle`
  HAVING COUNT(*) > 1

Additionally, you can do a self join to output all columns, though in MySQL the query should work as written (unlike in standard SQL).
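A rough sketch of the self-join variant, run through Python's sqlite3 module as a stand-in for MySQL (an assumption made only so the example is runnable; MySQL's optimizer and GROUP BY behaviour differ). The sample rows are the ones posted in this thread, the alias `dup` is made up for illustration, and the `COUNT(DISTINCT page_key) > 1` condition is borrowed from the asker's own query rather than from this answer:

```python
import sqlite3

# In-memory SQLite database standing in for the MySQL table (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE PAGELETS ("
    "page_key INTEGER, pagelet_serial INTEGER, pagelet_shingle INTEGER)"
)
# Sample rows taken from the table output posted in this thread.
rows = [
    (1, 56, 1), (1, 56, 2), (1, 56, 3),
    (2, 186, 8), (1, 64, 8), (1, 64, 9),
    (2, 186, 9), (1, 64, 10), (2, 186, 10),
]
conn.executemany("INSERT INTO PAGELETS VALUES (?, ?, ?)", rows)

# Find shingles that occur on more than one page, then join back
# to the table so that every matching row is returned in full.
query = """
SELECT p.page_key, p.pagelet_serial, p.pagelet_shingle
  FROM PAGELETS p
  JOIN (SELECT pagelet_shingle
          FROM PAGELETS
         GROUP BY pagelet_shingle
        HAVING COUNT(DISTINCT page_key) > 1) dup
    ON dup.pagelet_shingle = p.pagelet_shingle
 ORDER BY p.pagelet_shingle, p.page_key
"""
shared = conn.execute(query).fetchall()
for row in shared:
    print(row)
```

On this sample only shingles 8, 9 and 10 appear under two page keys, so only their rows come back. The subquery selects just the grouped column, which keeps it valid in standard SQL as well.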

knittl 2009-09-01 09:47:42

Answer 5

A:

Judging from what I read, what you are looking for is:

SELECT DISTINCT p1.page_key, p1.pagelet_serial, p1.pagelet_shingle
  FROM PAGELETS p1
  JOIN PAGELETS p2 ON p2.page_key         = p1.page_key
                  AND p2.pagelet_serial   = p1.pagelet_serial
                  AND p2.pagelet_shingle <> p1.pagelet_shingle

That query would make full use of an index on (page_key, pagelet_serial) and should complete in tenths of a second, not seconds.

If this was not what you were looking for, please show us what result you would expect if the values in your table were these: (1,2,3),(1,2,3),(1,1,3),(1,1,3),(1,2,4),(1,2,4),(1,1,4),(1,1,4)
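For reference, this is what the query above returns on those eight sample rows. It is a minimal sketch using Python's sqlite3 module in place of MySQL (an assumption, so nothing here says anything about MySQL timings), and the index name `idx_page_serial` is made up for the example:

```python
import sqlite3

# In-memory SQLite database standing in for the MySQL table (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE PAGELETS ("
    "page_key INTEGER, pagelet_serial INTEGER, pagelet_shingle INTEGER)"
)
# Hypothetical index name; the column pair is the one suggested above.
conn.execute(
    "CREATE INDEX idx_page_serial ON PAGELETS (page_key, pagelet_serial)"
)
# The sample values from the answer, duplicates included.
sample = [
    (1, 2, 3), (1, 2, 3), (1, 1, 3), (1, 1, 3),
    (1, 2, 4), (1, 2, 4), (1, 1, 4), (1, 1, 4),
]
conn.executemany("INSERT INTO PAGELETS VALUES (?, ?, ?)", sample)

# Rows whose (page_key, pagelet_serial) pair also appears with a
# different pagelet_shingle; DISTINCT collapses the duplicates.
query = """
SELECT DISTINCT p1.page_key, p1.pagelet_serial, p1.pagelet_shingle
  FROM PAGELETS p1
  JOIN PAGELETS p2 ON p2.page_key         = p1.page_key
                  AND p2.pagelet_serial   = p1.pagelet_serial
                  AND p2.pagelet_shingle <> p1.pagelet_shingle
"""
# Sort in Python because the query has no ORDER BY.
result = sorted(conn.execute(query).fetchall())
print(result)
```

Every (page_key, pagelet_serial) pair in the sample occurs with both shingle 3 and shingle 4, so all four distinct rows are returned.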

Josh Davis 2009-09-01 11:54:29

Answer 6

A:

Have you tried using EXISTS instead of IN?

Check this out: http://decipherinfosys.wordpress.com/2007/01/30/in-vs-exists/
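One possible EXISTS rewrite of the asker's IN query, sketched with Python's sqlite3 module standing in for MySQL (an assumption; whether it is actually faster on MySQL would have to be measured there). The alias names and the index name `idx_shingle_page` are made up, and the rewrite assumes page_key is never NULL:

```python
import sqlite3

# In-memory SQLite database standing in for the MySQL table (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE PAGELETS ("
    "page_key INTEGER, pagelet_serial INTEGER, pagelet_shingle INTEGER)"
)
# Hypothetical index covering the correlated lookup in the EXISTS form.
conn.execute(
    "CREATE INDEX idx_shingle_page ON PAGELETS (pagelet_shingle, page_key)"
)
# Sample rows taken from the table output posted in this thread.
rows = [
    (1, 56, 1), (1, 56, 2), (1, 56, 3),
    (2, 186, 8), (1, 64, 8), (1, 64, 9),
    (2, 186, 9), (1, 64, 10), (2, 186, 10),
]
conn.executemany("INSERT INTO PAGELETS VALUES (?, ?, ?)", rows)

# The asker's IN form, with page_key added to ORDER BY for stable output.
in_query = """
SELECT DISTINCT * FROM PAGELETS
 WHERE pagelet_shingle IN (
       SELECT pagelet_shingle FROM PAGELETS
        GROUP BY pagelet_shingle
       HAVING COUNT(DISTINCT page_key) > 1)
 ORDER BY pagelet_shingle, page_key
"""

# EXISTS form: keep a row if the same shingle appears under another page_key.
exists_query = """
SELECT DISTINCT p.* FROM PAGELETS p
 WHERE EXISTS (
       SELECT 1 FROM PAGELETS q
        WHERE q.pagelet_shingle = p.pagelet_shingle
          AND q.page_key <> p.page_key)
 ORDER BY p.pagelet_shingle, p.page_key
"""

in_rows = conn.execute(in_query).fetchall()
exists_rows = conn.execute(exists_query).fetchall()
print(in_rows == exists_rows)
```

Both forms select the rows for shingles 8, 9 and 10 on this sample. The EXISTS version can stop at the first matching row per shingle instead of grouping the whole table.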

Hope this helps

Pablo Cabrera 2009-09-01 14:44:37

How to improve this query?
