+2  Q: 

SQL Optimization

How to optimize this?

SELECT e.attr_id, e.sku, a.value

FROM product_attr AS e, product_attr_text AS a

WHERE e.attr_id = a.attr_id
AND value
IN (
    SELECT value
    FROM product_attr_text 
    WHERE attribute_id = (
        SELECT attribute_id
        FROM eav_attr 
        WHERE attribute_code = 'similar_prod_id' 
    ) 
    AND value != ''

    GROUP BY value
    HAVING (COUNT( value ) > 1 )
)
+1  A: 

Difficult to answer ...

The only thing I can say is:

  • look at the execution plan

  • use IO statistics

to see which part of the statement is problematic, and why. Then find out whether you can improve performance by adding an index, etc.

Besides that, I don't understand why you're using a GROUP BY clause ...

And why do you use subqueries instead of joining tables? (Although I don't think it will make much difference, since the SQL Server optimizer should be smart enough to generate the same execution plan.)

Frederik Gheysels
good idea, forgot about the statistics!
powtac
He is using MySQL, not SQL Server. MySQL has a known issue with subqueries in the WHERE clause.
Eran Galperin
+1  A: 

Change the second inner select into a join:

SELECT e.attr_id, e.sku, a.value
FROM product_attr AS e, product_attr_text AS a
WHERE e.attr_id = a.attr_id
AND value
IN (
    SELECT at.value
    FROM product_attr_text at, eav_attr eat 
    WHERE at.attribute_id = eat.attribute_id 
          AND eat.attribute_code = 'similar_prod_id' 
          AND value != ''
    GROUP BY value
    HAVING (COUNT( value ) > 1 )
)

Then take a look at the execution plan to see how the query is resolved. You might need additional optimizations based on that plan (e.g. make sure the attribute_id column is indexed).

Cătălin Pitiș
+1  A: 

Change it into a JOIN. MySQL doesn't optimize subqueries in an IN() clause well: the subquery is re-evaluated for every row, which is very inefficient for many rows.

SELECT e.attr_id, e.sku, a.value
FROM product_attr AS e
INNER JOIN product_attr_text AS a ON e.attr_id = a.attr_id
INNER JOIN (SELECT value
    FROM product_attr_text 
    INNER JOIN eav_attr ON eav_attr.attribute_id=product_attr_text.attribute_id
    WHERE attribute_code = 'similar_prod_id'  
    AND value != ''
    GROUP BY value
    HAVING COUNT( value ) > 1 
) AS filter ON filter.value=a.value

After you've converted the query (you may need to make some corrections according to your schema), run EXPLAIN on the query and index accordingly.
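
As a rough sketch of that last step, the rewritten query can be prefixed with EXPLAIN in MySQL. The table and column names are the ones from the question. The aliases p, v and dup are introduced here for illustration, and the sketch assumes that attribute_code lives in eav_attr and value in product_attr_text.

```sql
-- Prefix the query with EXPLAIN to see the plan instead of the result rows.
-- Aliases p, v and dup are illustrative; adjust column ownership to your schema.
EXPLAIN
SELECT e.attr_id, e.sku, a.value
FROM product_attr AS e
INNER JOIN product_attr_text AS a ON e.attr_id = a.attr_id
INNER JOIN (
    SELECT p.value
    FROM product_attr_text AS p
    INNER JOIN eav_attr AS v ON v.attribute_id = p.attribute_id
    WHERE v.attribute_code = 'similar_prod_id'
      AND p.value != ''
    GROUP BY p.value
    HAVING COUNT(p.value) > 1
) AS dup ON dup.value = a.value;
```

In the output, a row with type = ALL means a full table scan, and key = NULL means no index was chosen for that table; those are the places to consider an index. On older MySQL versions the original IN form typically shows up as DEPENDENT SUBQUERY in the select_type column.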

Eran Galperin
strange (and stupid) behavior from MySQL; what about EXISTS?
remi bourgarel
MySQL in general has a problem with correlated subqueries in the WHERE clause. It is a known issue that will hopefully be resolved in the future. For now, it's best to use JOINs instead.
Eran Galperin
"strange (and stupid) behaviour"? Do you have any concept of the magnitude of complexity in resolving the near infinite SQL expressions into execution plans? Do the world a favour and study a subject before you mock someone's work.
Dems
A: 

Use a derived table for the GROUP BY ... HAVING part, and join that back to whatever table(s) you need to fill out the result.

f00
+1  A: 

Frederik's answer is the right one, but I'd like to suggest a few points:

  • avoid IN; use EXISTS instead
  • avoid = (SELECT something FROM sometable); use EXISTS instead
  • the GROUP BY is useless
  • use INNER JOIN instead of multiple tables in the FROM clause

But again, it depends on your machine/DBMS/version etc., so to get the best performance you'll have to compare the different execution plans.
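
For comparison, here is one way the EXISTS and INNER JOIN suggestions could look when applied to the query from the question. This is a sketch, not a tested replacement: the aliases p and v are introduced here, and the correlated condition p.value = a.value stands in for the IN list.

```sql
-- EXISTS form: keep a row of product_attr_text (a) only if its value occurs
-- more than once among the non-empty 'similar_prod_id' values.
SELECT e.attr_id, e.sku, a.value
FROM product_attr AS e
INNER JOIN product_attr_text AS a ON e.attr_id = a.attr_id
WHERE EXISTS (
    SELECT 1
    FROM product_attr_text AS p
    INNER JOIN eav_attr AS v ON v.attribute_id = p.attribute_id
    WHERE v.attribute_code = 'similar_prod_id'
      AND p.value <> ''
      AND p.value = a.value      -- correlation with the outer row
    GROUP BY p.value
    HAVING COUNT(*) > 1
);
```

Because the subquery is correlated, it may not run any faster than the IN version on MySQL, so compare its plan against the JOIN-based rewrites before settling on it.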

remi bourgarel
+2  A: 
SELECT  e.attr_id, e.sku, a.value
FROM    (
        SELECT  pat.value
        FROM    eav_attr ea
        JOIN    product_attr_text pat
        ON      pat.attribute_id = ea.attribute_id
        WHERE   ea.attribute_code = 'similar_prod_id'
                AND value <> ''
        GROUP BY
                value
        HAVING  COUNT(*) > 1
        ) q
JOIN    product_attr_text AS a
ON      a.value = q.value
JOIN    product_attr AS e
ON      e.attr_id  = a.attr_id

Create indexes:

 eav_attr (attribute_code)
 product_attr_text (attribute_id, value)
 product_attr_text (value)
 product_attr (attr_id)
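
Spelled out as MySQL DDL, the list above might look like the following. The index names are made up for illustration. The prefix length of 255 on value is an assumption that applies only if value is a TEXT/BLOB column (MySQL requires a prefix length when indexing those); for a VARCHAR column, drop the prefix.

```sql
-- Index names are hypothetical; pick whatever fits your naming scheme.
CREATE INDEX ix_eav_attr_code ON eav_attr (attribute_code);

-- value(255) assumes a TEXT column; for VARCHAR, use plain "value".
CREATE INDEX ix_pat_attribute_value ON product_attr_text (attribute_id, value(255));
CREATE INDEX ix_pat_value ON product_attr_text (value(255));

-- Not needed if attr_id is already the primary key of product_attr.
CREATE INDEX ix_product_attr_attr_id ON product_attr (attr_id);
```
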
Quassnoi
+1  A: 

Caution: Be wary of EAV (entity, attribute, value) table designs. They tend to become very difficult to handle, for reasons of referential integrity and query complexity.

Optimize by using:

  • Joins instead of sub-selects.
  • Join notation.
  • Explicit table aliases on all column names.

Phase 1:

SELECT e.attr_id, e.sku, a.value
  FROM product_attr AS e JOIN product_attr_text AS a
       ON e.attr_id = a.attr_id
 WHERE a.value IN (
          SELECT p.value
            FROM product_attr_text AS p
            JOIN eav_attr AS v ON p.attribute_id = v.attribute_id
           WHERE v.attribute_code = 'similar_prod_id'
             AND p.value != ''
           GROUP BY p.value
          HAVING (COUNT( p.value ) > 1)
          )

Phase 2:

SELECT e.attr_id, e.sku, a.value
  FROM product_attr AS e
  JOIN product_attr_text AS a ON e.attr_id = a.attr_id
  JOIN (SELECT p.value
          FROM product_attr_text AS p
          JOIN eav_attr AS v ON p.attribute_id = v.attribute_id
         WHERE v.attribute_code = 'similar_prod_id'
           AND p.value != ''
         GROUP BY p.value
        HAVING (COUNT( p.value ) > 1)
       ) AS x ON x.value = a.value
Jonathan Leffler