tags:

views:

63

answers:

3

I have a database with 2 tables:

Table 1:

CREATE TABLE IF NOT EXISTS `sales` (
  `sale_id` int(11) NOT NULL AUTO_INCREMENT,
  `user_id` int(11) NOT NULL,
  `sale_total` int(11) NOT NULL,
  `sale_date` date NOT NULL,
  `sale_status` int(11) NOT NULL,
  PRIMARY KEY (`sale_id`)
) ;

Table 2:

CREATE TABLE IF NOT EXISTS `users` (
  `user_id` int(11) NOT NULL AUTO_INCREMENT,
  `name` varchar(200) NOT NULL,
  `lastname` varchar(200) NOT NULL,
  `mail` varchar(200) NOT NULL,
  PRIMARY KEY (`user_id`)
);

I need the following query optimized, so that it does not use subqueries. I this this could be accomplished using joins, but I don't know exactly how.

SELECT name, lastname, mail
FROM users
WHERE user_id IN (
   SELECT user_id
   FROM sales
   WHERE sale_date < '2009-01-01'
   AND sale_total >100
   AND sale_status =4
)
AND user_id NOT IN (
    SELECT user_id
    FROM sales
    WHERE sale_date >= '2009-01-01'
)
+1  A: 

How about a join:

SELECT u.name, u.lastname, u.mail FROM users u
INNER JOIN sales s ON s.user_id = u.user_id
WHERE s.sale_date < '2009-01-01'
AND s.sale_total >100
AND s.sale_status =4

The second query is unncessary - it already beend address with s.sale_date < '2009-01-01' (maybe you should explain your query in detail):

AND user_id NOT IN (
    SELECT user_id
    FROM sales
    WHERE sale_date >= '2009-01-01'
)

If it is necessary - it would be something like this (untested):

SELECT u.name, u.lastname, u.mail FROM users u
INNER JOIN sales s ON s.user_id = u.user_id
RIGHT OUTER JOIN sales e ON e.user_id = u.user_id
WHERE s.sale_date < '2009-01-01'
AND s.sale_total >100
AND s.sale_status =4
AND e.sale_date >= '2009-01-01'
AND e.user_id is null
Andreas Rehm
how is the second query unnecessary if he wants users who don't have a record in sale where sale_date >= '2009-01-01'?
MStodd
That's the point I'm thinking about right now... And this part might not be possible with a join.
Andreas Rehm
Thanks for the reply! Yes, I do want users who do not have any records in sales after 2009-01-01. Any ideas how to avoid both subqueries are still welcome.
infrared
Please try my last posted query - might be incomplete but it should show how to do it.
Andreas Rehm
+3  A: 

Use joins to replace IN subqueries, null-left-joins to replace NOT IN, and GROUP BY to return only one row for each user:

SELECT users.name, users.lastname, users.mail
FROM users
JOIN sales AS s0 ON s0.user_id=users.user_id
LEFT JOIN sales AS s1 ON s1.user_id=users.user_id AND sale_date>='2009-01-01'
WHERE s1.sale_id IS NULL
AND s0.sale_date < '2009-01-01' AND s0.sale_total>100 AND s0.sale_status=4
GROUP BY users.user_id
bobince
A: 

I think this is an expressive way to solve it. You use one join to include all appropriate sales with sales_status = 4, etc. Then you use a second join with a having clause to exclude any user that has sales after '2009-01-01'.

The performance will depend on the query plan that the database engine comes up with for your data, so you should test it to ensure the performance is better before replacing your original query.

select name, lastname, mail
from users 
inner join sales s1
    on users.user_id = s1.user_id
        and s1.sale_date < '2009-01-01'
        and s1.sale_total >100
        and s1.sale_status = 4
left join sales s2
    on s2.user_id = users.user_id
group by name, lastname, mail
having max(s2.sale_date) < '2009-01-01'
Brian Erickson