tags:

views:

65

answers:

4

I'm running the following query, which tells me which agents have ever been scheduled on a project at location 51:

SELECT agents.agentid, agents.firstname, agents.lastname,
       schedule.projectid, projects.locationid
FROM agents
LEFT JOIN schedule USING (agentid)
LEFT JOIN projects USING (projectid)
WHERE (projects.locationid <=> 51)
GROUP BY agentid
ORDER BY agentid;

It yields 1249 rows (which is the correct result set).

If I then want to see the opposite, that is, agents that have NOT ever been scheduled on a project at location 51, I change the WHERE clause to:

WHERE NOT (projects.locationid <=> 51)

It yields 16169 rows. 1249+16169 = more rows than are in the agents table.

It's clearly because an agent can be involved in a project at more than one location, so when that happens, he shows up in both.

So my question is, how can I make this second query work? That is, how can I find out which agents have never been involved at a project at location 51?

Thank you for any help!

Edit: Here are the table structures and sample data:

CREATE TABLE IF NOT EXISTS `agents` (
`agentid` SMALLINT(5) UNSIGNED NOT NULL AUTO_INCREMENT,
`firstname` VARCHAR(45) NOT NULL,
`lastname` VARCHAR(45) NOT NULL,
PRIMARY KEY (`agentid`));

CREATE TABLE IF NOT EXISTS `schedule` (
`scheduleid` SMALLINT(5) UNSIGNED NOT NULL AUTO_INCREMENT,
`agentid` SMALLINT(5) UNSIGNED NOT NULL,
`projectid` SMALLINT(5) UNSIGNED NOT NULL,
PRIMARY KEY (`scheduleid`));

CREATE TABLE IF NOT EXISTS `projects` (
`projectid` SMALLINT(5) UNSIGNED NOT NULL AUTO_INCREMENT,
`locationid` SMALLINT(5) UNSIGNED NOT NULL,
PRIMARY KEY (`projectid`));

INSERT INTO `agents` (`agentid`,`firstname`, `lastname`)
VALUES (1, 'Bob', 'Smith'), (2, 'John','Doe'), (3, 'Jane','Doe'), (4, 'Sam','Foo'), (5, 'Emily','Bar');

INSERT INTO `projects` (`projectid`, `locationid`)
VALUES (1, 51), (2, 12), (3,15), (4,51), (5,99), (6,21), (7,51);

INSERT INTO `schedule` (`scheduleid`, `agentid`, `projectid`)
VALUES (1, 1, 1), (2, 2, 3), (3, 4, 3), (4, 1, 6), (5, 3, 5), (6, 5, 1), (7, 5, 3), (8, 5, 7), (9, 3, 6), (10, 4, 4);
A: 

Could you please send the table structure and sample data?

gajendra.bang
Jonathan Leffler and madgnome, the subquery may affect the performance? what do you think?
gajendra.bang
Do you have another approach that might have better performance, GAHB?
pbarney
hi, do one thing first, add indexes in projects and schedule table. This will improve performance a lot. Then try this query SELECT*FROM agentsLEFT JOIN ( SELECT agents.agentid, schedule.scheduleid,schedule.projectid, projects.locationid FROM agents LEFT JOIN SCHEDULE USING (agentid) LEFT JOIN projects USING (projectid) WHERE (projects.locationid = 51) GROUP BY agentid ORDER BY agentid ) AS t ON agents.agentid = t.agentidWHERE t.agentid IS NULL
gajendra.bang
While this approach does work, it's considerably slower (19 seconds) than Jonathon Leffler's solution (1 second) on my ~16,000 row dataset. Thank you for the insight!
pbarney
ya, sometimes subquery runs faster than joining. I have a dataset with ~200000 rows, subquery is slower there so I thought it should work for you also :)
gajendra.bang
A: 

You could use the INclause. Select all the agents that are not in the list of agents who have worked in location 51 :

SELECT
*
FROM agents
WHERE
agents.agentid NOT IN
 (
  SELECT agents.agentid
  FROM agents
  LEFT JOIN schedule USING (agentid)
  LEFT JOIN projects USING (projectid)
  WHERE (projects.locationid = 51)
  GROUP BY agentid
  ORDER BY agentid  
 )
madgnome
thank you for this. This almost got me there, but the group by clause caused the query to run for almost seven minutes. Removing it made it work perfectly.
pbarney
+1  A: 

Assuming that the LEFT OUTER joins are necessary (of which I'm not convinced), then you want the list of agents not in the list you obtained above. Hence:

SELECT agents.agentid, agents.firstname, agents.lastname
  FROM agents
 WHERE agentid NOT IN
       (SELECT agents.agentid
          FROM agents
          LEFT JOIN schedule USING (agentid)
          LEFT JOIN projects USING (projectid)
         WHERE (projects.locationid = 51)
       )
 ORDER BY agentid;

Clearly, if you want to know which projects these agents have been involved in, then you reinstate the joins into the FROM clause of the outer query.

Jonathan Leffler
You're right, they're not necessary. And thank you, this worked amazingly well.
pbarney
A: 

hi, do one thing first, add indexes in projects and schedule table. This will improve performance a lot. Then try this query

SELECT
*
FROM agents
LEFT JOIN 
 (
  SELECT agents.agentid, schedule.scheduleid,schedule.projectid, projects.locationid
  FROM agents
  LEFT JOIN SCHEDULE USING (agentid)
  LEFT JOIN projects USING (projectid)
  WHERE (projects.locationid = 51)
  GROUP BY agentid
  ORDER BY agentid  
 ) AS t
 ON agents.agentid = t.agentid
WHERE t.agentid IS NULL
gajendra.bang