tags:

views:

49

answers:

5

General Case

How do you perform a left join across a many-to-many relationship when you want to add a condition to the foreign side of the relation?

Specific Case

We're dealing with two tables: team and pool. There is also a team_pool table serving as a many-to-many junction table between them. Additionally, a pool has a stage_id column.

I want to retrieve all teams with that team's pool for a specific stage. If the pool doesn't exist, I want the pool to be NULL. Sample, idealized results:

+--------+----------+------+
| team   | stage_id | pool |
+--------+----------+------+
| Team 1 |        2 | C    |
| Team 2 |     NULL | NULL | //this team doesn't belong to a pool for this stage (but it could belong to a pool for a different stage)
| Team 3 |        2 | G    |
| Team 3 |        2 | H    | //if a team belongs to a 2 pools for the same stage
| Team 4 |        2 | D    |

If it's relevant, I'm using MySQL.

SQL

Here's the (simplified) SQL used to create the tables:

CREATE TABLE team (id BIGINT AUTO_INCREMENT, name VARCHAR(50), PRIMARY KEY(id));
CREATE TABLE team_pool (team_id BIGINT, pool_id BIGINT, PRIMARY KEY(team_id, pool_id));
CREATE TABLE pool (id BIGINT AUTO_INCREMENT, stage_id BIGINT, name VARCHAR(40) NOT NULL, PRIMARY KEY(id));

Ideal Solution

The ideal solution would:

  • Not require a change to my schema (ORM really wants it this way)
  • Require a single, non UNION query.

Attempted solutions

  • Use an INNER JOIN rather than a LEFT JOIN from team to team_pool and team_pool to pool. Issue: we lose teams that don't belong to a pool
  • LEFT JOIN from team to team_pool and team_pool to pool using a WITH condition that the stage_id on pool matches what we're looking for. Issue: when a team belongs to many pools, we get multiple results. Adding a GROUP BY doesn't help.

EDIT: Chosen Solution

There are a lot of good solutions here.

Given that my ideal solution is not possible, I'd rather add stage_id to team_pool than use UNION or subqueries. This has the added benefit of letting me enforce that a team can only belong to one pool per stage. It also makes the queries simple:

SELECT t.name, p.name, tp.stage_id FROM team t LEFT JOIN team_pool tp ON t.id = tp.team_id AND tp.stage_id = 2 LEFT JOIN pool p ON p.id = tp.pool_id
+1  A: 

Short answer, it's not possible to fulfill all your criteria.

Longer answer: it's very possible to get all your results with a UNION. The first half will show teams that do have a pool for that stage; the second will show teams that do not.

select ...
from team
left join team_pool 
   on team.id = team_pool.team_id
left-join pool 
   on pool.id = team_pool.pool_id 
      and pool.stage_id = @stage
UNION
select distinct team.id, @stage, NULL
from team
where not exists 
   (select pool_id 
      from pool
      inner join team_pool
         on team_pool.pool_id = pool.pool_id 
      where team_pool.team_id = team.id 
         and pool.stage_id = @stage
KeithS
+1  A: 

If I understand the concepts behind your schema, then I would think stage_id should be a column in team_pool rather than pool. The stage is not an attribute of the pool, it is a factor in the mapping of teams to pools, right?

Regardless, this is how I would write your query in Oracle. I'm not sure if this exact syntax is right for MySQL. Presumably you would want to parameterize the literal value for stage_id.

SELECT t.name, p.name
  FROM (SELECT team.name, pool_id
          FROM team LEFT JOIN team_pool
            ON team_pool.team_id = team.team_id ) t
       LEFT JOIN (SELECT pool_id, name FROM pool WHERE stage_id = 2) p
            ON p.pool_id = t.pool_id
Dave Costa
The stage is an attribute of the pool (pools exist for a certain stage). If the choice is between a UNION and adding stage_id to team_pool, I'll choose the latter. The primary reason I was avoiding adding stage_id to team_pool is because the ORM I'm using makes it a little annoying to do so.
jeremy
+1  A: 

If you're looking for just one stage, you can left join the tables together. The pool name will be null if no pool is found. For example, for stage = 2:

select  t.name, 2, pool.name
from    team t
left join
        (
        select  team_id, p.name
        from    team_pool tp
        join    pool p
        on      tp.pool_id = p.id
        where   p.stage_id = 2
        ) pool
on      pool.team_id = t.id

If you'd like to query all stages, you could use a cross join to generate one row for each team-stage combination. The left join then searches for a pool for each row:

select  t.name, s.stage_id, p.name
from    team t
cross join
        (
        select  distinct stage_id
        from    pool
        ) s
left join
        (
        select  tp.team_id, p.stage_id, p.name
        from    team_pool tp
        join    pool p
        on      tp.pool_id = p.id
        ) pool
on      pool.team_id = t.id
        and pool.stage_id = s.stage_id

Because stage_id comes from the cross join, it will not be null if no pool is found.

Andomar
A: 
Snekse
Summary of SQL: Join pool and team_pool together to act like a single table, then left join team to that new pseudo table using team_id. Joining pool to team_pool could probably use an INNER JOIN; it's simply a way to get the team_id linked to the pool_id. Once that is established, then you should be able to link the team.team_id to any p.team_id without duplicates (I think).
Snekse
A: 

Just so we're clear:

You want to provide a team and a stage

You want to return: All teams sharing any pool with your provided team in the provided stage All teams with no pool in the provided stage all pool/team combo for any team sharing any pool with your provided team in the provided stage?

I am pretty sure this can be accomplished without UNION but I'm not entirely clear on what results you're after

As a starter, IMO:

Your initial select should be on teams table

You should add criteria for pool in the JOIN clause rather than the WHERE clause

You should join again on the teams table once you have a desired stage to introduce all of that teams pools

Matthew PK