tags:

views:

66

answers:

4

Let's say we have three tables in a relational database:

Person {id, name}
Obstacle {id, name}
Person_overcomes_obstacle {person_id, obstacle_id}

I want to write a query that tells me if at least one person has overcome all obstacles. Ideas?

+3  A: 
SELECT 
    p.name,
    COUNT(DISTINCT oo.obstacle_id) AS OBSTACLES_COMPLETED
FROM 
    person p

    JOIN person_overcomes_obstacle oo
    ON oo.person_id = p.person_id

GROUP BY 
    p.name

HAVING 
    COUNT(DISTINCT poo.obstacle_id) = (SELECT COUNT(id) FROM obstacle)
Adam Bernier
+1 for the query. It's worth noting that this assumes a 1 to 1 relationship between person and each unique obstacle. That is to say, a person does not complete an obstacle twice.
Jason McCreary
Thanks very much for the comment. The `DISTINCT` in line 3 (and also in the last line) will ensure that each completed obstacle is counted exactly once.
Adam Bernier
This query is good, but it calculates a lot more than is asked for in the question. Alexandros just wanted to find out "if at least one person has overcome all obstacles" -- he didn't say he wanted to know who that person was.
Tom
Thanks, Tom. I seem to have overshot the goal :-/
Adam Bernier
+1: Only concern I'd raise is grouping on the name column - could fall victim to "John Smith"'s...
OMG Ponies
Thanks for the solution Adam. I agonized over whether to accept yours or martin's. Yours performs best, martins is more general. I got tremendous value out of the relational division link so I had to accept that one. If I could accept both I would, please accept an upvote and my apologies. cheers.
Alexandros Marinos
@Alexandros: cheers, and no need to apologize. Thanks for the information about the performance of the two.
Adam Bernier
+1  A: 

This SELECT should return the number of obstacles that nobody has overcome. If the count is zero then all the obstacles have been overcome by at least one person.

SELECT count(*)
  FROM (SELECT po.person_id, o.obstacle_id
          FROM Obstacle o
          LEFT OUTER JOIN Person_overcomes_obstacle po
            ON (o.obstacle_id = po.obstacle_id)) t
WHERE t.person_id IS NULL

You could also do this, for the same effect (and possibly better performance):

SELECT count(*)
  FROM Obstacle o
 WHERE NOT EXISTS (SELECT 1
                     FROM Person_overcomes_obstacle po
                    WHERE po.obstacle_id = o.obstacle_id)

EDIT: As pointed out in the comments, the above two queries only prove that there are no obstacles that nobody has overcome, and not that a single individual has overcome all obstacles.

That aside, it should still be possible to prove that a single user has overcome all obstacles without querying the Person table:

SELECT t.personid, count(*)
  FROM (SELECT DISTINCT po.person_id, o.obstacle_id
          FROM Obstacle o
          JOIN Person_overcomes_obstacle po
            ON (o.obstacle_id = po.obstacle_id)) t
 GROUP BY t.persion_id
 HAVING count(*) = (SELECT count(*)
                      FROM obstacle)
Tom
+1 for a far better reading of the question.
Adam Bernier
if we have 2, and 2 obstacles, and each person completes a different obstacle, this will still return 0, even though there is no one single person that has completed all obstacles.
mdma
Are you sure these work? I'm not sure you're checking that it is the same person that overcame the obstacles? i.e. if all obstacles were overcome by 10 different people.
Martin Smith
You guys are absolutely right! My queries are just proving that every obstacle has been overcome -- not that a single person has overcome them all.
Tom
OMG Ponies
+3  A: 

I notice that I was the only person to use the natural table alias for Person_overcomes_obstacle!

You need a relational division query for this.

You can either count up the obstacles and the matching records in Person_overcomes_obstacle and return ones where the 2 numbers match or look at it another way as finding people for which there is no obstacle that they haven't overcome.

SELECT p.id, p.name /*Or use COUNT(*) or wrap in Exists 
                     if you don't care about ids and names*/
FROM Person p
WHERE NOT EXISTS
   (SELECT * FROM Obstacle o
    WHERE NOT EXISTS 
    (
       SELECT * FROM Person_overcomes_obstacle poo
       WHERE poo.person_id = p.id and o.id = poo.obstacle_id
    )
)
Martin Smith
This SELECT will work, but you have to query the Person table. If we don't care who has overcome all the obstacles, and only that someone has, then we shouldn't need to access the Person table at all, and can get better performance by doing so.
Tom
@Tom - But I think there might be an issue with your approach at least as it stands as it doesn't check that it is the *same* person.
Martin Smith
That'll return an invalid reference error for `p.id` because you can access table aliases one level deep in subquery syntax.
OMG Ponies
@OMG - Just realised that I hadn't even given it an alias! What RDBMS does that alias rule apply to? Definitely not SQL Server.
Martin Smith
MySQL and SQL Server require that a table alias are defined for dervied tables/inline views and joining to the same table. My comment overlooked your typo, but the subquery issue remains a show stopper.
OMG Ponies
@OMG - No SQL Server is definitely not bothered http://odata.stackexchange.com/stackoverflow/q/7611/
Martin Smith
That's Azure (SQL Server 2008) - this isn't tagged as SQL Server related.
OMG Ponies
Ah I just read your comment again. It is a 2 level correlated sub query so none of the options you said. Good enough for Celko good enough for me! http://www.simple-talk.com/sql/t-sql-programming/divided-we-stand-the-sql-of-relational-division/
Martin Smith
...although actually he attributes it to Chris Date in that article.
Martin Smith
Thanks a lot, this was exactly what I needed.
Alexandros Marinos
+1  A: 

Use:

SELECT poo.person_id
    FROM PERSON_OVERCOMES_OBSTACLE poo
GROUP BY poo.person_id
    HAVING COUNT(DISTINCT poo.obstacle_id) = (SELECT COUNT(*) 
                                                                        FROM OBSTACLE)

This is an alternative that is less likely to perform well:

SELECT x.person_id
   FROM (SELECT poo.person_id,
                         COUNT(DISTINCT poo.obstacle_id) AS obs_overcome,
                         (SELECT COUNT(*) 
                             FROM OBSTACLE) AS obs_total
                 FROM PERSON_OVERCOMES_OBSTALCE poo
          GROUP BY poo.person_id) x
 WHERE x.obs_overcome = x.obs_total

In either case, you can join to the PERSON table to get more information if you want - or you could run a count on person_id from either query to know how many people completed all the obstacles recorded.

OMG Ponies