views:

83

answers:

4

I know the title of this question is a bit confusing, so bear with me. :)

I have a (MySQL) database with a Person record. A Person also has a slug field. Unfortunately, slug fields are not unique. There are a number of duplicate records, i.e., the records have different IDs but the same first name, last name, and slug. A Person may also have 0 or more associated articles, blog entries, and podcast episodes.

If that's confusing, here's a diagram of the structure:

alt text

I would like to produce a list of records that match this criteria: duplicate records (i.e., same slug field) for people who also have at least 1 article, blog entry, or podcast episode.

I have a SQL query that will list all records with the same slug fields:

SELECT
 id,
 first_name,
 last_name,
 slug,
 COUNT(slug) AS person_records
FROM
 people_person
GROUP BY
 slug
HAVING
 (COUNT(slug) > 1)
ORDER BY
 last_name, first_name, id;

But this includes records for people that may not have at least 1 article, blog entry, or podcast. Can I tweak this to fit the second criteria?

Edit:

I updated the database diagram to simplify it and make it more clear what I am doing. (Note, some of the DB table names changed -- I was trying to give a higher-level look at the structure before, but it was a bit unclear.)

+1  A: 

EDIT Based on your revisions to the schema, I have updated my solution:

Select person.last_name, person.first_name, person.slug, Count(*) As person_records
From people_person as person
Where Exists    (
                Select 1
                From magazine_author As ma1
                Where ma1.person_id = person.id
                )
    Or Exists   (
                Select 1
                From podcast_episode_guests As pod1
                Where pod1.person_id = person.Id
                )
    Or Exists   (
                Select 1
                From blogs_blog_authors As b1
                Where b1.person_id = person.Id
                )
Group By person.slug
Having Count(*) > 1
Order By person.last_name, person.first_name, person.slug, Count(*)
Thomas
Does the *EXISTS* clause exists in MySQL? =)
Will Marcouiller
@Will Marcouiller - Yep. In fact, I can't think of a DBMS that doesn't implement EXISTS.
Thomas
@Will Marcouiller - http://dev.mysql.com/doc/refman/5.5/en/exists-and-not-exists-subqueries.html
Thomas
@Thomas: Excellent! I didn't know that about MySQL as I have rarely used it. =)
Will Marcouiller
Yeah, I mislabeled a part of the diagram, hence the weird relationship/names. :) I updated the diagram a bit, including simplifying it and making it closer match my actual table names (I wanted to give a high-level look at the structure before, but I don't think that really worked out that well).
mipadi
@mipadi - I've revised my solution based on your updates.
Thomas
+1  A: 

You can still include a WHERE clause to filter the results:

SELECT
 id,
 first_name,
 last_name,
 slug,
 COUNT(slug) AS person_records
FROM
 people_person
WHERE id IN (SELECT id FROM article)
GROUP BY
 slug
HAVING
 (COUNT(slug) > 1)
ORDER BY
 last_name, first_name, id;
Marcus Adams
+1  A: 

You could perhaps handle it through the having clause:

select Id
        , last_name
        , first_name
        , slug
        , COUNT(*) as Person_Records
    from Person as p
    group by Id
            , last_name
            , first_name
            , slug
        having COUNT(slug) > 1
            and ( 
                select COUNT(*)
                    from Author as a
                    where a.Person_Id = p.Id
            ) > 1
            and (
                select COUNT(*)
                    from Podcast_Guests as pg
                    where pg.Person_Id = p.Id
            ) > 1

I omitted the remaining conditions as this is a simple sample.

I hope this helps! =)

Will Marcouiller
+1  A: 
SELECT
 id,
 first_name,
 last_name,
 slug,
 COUNT(slug) AS person_records,
FROM
 people_person
WHERE 
 id IN (SELECT person_id from podcast_guests GROUP BY person_id) OR 
 id IN (SELECT person_id from authors GROUP BY person_id) OR 
 [....]
GROUP BY
 slug
HAVING
 (COUNT(slug) > 1)
ORDER BY
 last_name, first_name, id;
solomongaby