views:

46

answers:

1

I have written a fairly complex SQL query to get some statistics about animals from an animal sampling database. This query includes a number of subqueries and I would now like to see if it is possible to rewrite this query in any way to use joins instead of subqueries. I have a dim idea that this might reduce query time. (it's now about 23s on a mac mini).

Here's the query:

SELECT COUNT(DISTINCT a.AnimalID), TO_DAYS(a.VisitDate) AS day, 
   DATE_FORMAT(a.VisitDate, '%b %d %Y'), a.origin, 
   (
    SELECT COUNT(DISTINCT a.AnimalID)
           FROM samples AS a 
                JOIN
                custom_animals AS b 
                ON a.AnimalID = b.animal_id
                WHERE
                     b.organism = 2
                     AND
                         TO_DAYS(a.VisitDate) = day
    ) AS Goats, 
    (
     SELECT COUNT(DISTINCT a.AnimalID) 
            FROM samples AS a 
                 JOIN custom_animals AS b 
                 ON a.AnimalID = b.animal_id 
                 WHERE
                      b.organism = 2
                      AND 
                         b.sex = 'Female'
                      AND
                         TO_DAYS(a.VisitDate) = day
    ) AS GF,
    (
     SELECT COUNT(DISTINCT a.AnimalID) 
            FROM samples AS a 
                 JOIN custom_animals AS b 
                 ON a.AnimalID = b.animal_id 
                 WHERE 
                      b.organism = 3
                      AND
                         b.sex = 'Female'
                      AND
                         TO_DAYS(a.VisitDate) = day
    ) AS SF
    FROM
        samples AS a 
        JOIN custom_animals AS b 
        ON a.AnimalID = b.animal_id 
        WHERE
             project = 5
             AND
                AnimalID LIKE 'AVD%'
        GROUP BY
                TO_DAYS(a.VisitDate);

Thanks to ksogor my query is now way faster at;

SELECT  DATE_FORMAT(s.VisitDate, '%b %d %Y') AS date,
    s.origin,
    SUM(IF(project = 5 AND s.AnimalID LIKE 'AVD%', 1, 0)) AS sampled_animals, 
    SUM(IF(ca.organism = 2, 1, 0)) AS sampled_goats,
    SUM(IF(ca.organism = 2 AND ca.sex = 'Female', 1, 0)) AS female_goats,
    SUM(IF(ca.organism = 3 AND ca.sex = 'Female', 1, 0)) AS female_sheep
FROM samples s JOIN custom_animals ca ON s.AnimalID = ca.animal_id
GROUP BY date;

I would still need to make this query select distinct s.AnimalID though as right now it counts the samples we have taken from these animals instead of the animals themselves. Anyone got any idea?


After some more help from ksogor I now have a great query:

SELECT  DATE_FORMAT(s.VisitDate, '%b %d %Y') AS date,
    s.origin,
    SUM(IF(project = 5 AND s.AnimalID LIKE 'AVD%', 1, 0)) AS sampled_animals, 
    SUM(IF(ca.organism = 2, 1, 0)) AS sampled_goats,
    SUM(IF(ca.organism = 2 AND ca.sex = 'Female', 1, 0)) AS female_goats,
    SUM(IF(ca.organism = 3 AND ca.sex = 'Female', 1, 0)) AS female_sheep
FROM (
    SELECT DISTINCT AnimalID AS AnimalID,
           VisitDate,
           origin,
           project
           FROM samples
) s 
JOIN custom_animals ca ON s.AnimalID = ca.animal_id
GROUP BY date;
+2  A: 

You can just use if or case statements, like this:

SELECT  SUM(if(project = 5 AND AnimealID LIKE 'AVD%', 1, 0)) AS countbyproj, 
        TO_DAYS(s.VisitDate) AS day, 
        DATE_FORMAT(s.VisitDate, '%b %d %Y') AS date,
        s.origin,
        SUM(if(ca.organism = 2, 1, 0)) AS countGoats,
        SUM(if(ca.organism = 2 AND ca.sex = 'Female', 1, 0)) AS countGF,
        SUM(if(ca.organism = 3 AND ca.sex = 'Female', 1, 0)) AS countSF
FROM samples s JOIN custom_animals ca ON s.AnimalID = ca.animal_id
GROUP BY TO_DAYS(a.VisitDate);

I can't check query, I don't know what result you're expected and which tables/relations you have, so this is only example with idea.

If you need count unque AnimealID's for each day:

SELECT SUM(byproj) AS countbyproj,
        day,
        date,
        origin,
        SUM(Goats) AS countGoats,
        SUM(GF) AS countGF,
        SUM(SF) AS countSF
FROM (
    SELECT  s.AnimealID,
            if(project = 5 AND AnimealID LIKE 'AVD%', 1, 0) AS byproj, 
            TO_DAYS(s.VisitDate) AS day, 
            DATE_FORMAT(s.VisitDate, '%b %d %Y') AS date,
            s.origin,
            if(ca.organism = 2, 1, 0)) AS Goats,
            if(ca.organism = 2 AND ca.sex = 'Female', 1, 0) AS GF,
            if(ca.organism = 3 AND ca.sex = 'Female', 1, 0) AS SF
    FROM samples s JOIN custom_animals ca ON s.AnimalID = ca.animal_id
    ) dataset
GROUP BY dataset.day, dataset.AnimealID;
ksogor
Thanks!That works great, executes in 1s! I'll need to figure out how to get it to select "DISTINCT s.AnimalID" though. I'm not very experienced with the if statement. Any ideas?
Norling Jr.
Umm... Do you grouping first by AnimealID or by day? You can try to use select from example without sum as dataset and select from it with counting/summing.
ksogor
Good idea to use a single subquery to prepare the dataset! I modified your code a bit (since I can do test runs ;)) but now it works better than I had hoped! Executes in 0.44s! (Added final query to question)
Norling Jr.