I'm looking for help using sum() in my SQL query:

SELECT links.id, 
       count(DISTINCT stats.id) as clicks, 
       count(DISTINCT conversions.id) as conversions, 
       sum(conversions.value) as conversion_value 
FROM links 
LEFT OUTER JOIN stats ON links.id = stats.parent_id 
LEFT OUTER JOIN conversions ON links.id = conversions.link_id 
GROUP BY links.id 
ORDER BY links.created desc;

I use DISTINCT because I'm doing "group by" and this ensures the same row is not counted more than once.

The problem is that SUM(conversions.value) counts the "value" for each row more than once (due to the group by).

I basically want to do SUM(conversions.value) for each DISTINCT conversions.id.

Is that possible?

A: 

I use a subquery to do this. It eliminates the problems with grouping. So the query would be something like:

SELECT COUNT(DISTINCT conversions.id)
...
     (SELECT SUM(conversions.value) FROM ....) AS Vals
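
One way the subquery idea might be folded into the full query from the question is sketched below. This is only an illustration under assumptions: it runs against an in-memory SQLite database through Python's sqlite3 module, the column types and the sample rows are made up, and the table alias c is introduced here for the subquery. The joins and the two COUNT(DISTINCT ...) columns stay as they are; only the sum moves into a correlated subquery, so it never sees the rows multiplied by the stats join.

```python
import sqlite3

# Hypothetical sample data: link 1 has 3 clicks and 2 conversions (10 + 5),
# link 2 has neither.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE links (id INTEGER PRIMARY KEY, created TEXT);
    CREATE TABLE stats (id INTEGER PRIMARY KEY, parent_id INTEGER);
    CREATE TABLE conversions (id INTEGER PRIMARY KEY, link_id INTEGER, value INTEGER);
    INSERT INTO links VALUES (1, '2010-01-01'), (2, '2010-01-02');
    INSERT INTO stats VALUES (1, 1), (2, 1), (3, 1);
    INSERT INTO conversions VALUES (1, 1, 10), (2, 1, 5);
""")

query = """
SELECT links.id,
       COUNT(DISTINCT stats.id) AS clicks,
       COUNT(DISTINCT conversions.id) AS conversions,
       -- correlated subquery: sums each conversion of this link exactly once
       (SELECT SUM(c.value)
        FROM conversions c
        WHERE c.link_id = links.id) AS conversion_value
FROM links
LEFT OUTER JOIN stats ON links.id = stats.parent_id
LEFT OUTER JOIN conversions ON links.id = conversions.link_id
GROUP BY links.id
ORDER BY links.created DESC
"""

rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

With this sample data, link 1 should report a conversion_value of 15 and link 2 a NULL. An index on conversions.link_id would probably matter for the subquery on a real table.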
Dave
Updated question with my full query. I'm not sure how I'd integrate a subquery into what I have and how it would affect performance.
makeee
Subqueries often hurt performance. To minimize the impact, make sure each subquery filters on an indexed column.
Dave
A: 

I may be wrong, but from what I understand:

  • conversions.id is the primary key of your table conversions
  • stats.id is the primary key of your table stats

Thus for each conversions.id you have at most one links.id impacted.

Your request is a bit like taking the Cartesian product of two sets:

[clicks]
SELECT *
FROM links 
LEFT OUTER JOIN stats ON links.id = stats.parent_id 

[conversions]
SELECT *
FROM links 
LEFT OUTER JOIN conversions ON links.id = conversions.link_id 

and for each link you get sizeof([clicks]) x sizeof([conversions]) rows, where both sizes are counted for that link.

As you noted, the number of unique conversions in your request can be obtained with

count(distinct conversions.id) = sizeof([conversions])

The DISTINCT removes the duplicates that the [clicks] rows introduce in the Cartesian product,

but clearly

sum(conversions.value) = sum([conversions].value) * sizeof([clicks])
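
The inflation can be reproduced on a tiny data set. The sketch below is an illustration under assumptions, not the asker's real data: it uses an in-memory SQLite database through Python's sqlite3 module, with made-up column types and sample rows (one link, 3 clicks, 2 conversions worth 10 and 5).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE links (id INTEGER PRIMARY KEY, created TEXT);
    CREATE TABLE stats (id INTEGER PRIMARY KEY, parent_id INTEGER);
    CREATE TABLE conversions (id INTEGER PRIMARY KEY, link_id INTEGER, value INTEGER);
    INSERT INTO links VALUES (1, '2010-01-01');
    INSERT INTO stats VALUES (1, 1), (2, 1), (3, 1);
    INSERT INTO conversions VALUES (1, 1, 10), (2, 1, 5);
""")

# The query from the question, plus COUNT(*) to expose the joined row count.
row = conn.execute("""
    SELECT links.id,
           COUNT(DISTINCT stats.id) AS clicks,
           COUNT(DISTINCT conversions.id) AS conversions,
           SUM(conversions.value) AS conversion_value,
           COUNT(*) AS joined_rows
    FROM links
    LEFT OUTER JOIN stats ON links.id = stats.parent_id
    LEFT OUTER JOIN conversions ON links.id = conversions.link_id
    GROUP BY links.id
""").fetchone()

# The real total, taken from the conversions table alone.
true_total = conn.execute(
    "SELECT SUM(value) FROM conversions WHERE link_id = 1"
).fetchone()[0]

print(row)         # (1, 3, 2, 45, 6)
print(true_total)  # 15
```

The join yields 3 x 2 = 6 rows, so each conversion value is added 3 times: 45 instead of 15.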

In your case, since

count(*) = sizeof([clicks]) x sizeof([conversions])
count(*) = sizeof([clicks]) x count(distinct conversions.id)

you have

sizeof([clicks]) = count(*)/count(distinct conversions.id)

so I would test your request with

SELECT links.id, 
   count(DISTINCT stats.id) as clicks, 
   count(DISTINCT conversions.id) as conversions, 
   sum(conversions.value)*count(DISTINCT conversions.id)/count(*) as conversion_value 
FROM links 
LEFT OUTER JOIN stats ON links.id = stats.parent_id 
LEFT OUTER JOIN conversions ON links.id = conversions.link_id 
GROUP BY links.id 
ORDER BY links.created desc;
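
A quick way to check this rescaled query is sketched below, assuming an in-memory SQLite database driven from Python's sqlite3 module; the column types and sample rows are invented for the test. It covers a link with clicks and conversions, a link with neither, and a link with a conversion but no clicks.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE links (id INTEGER PRIMARY KEY, created TEXT);
    CREATE TABLE stats (id INTEGER PRIMARY KEY, parent_id INTEGER);
    CREATE TABLE conversions (id INTEGER PRIMARY KEY, link_id INTEGER, value INTEGER);
    INSERT INTO links VALUES (1, '2010-01-01'), (2, '2010-01-02'), (3, '2010-01-03');
    INSERT INTO stats VALUES (1, 1), (2, 1), (3, 1);
    INSERT INTO conversions VALUES (1, 1, 10), (2, 1, 5), (3, 3, 7);
""")

# Inflated sum, scaled back by count(distinct conversions.id) / count(*).
rows = conn.execute("""
    SELECT links.id,
           COUNT(DISTINCT stats.id) AS clicks,
           COUNT(DISTINCT conversions.id) AS conversions,
           SUM(conversions.value) * COUNT(DISTINCT conversions.id) / COUNT(*)
               AS conversion_value
    FROM links
    LEFT OUTER JOIN stats ON links.id = stats.parent_id
    LEFT OUTER JOIN conversions ON links.id = conversions.link_id
    GROUP BY links.id
    ORDER BY links.created DESC
""").fetchall()

for row in rows:
    print(row)
```

On this data link 1 comes back with 15 (45 * 2 / 6), link 3 with 7, and link 2 with NULL because it has no conversions. Other engines may return the division result as a decimal rather than an integer, so the output type is worth checking there.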

Keep me posted! Jerome

Jerome WAGNER
A: 

Use the following query:

SELECT links.id
  , (
    SELECT COUNT(*)
    FROM stats
    WHERE links.id = stats.parent_id
  ) AS clicks
  , conversions.conversions
  , conversions.conversion_value
FROM links
LEFT JOIN (
  SELECT link_id
    , COUNT(id) AS conversions
    , SUM(conversions.value) AS conversion_value
  FROM conversions
  GROUP BY link_id
) AS conversions ON links.id = conversions.link_id
ORDER BY links.created DESC
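
The query above can be tried out as follows. This is a sketch under assumptions: an in-memory SQLite database via Python's sqlite3 module, with invented column types and sample rows. Because conversions are aggregated per link_id before they are joined to links, the stats rows can no longer multiply them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE links (id INTEGER PRIMARY KEY, created TEXT);
    CREATE TABLE stats (id INTEGER PRIMARY KEY, parent_id INTEGER);
    CREATE TABLE conversions (id INTEGER PRIMARY KEY, link_id INTEGER, value INTEGER);
    INSERT INTO links VALUES (1, '2010-01-01'), (2, '2010-01-02'), (3, '2010-01-03');
    INSERT INTO stats VALUES (1, 1), (2, 1), (3, 1);
    INSERT INTO conversions VALUES (1, 1, 10), (2, 1, 5), (3, 3, 7);
""")

rows = conn.execute("""
    SELECT links.id
      , (
        SELECT COUNT(*)
        FROM stats
        WHERE links.id = stats.parent_id
      ) AS clicks
      , conversions.conversions
      , conversions.conversion_value
    FROM links
    LEFT JOIN (
      -- pre-aggregate: one row per link_id, so nothing is counted twice
      SELECT link_id
        , COUNT(id) AS conversions
        , SUM(conversions.value) AS conversion_value
      FROM conversions
      GROUP BY link_id
    ) AS conversions ON links.id = conversions.link_id
    ORDER BY links.created DESC
""").fetchall()

for row in rows:
    print(row)
```

Note that a link without any conversions (link 2 here) gets NULL for both conversions and conversion_value, not 0; wrapping those two columns in COALESCE(..., 0) is one option if zeros are preferred.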
Bryson
A: 

How about something like this:

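select t.id, max(t.clicks) clicks, count(t.conversion_id) conversions, sum(t.conversion_value) conversion_value
from    (
        -- inner query: one row per (link, conversion), with the link's click count
        SELECT l.id id, l.created created,
               c.id conversion_id,
               count(DISTINCT s.id) clicks,
               max(c.value) conversion_value   -- collapses the copies repeated by the stats join
        FROM links l LEFT
        JOIN stats s ON l.id = s.parent_id LEFT
        JOIN conversions c ON l.id = c.link_id
        GROUP BY l.id, l.created, c.id) t
group by t.id, t.created   -- outer query: one row per link
order by t.created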
Quesi
A: 

For an explanation of why you were seeing stupid numbers, read this.

I think that Jerome has a handle on what is causing your error. Bryson's query would work, though having that subquery in the SELECT could be inefficient.

TehShrike