tags:

views:

310

answers:

2

I have the following query:

SELECT [ApplicationTitle] AS "ApplicationTitle", 
SUM([Visits]) AS "Visits",
SUM([NewVisits]) AS "NewVisits"
FROM [HeadlineFigures] 
WHERE [DataDate] >= '01/06/2009'
AND [DataDate] < '08/06/2009'
GROUP BY [ApplicationTitle]
ORDER BY [ApplicationTitle]

This returns data for 7 days. I want the same again, in the same query for a different week so it should look more like:

SELECT a.[ApplicationTitle] AS "ApplicationTitle", 
SUM(a.[Visits]) AS "Visits",
SUM(b.[Visits]) AS "PVisits",
SUM(a.[NewVisits]) AS "NewVisits"
SUM(b.[NewVisits]) AS "PNewVisits"
FROM [HeadlineFigures] a JOIN [HeadlineFigures] b
ON a.[ApplicationTitle] = b.[ApplicationTitle]
WHERE a.[DataDate] >= '01/06/2009'
AND a.[DataDate] < '08/06/2009'
AND b.[DataDate] >= '01/05/2009'
AND b.[DataDate] < '08/05/2009'
GROUP BY a.[ApplicationTitle]
ORDER BY a.[ApplicationTitle]

The above query is not correct though. What am I doing wrong?

+2  A: 

Your joining then aggregating, resulting in wrong aggregates. Ie. if for a ApplicationTitle you have 2 records in first set and 2 in the second, the join will have 4 records so the SUM doubles. You need to aggregate first, then join:

SELECT a.[ApplicationTitle],
    a.Visits,
    a.NewVisists,
    b.Visits as PVisits,
    b.NewVisits as PNewVisits
    FROM (
    SELECT [ApplicationTitle] AS "ApplicationTitle", 
     SUM([Visits]) AS "Visits",
     SUM([NewVisits]) AS "NewVisits"
     FROM [HeadlineFigures] 
     WHERE [DataDate] >= '01/06/2009'
     AND [DataDate] < '08/06/2009'
     GROUP BY [ApplicationTitle]
    ) AS a JOIN (
    SELECT [ApplicationTitle] AS "ApplicationTitle", 
     SUM([Visits]) AS "Visits",
     SUM([NewVisits]) AS "NewVisits"
     FROM [HeadlineFigures] 
     WHERE [DataDate] >= '01/05/2009'
     AND [DataDate] < '08/05/2009'
     GROUP BY [ApplicationTitle]
    ) AS b ON a.[ApplicationTitle] = b.[ApplicationTitle]
     ORDER BY [ApplicationTitle]
Remus Rusanu
+1 for the solution, but I think it would be better to implement such aggregate queries as User Defined Functions, so it would be easier to call them without replicating the whole SQL Statement.
Diego
This makes perfect sense to me :). Create an inline table with my first results, an inline table of the second set of results. Alias the tables, select the rows from the tables knowing there is one row per app name at most and join on the title. Sweet and it makes sense.
A: 

If I'm not missing something entirely, you are making things too complicated. Try this instead:

SELECT [ApplicationTitle] AS "ApplicationTitle",
       SUM(CASE WHEN [DataDate] >= '01/06/2009' AND [DataDate] < '08/06/2009' THEN [Visits] ELSE 0 END) AS "Visits",
       SUM(CASE WHEN [DataDate] >= '01/05/2009' AND [DataDate] < '08/05/2009' THEN  [Visits] ELSE 0 END) AS "PVisits",
       SUM(CASE WHEN [DataDate] >= '01/06/2009' AND [DataDate] < '08/06/2009' THEN [NewVisits] ELSE 0 END) AS "NewVisits",
       SUM(CASE WHEN [DataDate] >= '01/05/2009' AND [DataDate] < '08/05/2009' THEN [NewVisits] ELSE 0 END) AS "PNewVisits",
  FROM [HeadlineFigures]
 WHERE    ([DataDate] >= '01/06/2009' AND [DataDate] < '08/06/2009')
       OR ([DataDate] >= '01/05/2009' AND [DataDate] < '08/05/2009')
 GROUP BY [ApplicationTitle]
 ORDER BY [ApplicationTitle]

Edit: Oops, I misread the data range. I hope SQL server will do a good job using an index over the DataDate, otherwise the from clause would need to look something like:

FROM (SELECT * FROM [ApplicationTitle] WHERE [DataDate] >= '01/06/2009' AND [DataDate] < '08/06/2009'
      UNION
      SELECT * FROM [ApplicationTitle] WHERE [DataDate] >= '01/05/2009' AND [DataDate] < '08/05/2009') x
erikkallen
You should specialize the WHERE clause to match the original range(s), maybe there are 10 million more rows between 09/05 and 31/05 that are now scanned for no reason
Remus Rusanu
My bad. Answer updated.
erikkallen