tags:

views:

331

answers:

4

Sorry about the vague subject but I couldn't think what to put.

Here's my problem, I'm doing a query on a table that returns me a count of items related to a day. I want to make sure that if I do a query on the DB, I always get a set number of rows. For example, imagine I have the following table that contains a log of when people log into a website:

**WebsiteLogin**
id: Integer
login_date: Datetime

I can then get counts of the logins for each date by doing something like:

SELECT DATE(login_date), COUNT(*) FROM WebsiteLogin GROUP BY DATE(login_date)

Which works great and will return me the data I want. But imagine my website was quite unpopular on the weekends. The data returned would look like:

2008-12-10, 100
2008-12-11, 124
2008-12-12, 151
2008-12-15, 141
2008-12-16, 111

The 13th & 14th are missing because there was no data for those dates. Is there any way I can change my query so that I get data that includes all the dates I query on. E.g.

2008-12-10, 100
2008-12-11, 124
2008-12-12, 151
2008-12-13, 0
2008-12-14, 0
2008-12-15, 141
2008-12-16, 111

I imagine I could do this if I set up a table containing all the dates in a year and then using a left/right join but that's really messy way of doing it.

So any clues on a nice way to do this in SQL? Or is programmatically my only choice? Cheers for any input.

+1  A: 

I imagine I could do this if I set up a table containing all the dates in a year and then using a left/right join but that's really messy way of doing it.

Nope. That's pretty much how to do it. On the other hand, you can use a temporary table and populate it with just the date range required.

If only MS SQL had virtual tables, where you provided a generator function...

Roger Lipscombe
+2  A: 

To do this you would need to write a stored procedure that returns a table result.

It would use a loop that would step thru each day and get the count and store it in a row of a temp table, then return that table as the resultset.

Here is a MS SQL server example of a loop:

http://www.databasejournal.com/features/mssql/article.php/3100621/T-SQL-Programming-Part-2---Building-a-T-SQL-Loop.htm

Quango
+1  A: 

You shouldn't need to create a temporary table, or similar, you just need a source with enough rows to construct the missing dates:

I don't know mysql, but if it supports "connect by" then you could do the following:

(this is in oracle)

select d login_date, count(login_date) count
from
    websitelogin wsl
    right outer join (
        select start_date+l-1 d from (select start_date, level l
        from (select min(login_date) start_date, max(login_date)-min(login_date)+1 num_days
        from websitelogin) connect by level <= num_days)) v on d=login_date
group by d
/

if mysql doesn't have connect by you could just join on some arbitrary table with enough rows in it instead and limit the result to the number of required rows:

select d login_date, count(login_date) count
from
    websitelogin wsl
    right outer join (select start_date+rownum-1 d from
(
select 
    min(login_date) start_date, 
    max(login_date)-min(login_date)+1 num_days
from websitelogin)v,all_objects
where rownum<=num_days
) v on d=login_date
group by d

not quite as neat though, and obviously you need to know that the driving table has enough rows in it.

Nifty way of doing it. I like it.
Jonathan
A: 

I know it isn't mysql, but I use the following function in MSSQL (see below for MySql version):

CREATE FUNCTION dbo.DatesBetween (@start_date datetime, @end_date datetime)
RETURNS @DateTable TABLE (gen_date datetime)
AS 
BEGIN
    DECLARE @num_dates int
    DECLARE @tmpVal TABLE (a_count int identity(0,1))

    SELECT @num_dates = datediff(day, @start_date, @end_date)
    WHILE (select isnull(max(a_count), 0) from @tmpVal) < @num_dates
     INSERT @tmpVal DEFAULT VALUES

    INSERT @DateTable (gen_date) 
    SELECT dateadd(day, a_count, @start_date) FROM @tmpVal

    RETURN
END

So, to use it in your example, I would try something like:

DECLARE @min_date datetime, @max_date datetime
SELECT @min_date = min(login_date), @max_date = max(login_date) 
FROM WebsiteLogin

SELECT m.gen_date 'login_date', isnull(l.num_visits, 0) 'num_visits'
FROM dbo.DatesBetween(@min_date, @max_date) as d
LEFT OUTER JOIN (SELECT DATE(login_date) 'login_date', COUNT(*) 'num_visits'
          FROM WebsiteLogin 
          GROUP BY DATE(login_date)) AS l ON d.gen_date = l.login_date


Alternatively, and with a massive speed improvement on my query, you could investigate this blog entry, which does what my code above does, but will work across all versions of SQL.

He explains it more there, but the SQL is:

DECLARE @LowDate DATETIME
SET @LowDate = '01-01-2006'

DECLARE @HighDate DATETIME
SET @HighDate = '12-31-2016'

SELECT DISTINCT DATEADD(dd, Days.Row, DATEADD(mm, Months.Row, DATEADD(yy, Years.Row, @LowDate))) AS Date
FROM
(SELECT 0 AS Row UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4
 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9
 UNION ALL SELECT 10 UNION ALL SELECT 11 UNION ALL SELECT 12 UNION ALL SELECT 13 UNION ALL SELECT 14
 UNION ALL SELECT 15 UNION ALL SELECT 16 UNION ALL SELECT 17 UNION ALL SELECT 18 UNION ALL SELECT 19
 UNION ALL SELECT 20 UNION ALL SELECT 21 UNION ALL SELECT 22 UNION ALL SELECT 23 UNION ALL SELECT 24
 UNION ALL SELECT 25 UNION ALL SELECT 26 UNION ALL SELECT 27 UNION ALL SELECT 28 UNION ALL SELECT 29
 UNION ALL SELECT 30 -- add more years here...
) AS Years
INNER JOIN
(SELECT 0 AS Row UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4
 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9
 UNION ALL SELECT 10 UNION ALL SELECT 11
) AS Months
ON DATEADD(mm, Months.Row,  DATEADD(yy, Years.Row, @LowDate)) <= @HighDate 
INNER JOIN
(SELECT 0 AS Row UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4
 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9
 UNION ALL SELECT 10 UNION ALL SELECT 11 UNION ALL SELECT 12 UNION ALL SELECT 13 UNION ALL SELECT 14
 UNION ALL SELECT 15 UNION ALL SELECT 16 UNION ALL SELECT 17 UNION ALL SELECT 18 UNION ALL SELECT 19
 UNION ALL SELECT 20 UNION ALL SELECT 21 UNION ALL SELECT 22 UNION ALL SELECT 23 UNION ALL SELECT 24
 UNION ALL SELECT 25 UNION ALL SELECT 26 UNION ALL SELECT 27 UNION ALL SELECT 28 UNION ALL SELECT 29
 UNION ALL SELECT 30
) AS Days
ON DATEADD(dd, Days.Row, DATEADD(mm, Months.Row,  DATEADD(yy, Years.Row, @LowDate))) <= @HighDate
WHERE DATEADD(yy, Years.Row, @LowDate) <= @HighDate
ORDER BY 1
Jonathan