views:

80

answers:

4

I have a table called Registrations with the following fields:

  • Id
  • DateStarted (not null)
  • DateCompleted (nullable)

I have a bar chart which shows the number of registrations started and completed by date. My query looks like:

;
WITH Initial(DateStarted, StartCount)
as (
    select Datestarted, COUNT(*)
    FROM Registrations
    GROUP BY DateStarted 
)
select I.DateStarted, I.StartCount, COUNT(DISTINCT R.RegistrationId) as CompleteCount
    from Initial I
     inner join Registrations R
      ON (I.DateStarted = R.DateCompleted)
    GROUP BY I.DateStarted, I.StartCount

which returns a table that looks like:

DateStarted  StartCount  CompleteCount
2009-08-01   1033        903
2009-08-02   540         498

The query just has one of those code smell problems. What is a better way of doing this?

A: 

I don't see a problem. I see a common table expression being used.


You didn't provide DDL for the tables, so I'm not going to try to reproduce this. However, I think you may be able to directly substitute the SELECT for the use of Initial.

John Saunders
+1  A: 

EDIT: so why wont the below work? you could throw coalesce() statements around the counts in the last select statement if you wanted to make the counts zero instead of null. it will also include dates that have completed (or ended in the example below) registrations even though that date doesn't have started registrations.


I am assuming the following table structure (roughly).

create table temp
(
 id int,
 start_date datetime,
 end_date datetime
)

insert into temp values (1, '8/1/2009', '8/1/2009')
insert into temp values (2, '8/1/2009', '8/2/2009')
insert into temp values (3, '8/1/2009', null)
insert into temp values (4, '8/2/2009', '8/2/2009')
insert into temp values (5, '8/2/2009', '8/3/2009')
insert into temp values (6, '8/2/2009', '8/4/2009')
insert into temp values (7, '8/4/2009', null)

Then you could do the following to get what you want.

with start_helper as
(
 select start_date, count(*) as count from temp group by start_date
),

end_helper as
(
 select end_date, count(*) as count from temp group by end_date
)

select coalesce(a.start_date, b.end_date) as date, a.count as start_count, b.count as end_count
from start_helper a full outer join end_helper b on a.start_date = b.end_date
where coalesce(a.start_date, b.end_date) is not null

I would think the full outer join is necessary since a record can be completed today that started yesterday but we may have not started a new record today so you would lose a day from your results.

thomas
It looks like it would work, but it ultimately executes 3 select queries which is not exactly what I was looking for. I went ahead and upvoted you. Thanks.
Chris Lively
+1  A: 

Off-hand, I think this does it:

SELECT
 DateStarted
 , COUNT(*) as StartCount
 , SUM(CASE 
  WHEN DateCompleted = DateStated THEN 1
  ELSE 0 END
  ) as CompleteCount

FROM Registration

GROUP BY DateStarted


OK, apparently I had the requirements wrong before. Given that the CompleteCounts are independent of the StartDate, then this is how I would do it:

;WITH StartDays AS
(
 SELECT DateStarted
 , Count(*) AS CompleteCount 
 FROM Registration 
 GROUP BY DateStarted
)
, CompleteDays AS
(
 SELECT DateCompleted
 , Count(*) AS StartCount 
 FROM Registration 
 GROUP BY DateCompleted
)
SELECT
 DateStarted
 , COALESCE(StartCount, 0) AS StartCount
 , COALESCE(CompleteCount, 0) AS CompleteCount

FROM StartDays
FULL OUTER JOIN CompleteDays ON DateStarted = DateCompleted

Which actually is pretty close to what you had.

RBarryYoung
That looks nice, but produces different numbers. Specifically, the CompleteCount is wrong. It's low by a couple hundred.
Chris Lively
And it's only wrong on the last record.
Chris Lively
Do you have some data that demonstrates the problem? Because AFAIK, this is the technically correct answer to your question and all of my test's return the correct answers for my code. I have to conclude therefore that there is some ambiguous boundary condition, implicitly handled one way by your code and with different results in mine.
RBarryYoung
Yeah, that's about the same.. Upvoted for the confirmation that I'm pretty much doing this how it has to be done. Thanks.
Chris Lively
A: 

I believe the following is identical in function to what you have:

select DS.DateStarted
  , count(distinct DS.RegistrationId) as StartCount
  , count(distinct DC.RegistrationId) as CompleteCount
from Registrations DS
inner join Registrations DC on DS.DateStarted = DC.DateCompleted
group by Ds.DateStarted

I'm a bit confused by the name of the column DateStarted in the results. It looks to just be a date where both some things started and some things ended. And the counts are the number or registrations started and completed that day.

The inner join is throwing away any date where there is either 0 starts or 0 completes. To get all:

select coalesce(DS.DateStarted, DC.DateCompleted) as "Date"
  , count(distinct DS.RegistrationId) as StartCount
  , count(distinct DC.RegistrationId) as CompleteCount
from Registrations DS
full outer join Registrations DC on DS.DateStarted = DC.DateCompleted
group by Ds.DateStarted, DC.DateCompleted

If you wanted to include dates that are neither DateStarted nor DateCompleted, with counts of 0 and 0, then you will need a source of dates and I think it would be clearer to use two correlated sub-queries in select clause instead of joins and count distinct:

select DateSource."Date"
    , (select count(*)
        from Registrations
        where DateStarted = DateSource."Date") as StartCount
    , (select count (*)
        from Registrations
        where DateCompleted = DateSource."Datge") as CompleteCount
from DateSource -- implementation of date source left as exercise
where DateSource.Date between @LowDate and @HighDate
Shannon Severance