I'm trying to find duplicate "keys" so that they can be addressed and made into proper, unique keys.

I recently learned that a HAVING clause can filter the results of an aggregate query by targeting the results of a GROUP BY. You GROUP BY the alleged "key" and HAVING where the count is > 1, and there are your problem rows.

My question is, what is the equivalent of this for windowing functions?

The following table should only be atomic to name and month, but it's using a date field that is detailed to the day (i.e. something can appear to happen twice or more times in a month when it should only be monthly).

select
  event_id,
  overly_specific_date,
  count(*) over(partition by event_id, substring(convert(char(8), overly_specific_date), 0, 7))
from events_historic
order by count(*) over(partition by event_id, substring(convert(char(8), overly_specific_date), 0, 7))

vs

select
  event_id,
  count(*)
from events_historic
group by event_id, substring(convert(char(8), overly_specific_date), 0, 7)
having count(*) > 1

The first query is good because it shows what I want, but I'd like to filter it down. I know I could do it in a larger query or a CTE, but I'm looking for something concise like HAVING. The second query uses HAVING, but it no longer displays one part of the key, overly_specific_date.

How can I filter the first query?

A: 

CTE version:

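WITH events AS (
      -- one row per event and month that occurs more than once
      SELECT t.event_id,
             YEAR(t.date) AS yr,
             MONTH(t.date) AS mon,
             COUNT(*) AS num
        FROM EVENTS_HISTORIC t
    GROUP BY t.event_id, YEAR(t.date), MONTH(t.date)
      HAVING COUNT(*) > 1)
SELECT eh.event_id,
       eh.date,
       e.num
  FROM EVENTS_HISTORIC eh
  -- match on the month as well, so only the offending rows come back
  JOIN events e ON e.event_id = eh.event_id
               AND e.yr = YEAR(eh.date)
               AND e.mon = MONTH(eh.date)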

Non CTE version:

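SELECT eh.event_id,
       eh.date,
       e.num
  FROM EVENTS_HISTORIC eh
  -- derived table: one row per event and month that occurs more than once
  JOIN (SELECT t.event_id,
               YEAR(t.date) AS yr,
               MONTH(t.date) AS mon,
               COUNT(*) AS num
          FROM EVENTS_HISTORIC t
      GROUP BY t.event_id, YEAR(t.date), MONTH(t.date)
        HAVING COUNT(*) > 1) e ON e.event_id = eh.event_id
                              AND e.yr = YEAR(eh.date)
                              AND e.mon = MONTH(eh.date)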
OMG Ponies
So much for concise. =)
Mark Canlas
A: 

Your problem is that overly_specific_date varies within a group (you are aggregating by a month version of the date), so it cannot be displayed: no single value exists for the group. To list all offending dates you have to use some sort of subquery, as proposed by rexem, to link each group back to its individual dates.

However, a cheap hack that might serve your purpose is to select out the MIN/MAX of overly_specific_date in your original query, to show the offending date range which is turning up. (You could also just dump the month version in a MIN statement if that's all you wanted.)
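As a rough sketch of that MIN/MAX idea, assuming the events_historic table and the month-key expression from the question (the aliases first_date, last_date and ct are only illustrative):

```sql
-- Sketch only: reuses the question's table and month-key expression.
-- first_date, last_date and ct are made-up aliases for illustration.
select
  event_id,
  min(overly_specific_date) as first_date,  -- earliest date in the offending month
  max(overly_specific_date) as last_date,   -- latest date in the offending month
  count(*) as ct                            -- how many rows share the month
from events_historic
group by event_id, substring(convert(char(8), overly_specific_date), 0, 7)
having count(*) > 1;
```

This keeps the query a plain GROUP BY with HAVING, at the cost of showing only the two end dates rather than every offending row.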

Joel Goodwin
A: 

Hi Mark,

I'd recommend a CTE, but since you asked, there is a sneaky way to do this using TOP (1) WITH TIES:

select top (1) with ties
  event_id,
  overly_specific_date,
  count(*) over (
    partition by event_id,
    substring(convert(char(8), overly_specific_date), 0, 7)
  ) as ct
from events_historic
order by 
  case when count(*) over (
    partition by event_id,
    substring(convert(char(8), overly_specific_date), 0, 7)
  ) > 1 then 0 else 1 end;

This doesn't generalize to all that many other useful situations, but I think in your case it will work.
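For comparison, the CTE route could look something like the sketch below. It assumes the same events_historic table and partition expression as above; the names counted and ct are just placeholders. A window function is only allowed in the SELECT and ORDER BY clauses, so its result has to be computed one level down before a WHERE can filter on it.

```sql
-- Sketch only: "counted" and "ct" are illustrative names.
with counted as (
  select
    event_id,
    overly_specific_date,
    count(*) over (
      partition by event_id,
      substring(convert(char(8), overly_specific_date), 0, 7)
    ) as ct                      -- rows sharing this event_id and month
  from events_historic
)
select event_id, overly_specific_date, ct
from counted
where ct > 1                     -- keep only the duplicated months
order by event_id, overly_specific_date;
```

One difference worth noting: when there are no duplicates at all, this returns no rows, whereas the TOP (1) WITH TIES version returns every row, because all rows then tie on the same sort value.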

Steve Kass