I am checking website entries that are recorded in a database.

Columns: browser, click_type_id, referrer, and datetime

If multiple rows have the same browser, click_type_id, and referrer, and their timestamps are within one minute of one another, they are considered duplicates.

I need a SQL statement that can query for these duplicates based on the above criteria.

Any help is appreciated.

+1  A: 

To prevent duplicate inserts:

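-- Insert the new click only if no matching row exists within one minute of it
INSERT INTO MyTable (browser, click_type_id, referrer, [datetime])
SELECT
    @browser, @click_type_id, @referrer, @datetime
WHERE
    NOT EXISTS (SELECT *
        FROM
           MyTable M2
        WHERE
           M2.browser = @browser AND M2.click_type_id = @click_type_id AND M2.referrer = @referrer
           AND
           M2.[datetime] >= DATEADD(minute, -1, @datetime)
           AND
           M2.[datetime] <= DATEADD(minute, 1, @datetime))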
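A minimal sketch of the same NOT EXISTS guard, run against SQLite through Python's built-in sqlite3 module so it can be tried without SQL Server. The table my_table, the column dt, and the helper insert_click are hypothetical stand-ins; SQLite has no DATEADD, so the one-minute window bounds are computed in Python instead.

```python
import sqlite3
from datetime import datetime, timedelta

# Hypothetical in-memory stand-in for MyTable. The column is called "dt"
# rather than "datetime" to avoid confusion with SQLite's datetime() function.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE my_table (browser TEXT, click_type_id INTEGER, referrer TEXT, dt TEXT)"
)

# Same shape as the T-SQL guard: insert only when NOT EXISTS finds no
# matching row within one minute either side of the new timestamp.
INSERT_IF_NEW = """
INSERT INTO my_table (browser, click_type_id, referrer, dt)
SELECT ?, ?, ?, ?
WHERE NOT EXISTS (
    SELECT 1 FROM my_table
    WHERE browser = ? AND click_type_id = ? AND referrer = ?
      AND dt BETWEEN ? AND ?
)
"""


def insert_click(browser: str, click_type_id: int, referrer: str, when: datetime) -> bool:
    """Insert a click unless a duplicate exists; return True if a row was inserted."""
    # SQLite has no DATEADD, so the window bounds are computed in Python.
    # ISO-8601 strings with the same layout compare correctly as text.
    lo = (when - timedelta(minutes=1)).isoformat(sep=" ")
    hi = (when + timedelta(minutes=1)).isoformat(sep=" ")
    cur = conn.execute(
        INSERT_IF_NEW,
        (browser, click_type_id, referrer, when.isoformat(sep=" "),
         browser, click_type_id, referrer, lo, hi),
    )
    return cur.rowcount == 1
```

For example, inserting a Firefox click at 12:00:25 and then an identical one at 12:01:14 stores only the first, while a matching click five minutes later is stored.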

To find duplicates in existing data (this relies on smalldatetime's one-minute precision and may help avoid the issues raised in the comment on the question):

SELECT
   browser, click_type_id, referrer, COUNT(*)
FROM
   MyTable
GROUP BY
    browser, click_type_id, referrer, CAST([datetime] AS smalldatetime)
HAVING
    COUNT(*) > 1
gbn
Wouldn't the smalldatetime conversion be a problem if you had, for example, 12:00:25 and 12:01:14?
Tom H.
@Tom H.: Yes, but sometimes it depends on how you define a minute ;-)
gbn
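To illustrate Tom H.'s point, the sketch below buckets timestamps by minute in SQLite (via Python's sqlite3). Here strftime('%Y-%m-%d %H:%M', dt) truncates to the minute, whereas CAST(... AS smalldatetime) rounds to the nearest minute, so the exact boundaries differ; either way, a fixed bucket can split two clicks that are less than a minute apart. The table name, the dt column, and the rows are made up for the example.

```python
import sqlite3

# Hypothetical sample data: two pairs of clicks, each pair under a minute apart.
rows = [
    ("Firefox", 1, "example.com", "2024-01-01 12:00:05"),
    ("Firefox", 1, "example.com", "2024-01-01 12:00:50"),  # same minute bucket
    ("Chrome", 2, "example.org", "2024-01-01 12:00:25"),
    ("Chrome", 2, "example.org", "2024-01-01 12:01:14"),  # crosses a minute boundary
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE my_table (browser TEXT, click_type_id INTEGER, referrer TEXT, dt TEXT)"
)
conn.executemany("INSERT INTO my_table VALUES (?, ?, ?, ?)", rows)

# strftime truncates each timestamp to its minute; this stands in for the
# CAST(... AS smalldatetime) grouping, which rounds instead of truncating.
BUCKET_QUERY = """
SELECT browser, click_type_id, referrer, COUNT(*) AS n
FROM my_table
GROUP BY browser, click_type_id, referrer, strftime('%Y-%m-%d %H:%M', dt)
HAVING COUNT(*) > 1
"""

bucketed = conn.execute(BUCKET_QUERY).fetchall()
```

Only the Firefox pair is reported; the Chrome clicks at 12:00:25 and 12:01:14 are 49 seconds apart but fall into different buckets, so the grouping misses them.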
+1  A: 

Try this:

SELECT *,
    CASE WHEN DATEDIFF(minute, @StartDate, @EndDate) < 1 THEN 1 ELSE 0 END AS Duplicate
FROM [table]
ryanulit
Eh? And how does this work with 2 upvotes?
gbn
Yeah, I wrote that hastily, haha. Surprised to see it got upvoted. The problem is that you have to pass the dates in as parameters, and it won't do the check automatically.
ryanulit
+5  A: 
SELECT
     T1.browser,
     T1.click_type_id,
     T1.referrer,
     T1.[datetime] AS first_datetime,
     T2.[datetime] AS second_datetime
FROM
     My_Table T1
INNER JOIN My_Table T2 ON
     T2.browser = T1.browser AND
     T2.click_type_id = T1.click_type_id AND
     T2.referrer = T1.referrer AND
     T2.[datetime] > T1.[datetime] AND
     T2.[datetime] <= DATEADD(mi, 1, T1.[datetime])
Tom H.
You could also drop the T2 browser, click_type, and referrer references from the SELECT list, since they will always equal T1's.
ryanulit
Good point. It's done
Tom H.
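A runnable sketch of the self-join approach against SQLite via Python's sqlite3 module. The table my_table, the column dt, and the sample rows are hypothetical, and SQLite's datetime(t1.dt, '+1 minute') stands in for DATEADD(mi, 1, T1.[datetime]). Unlike the minute-bucket grouping, it does report the 12:00:25 / 12:01:14 pair.

```python
import sqlite3

# Hypothetical sample data mirroring the question's columns.
rows = [
    ("Firefox", 1, "example.com", "2024-01-01 12:00:05"),
    ("Firefox", 1, "example.com", "2024-01-01 12:00:50"),
    ("Chrome", 2, "example.org", "2024-01-01 12:00:25"),
    ("Chrome", 2, "example.org", "2024-01-01 12:01:14"),
    ("Chrome", 2, "example.org", "2024-01-01 12:05:00"),  # more than a minute later
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE my_table (browser TEXT, click_type_id INTEGER, referrer TEXT, dt TEXT)"
)
conn.executemany("INSERT INTO my_table VALUES (?, ?, ?, ?)", rows)

# Pair each row with any later matching row at most one minute after it.
# datetime(t1.dt, '+1 minute') returns 'YYYY-MM-DD HH:MM:SS', so the
# text comparison with t2.dt is chronological.
PAIR_QUERY = """
SELECT t1.browser, t1.click_type_id, t1.referrer,
       t1.dt AS first_dt, t2.dt AS second_dt
FROM my_table AS t1
JOIN my_table AS t2
  ON t2.browser = t1.browser
 AND t2.click_type_id = t1.click_type_id
 AND t2.referrer = t1.referrer
 AND t2.dt > t1.dt
 AND t2.dt <= datetime(t1.dt, '+1 minute')
ORDER BY first_dt
"""

pairs = conn.execute(PAIR_QUERY).fetchall()
```

Because the join uses a strict T2 > T1 comparison, two rows with identical timestamps are not paired; switching to >= would also pair every row with itself unless a unique key is used to break ties.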
A: 

Do you want to return the duplicates or return distinct values (without duplicates)?

Gabriel McAdams