views:

137

answers:

6

Hi, I have below click_log table logging hits for some urls

site    ip    ua    direction   hit_time
-----------------------------------------------------
 1      127.0.0.1      1         20010/01/01 00:00:00

 2      127.0.0.1      1         20010/01/01 00:01:00

 3      127.0.0.1      0         20010/01/01 00:10:00

....    .........

I want to select incoming hits (direction:1) and group by sites that are:

  • from same ip and browser
  • logged within 10 minutes of each other
  • occured more than 4 times in 10 minutes.

I'm not sure if above was clear enough. English is not my first language. Let me try to explain with an example.

If site 1 gets 5 hits from same ip and browser with in 10 minutes after getting first unique hit from that ip and browser i want it to be included in the selection.

Basically I am trying to find abusers.

A: 

You're probably looking for the Between operator as described here:

http://www.w3schools.com/sql/sql_between.asp

ho1
A: 

What about

SELECT IP, (SELECT COUNT(*) FROM Click_Log WHERE Click_Log.IP = CL.IP 
     AND DIRECTION = 1 AND DATEDIFF(MINUTE, ClickLog.HIT_TIME, CL.HIT_TIME)
     BETWEEN -10 AND 10) AS CLICK_COUNT
FROM Click_Log CL
WHERE DIRECTION = 1 AND CLICK_COUNT > 4
Chris Bednarski
+2  A: 

I think this does what you need. I have included some sample data too.

Create Table #t
(
[Site] int,
IP varchar(20),
Direction int,
Hit_Time datetime
)

Insert Into #t
Values (1,'127.0.0.1',1,'2010-01-01 00:00:00')

Insert Into #t
Values (1,'127.0.0.1',1,'2010-01-01 00:01:00')

Insert Into #t
Values (1,'127.0.0.1',1,'2010-01-01 00:03:00')

Insert Into #t
Values (1,'127.0.0.1',1,'2010-01-01 00:04:00')


Insert Into #t
Values (2,'127.0.0.2',1,'2010-01-01 00:00:00')

Insert Into #t
Values (2,'127.0.0.2',1,'2010-01-01 00:01:00')

Insert Into #t
Values (2,'127.0.0.2',0,'2010-01-01 00:03:00')

Insert Into #t
Values (2,'127.0.0.2',1,'2010-01-01 00:04:00')


Select Distinct Site
From #t
Where Direction = 1
Group by Site, IP
Having (DateDiff(minute,Min(HIt_Time), max(hit_time)) <= 10) And Count(*) >= 4

Drop Table #t
Barry
Thanks, this looks ok but it grupus by site and ip so as result you get site listed more than once. apart from that it seems working fine
nLL
Barry
No i only need site listed, basically i need to find sites that abused so i can go in to details. When i remove ip from select and gorup by results include single hits too
nLL
In that case, you need to have only Site listed in the Select and Site and IP listed in the Group By
Barry
That doesn't change the result; I still get duplicate sites in result. Don’t get me wrong, I am not complaining
nLL
if i wrap your query with Select * FROM () group by siteI get what i want. That should be ok shouldn't it?
nLL
It should be fine - as long as you fully test it and are happy with the results. Without having your data to test with it is quite difficult for me to amend my query.
Barry
I've updated my answer which should solve your problem.
Barry
There is a problem: Let site 1 has 5 hits with direction 1 from IP 127.0.0.1 in times: 15:00, then 15:30, 15:31, 15:32, 15:33. *It has 4 hits within 10 minutes interval* (15:30, 15:31, 15:32, 15:33 - 4 clicks in 3 minutes), *but difference between MAX and MIN* (= 15:33 - 15:00) *is 33* - **site is abused by mistake**!
SergeanT
Ah yes - I see your problem. I've been trying to think of way to get around but I haven't come up with anything yet. I'll try to come back to you soon.
Barry
Hi, i have added a new answer; which I hopes solves the problem! :-)
Barry
A: 

@nLL could you add the expected output to your question? Perhaps extend the data sample Barry added so we can see exactly what you're after.

Mark Storey-Smith
A: 
;WITH rankings AS (
    SELECT *, DENSE_RANK() OVER(ORDER BY [site], ip, ua) groupId,
        ROW_NUMBER() OVER(PARTITION BY [site], ip, ua ORDER BY hit_time) sequence
    FROM Hits
    WHERE direction = 1),
periods AS (
    SELECT r.groupId, r.sequence, count(*) hitCount
    FROM rankings r
    LEFT OUTER JOIN rankings r2
        ON r2.groupId = r.groupId and r2.sequence < r.sequence
        AND r2.hit_time >= DATEADD(second, -10*60, r.hit_time)
        AND r2.hit_time < r.hit_time
    GROUP BY r.groupId, r.sequence
),
groups AS (
    SELECT p.groupId, MAX(p.hitCount) maxHitCount
    FROM periods p
    GROUP BY p.groupId
)
SELECT DISTINCT r.[site], r.ip, r.ua, g.maxHitCount 
FROM rankings r
INNER JOIN groups g ON g.groupId = r.groupId
WHERE maxHitCount >= 5
ORDER BY maxHitCount DESC
Anthony Faull
A: 

I have added this answer in response to the OP comment.

I've used the following test data:

Create Table dbo.Temp
(
[Site] int,
IP varchar(20),
Direction int,
Hit_Time datetime
)

Insert Into dbo.Temp
Values (1,'127.0.0.1',1,'2010-01-01 00:00:00')

Insert Into dbo.Temp
Values (1,'127.0.0.1',1,'2010-01-01 00:01:00')

Insert Into dbo.Temp
Values (1,'127.0.0.1',1,'2010-01-01 00:03:00')

Insert Into dbo.Temp
Values (1,'127.0.0.1',1,'2010-01-01 00:04:00')


Insert Into dbo.Temp
Values (2,'127.0.0.2',1,'2010-01-01 15:00:00')

Insert Into dbo.Temp
Values (2,'127.0.0.2',1,'2010-01-01 15:31:00')

Insert Into dbo.Temp
Values (2,'127.0.0.2',1,'2010-01-01 15:32:00')

Insert Into dbo.Temp
Values (2,'127.0.0.2',1,'2010-01-01 15:33:00')

Insert Into dbo.Temp
Values (2,'127.0.0.2',1,'2010-01-01 15:34:00')

First you need to create a Function to do the working out:

Create Function dbo.fn_CheckSuspectActivity (@Site int, @IP varchar(20), @MinDate datetime,
                                                    @MaxDate datetime, @Direction int, @Interval int,
                                                    @MaxCount int)

                                                    returns int
        as begin
        Declare @OrigMaxDate datetime,
                @IsSuspect int

        Set @OrigMaxDate = @MaxDate
        Set @IsSuspect = 0

        if (DATEDIFF(minute, @MinDate, @MaxDate) > 10)
                --Min and Max dates for site & Ip
                -- are more than 10 minutes apart
                begin
                        --Loop through the records
                        While (@MaxDate <= @OrigMaxDate And @IsSuspect = 0)
                        begin
                                -- Set The MaxDate to the MinDate plus 10 mins
                                Set @MaxDate = DATEADD(Minute, 10, @MinDate)

                                If (Select COUNT(*) 
                                    From dbo.Temp
                                    Where Site = @Site
                                    And IP = @IP
                                    And Hit_Time >= @MinDate
                                    And Hit_Time <= @MaxDate
                                    And Direction = @Direction
                                    ) >= @MaxCount

                                        Begin
                                                -- Hit Count exceeded for the specified 10 min range
                                                set @IsSuspect = 1
                                        End

                                    Else

                                        Begin
                                        -- Set the minDate to the maxDate 
                                        Set @MinDate = @MaxDate
                                        --Add another 10 minutes on
                                        Set @MaxDate = DATEADD(minute, 10,@MaxDate)

                                        End

                        end
                        -- We've  finished the loop but if @IsSuspect is still zero we need to do one final check
                        if (@IsSuspect = 0)
                            begin
                                    -- Check the number of records based on the last MinDate used
                                    -- and the original MaxDate

                                    If (Select COUNT(*) 
                                    From dbo.Temp
                                    Where Site = @Site
                                    And IP = @IP
                                    And Hit_Time >= @MinDate
                                    And Hit_Time <= @OrigMaxDate
                                    And Direction = @Direction
                                    ) >= @MaxCount
                                            begin
                                                    -- Hit Count exceeded for the specified 10 min range
                                                    set @IsSuspect = 1
                                            end
                                        else
                                            begin
                                                    set @IsSuspect = 0
                                            end

                            end

                end

            else
                -- Time difference isn't more than 10 minutes so do a "normal" check
                begin

                        If (Select COUNT(*)
                            From dbo.Temp 
                            Where Site = @Site
                            And IP = @IP
                            And Hit_Time >= @MinDate
                            And Hit_Time <= @MaxDate
                            And Direction = @Direction) >= @MaxCount

                            BEGIN   -- Its a suspect IP
                                    Set @IsSuspect = 1
                            END

                                ELSE

                            BEGIN
                                    -- It's ok
                                    Set @IsSuspect = 0
                            END

                end


return @IsSuspect

End
Go

Then this select statement should give you the correct answer:

With Qry as
(

Select  Site,
        IP,
        MIN(Hit_Time) as'MinTime',
        MAX(Hit_TIme) as 'MaxTime'

From dbo.Temp
Group By Site, IP
)

Select Site
From Qry
Where dbo.fn_CheckSuspectActivity(Site, IP, MinTime, MaxTime, 1, 10, 4) = 1
-- function params are as follows: Site Number, IP Address, FirstTimeLogged, 
--                                  LastTimeLogged, Direction, IntervalToCheck, MaxOccurences

If the first and last dates are less than 10 mins apart then it checks if they have exceed the hit count. If first date and last date are more than 10 mins apart it increments the first date by intervals of 10 mins and checks to see if they have exceeded the hitcount during that 10 min period.

I hope this is what you need.

Barry

Barry