I consider myself rather proficient with T-SQL and I'm usually able to optimize a query pretty good without loosing readability. In short: I like my SQL short, descriptive, declarative and elegant.
While the following code works, i have two problems with it:
- I am using cursors and I can't shake the feeling I have in the back of my head that it could have been done more efficiently using CTEs. Also cursors don't work in views, so I can't manipulate results/ranges on the client-side or in dependent SQL.
- The code is implemented in a Stored Procedure, which leads to the same problem as above. Especially with LInQ to SQL and auto-paging.
So given the following SP, does anyone see any obvious way to convert this to a plain select using recursive CTEs? I've tried, failed and thought I'd see what the stack overflow community might be able to come up with.
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE PROC [dbo].[usp_GetLastReferers]
(
@Limit int = NULL
)
AS
BEGIN
SET NOCOUNT ON
CREATE TABLE #Referer
(
ID int,
Url nvarchar(500),
Referer nvarchar(500)
)
DECLARE @ID int
DECLARE @Url nvarchar(500)
DECLARE @Referer nvarchar(500)
DECLARE @Count int
SET @Count = 0
DECLARE LogCursor CURSOR FORWARD_ONLY READ_ONLY FOR
SELECT ID, Url, Referer FROM Log WHERE Referer <> '' ORDER BY ID DESC
OPEN LogCursor
FETCH NEXT FROM LogCursor INTO @ID, @Url, @Referer
WHILE @@FETCH_STATUS = 0 AND (@Count < @Limit OR @Limit IS NULL)
BEGIN
DECLARE @Hits int
SELECT @Hits = COUNT(*)
FROM #Referer
WHERE Referer = @Referer
DECLARE @IsLocal bit
SELECT @IsLocal = dbo.IsLocalSite(@Referer)
IF (@Hits = 0 OR @Hits IS NULL) AND @IsLocal = 0
BEGIN
INSERT INTO #Referer(ID,Url,Referer) VALUES (@ID,@Url,@Referer)
SET @Count = @Count + 1
END
FETCH NEXT FROM LogCursor INTO @ID, @Url, @Referer
END
CLOSE LogCursor
DEALLOCATE LogCursor
SELECT *
FROM #Referer
DROP TABLE #Referer
SET NOCOUNT OFF
END
Since it may not be totally obvious, what I'm trying to do here is akin tothe following quasi SQL
SELECT DISTINCT TOP(@Limit) ID, Url, Referer
FROM Log
ORDER BY ID DESC
Basically to get the last unique refers (not unique rows), which often contain duplicates, and in descending order. This is definitely where it gets tricky.
The data is pretty simple HTTP logs. ID field is just a unique row-identifier, Url is the full url requesten, and Referer is the HTTP referer for that request. None of the values can be null, but referer can be empty (ie ''). IsSiteLocal is just a simple filtering function to exclude referers originating from my own sites.
If anyone wants some sample-data to full around, I can upload a small DB-backup so you have something to fool around with.
Sample-data can be found here: http://svada.kjonigsen.net/files/IISLogsDBBackup.zip