views:

240

answers:

2

I'm working on a search stored procedure for our existing forums.

I've written the following code which uses standard SQL full text indexes, however I'm sure there is a better way of doing it and would like a point in the right direction.

To give some info on how it needs to work, The page has 1 search text box which when clicked will search thread titles, thread descriptions and post text and should return the results with the title matches first, then descriptions then post data.

Below is what I've written so far which works but is not elegant or as fast as I would like. To give an example of performance with 20K threads and 80K posts it takes about 12 seconds to search for 5 random words.

ALTER PROCEDURE [dbo].[SearchForums]
(
    --Input Params
    @SearchText VARCHAR(200),
    @GroupId INT = -1,
    @ClientId INT,
    --Paging Params
    @CurrentPage INT,
    @PageSize INT,           
    @OutTotalRecCount INT OUTPUT
)
AS

--Create Temp Table to Store Query Data
CREATE TABLE #SearchResults
(
    Relevance INT IDENTITY,
    ThreadID INT,
    PostID INT,
    [Description] VARCHAR(2000),
    Author BIGINT
)

--Create and populate table of all GroupID's This search will return from
CREATE TABLE #GroupsToSearch
(
GroupId INT
)
IF @GroupId = -1
    BEGIN
     INSERT INTO #GroupsToSearch
     SELECT GroupID FROM SNetwork_Groups WHERE ClientId = @ClientId
    END
ELSE
    BEGIN
     INSERT INTO #GroupsToSearch
     VALUES(@GroupId)
    END

--Get Thread Titles
INSERT INTO #SearchResults
    SELECT 
     SNetwork_Threads.[ThreadId],
     (SELECT NULL) AS PostId,
     SNetwork_Threads.[Description],
     SNetwork_Threads.[OwnerUserId]
    FROM 
     SNetwork_Threads
     INNER JOIN SNetwork_Groups ON SNetwork_Groups.GroupId = SNetwork_Threads.GroupId  
    WHERE 
     FREETEXT(SNetwork_Threads.[Description], @SearchText) AND
     Snetwork_Threads.GroupID IN (SELECT GroupID FROM #GroupsToSearch) AND
     SNetwork_Groups.ClientId = @ClientId


--Get Thread Descriptions
INSERT INTO #SearchResults
    SELECT 
     SNetwork_Threads.[ThreadId],
     (SELECT NULL) AS PostId,
     SNetwork_Threads.[Description],
     SNetwork_Threads.[OwnerUserId]
    FROM 
     SNetwork_Threads
     INNER JOIN SNetwork_Groups ON SNetwork_Groups.GroupId = SNetwork_Threads.GroupId  
    WHERE 
     FREETEXT(SNetwork_Threads.[Name], @SearchText) AND
     Snetwork_Threads.GroupID IN (SELECT GroupID FROM #GroupsToSearch) AND
     SNetwork_Groups.ClientId = @ClientId


--Get Posts
INSERT INTO #SearchResults
    SELECT 
     SNetwork_Threads.ThreadId,
     SNetwork_Posts.PostId,
     SNetwork_Posts.PostText,
     SNetwork_Posts.[OwnerUserId]
    FROM 
     SNetwork_Posts 
     INNER JOIN SNetwork_Threads ON SNetwork_Threads.ThreadId = SNetwork_Posts.ThreadId
     INNER JOIN SNetwork_Groups ON SNetwork_Groups.GroupId = SNetwork_Threads.GroupId  
    WHERE 
     FREETEXT(SNetwork_Posts.PostText, @SearchText) AND
     Snetwork_Threads.GroupID IN (SELECT GroupID FROM #GroupsToSearch) AND
     SNetwork_Groups.ClientId = @ClientId


--Return Paged Result Sets
SELECT @OutTotalRecCount =  COUNT(*) FROM #SearchResults
SELECT  
    #SearchResults.[ThreadID],
    #SearchResults.[PostID],
    #SearchResults.[Description],
    #SearchResults.[Author]
FROM  
    #SearchResults   
WHERE  
    #SearchResults.[Relevance] >= (@CurrentPage - 1) * @PageSize + 1 AND 
    #SearchResults.[Relevance] <= @CurrentPage*@PageSize
ORDER BY Relevance ASC


--Clean Up
DROP TABLE #SearchResults
DROP TABLE #GroupsToSearch

I know its a bit long winded but just a nudge in the right direction would be well appreciated.

Incase it helps 80% of the query time is taken up when search posts and according to teh query plan is spent on "Clustered Index Scan" on the posts table. I cant see anyway around this.

Thanks

Gavin

A: 

I'd really have to see an explain plan to know where the slow parts were, as I don't see anything particularly nasty in your code. Very first thing - make sure all your indexes are in good shape, they are being used, statistics are up to date, etc.

One other idea would be to do the search on thread title first, then use the results from that to prune the searches on thread description and post text. Similarly, use the results from the thread description search to prune the post text search.

The basic idea here is that if you find the keywords in the thread title, why bother searching the description and posts? I realize this may not work depending on how you are presenting the search results to the user, and it may not make a huge difference, but it's something to think about.

Eric Petroelje
I've edited my post, Thanks for the advice, Will definatly look at prunning the thread description results. Incase it helps 80% of the query time is spent search posts and is spent on "Clustered Index Scan" on the posts table. I dont think I can avoid this on a table of 80K records?
Gavin Draper
Nope, probably not much you can do to avoid that. You might want to look at IOs for that though - do you have enough memory on your server to load that index, or will it be paging parts in and out of memory?
Eric Petroelje
A: 

Hey:

80k records isn't that much. I'd recommend not inserting the resulting data into your temp table, and instead only inserting the IDs, then joining to that table afterward. This will save on writing to the temp table, as you may store 10000 ints, instead of 10000 full posts (of which you discard all but one page of). This may reduce the amount of time spent scanning posts, as well.

It looks like you would need two temp tables, one for threads and one for posts. You would union them in the final select.

Toby