views:

54

answers:

2

I'm trying to improve the search functionality on my web forums. I've got a table of posts, and each post has (among other less interesting things):

  • PostID, a unique ID for the individual post.
  • ThreadID, an ID of the thread the post belongs to. There can be any number of posts per thread.
  • Text, because a forum would be really boring without it.

I want to write an efficient query that will search the threads in the forum for a series of words, and it should return a hit for any ThreadID for which there are posts that include all of the search words. For example, let's say that thread 9 has post 1001 with the word "cat" in it, and also post 1027 with the word "hat" in it. I want a search for cat hat to return a hit for thread 9.

This seems like a straightforward requirement, but I don't know of an efficient way to do it. Using the regular FREETEXT and CONTAINS capabilities for N'cat AND hat' won't return any hits in the above example because the words exist in different posts, even though those posts are in the same thread. (As far as I can tell, when using CREATE FULLTEXT INDEX I have to give it my index on the primary key PostID, and can't tell it to index all posts with the same ThreadID together.)

The solution that I currently have in place works, but sucks: maintain a separate table that contains the entire concatenated post text of every thread, and make a full text index on THAT. I'm looking for a solution that doesn't require me to keep a duplicate copy of the entire text of every thread in my forums. Any ideas? Am I missing something obvious?

+1  A: 

As far as i can see there is no "easy" way of doing this.

I would create a stored procedure which simply splits up the search words and starts looking for the first word and put the threadid's in a table variable. Then you look for the other words (if any) in the threadids you just collected (inner join).

If intrested i can write a few bits of code but im guessing you wont need it.

Fabian
This is the best "smart" solution I could come up with too. I already parse the user's input myself (so they don't have to put "AND" between each word, and to sanitize things a bit), so I could just do the search in multiple stages, narrowing the results a little bit for each search term.I don't really know how "smart" it is as I'm not much of a database guy, but it certainly seems preferable to my current solution. I've already got a pseudocode sproc for this floating around in my head.
Morbo
A: 

What are you searching for? CAT HAT as a complete word, in which case:

CONTAINS(*,'"CAT HAT")

CAT OR HAT then..

CONTAINS (*,'CAT OR HAT')

Searching for "CAT HAT" and expecting just the post with CAT in doesn't make any sense. If the problem is parsing what the user types, you could just replace SPACES with OR (to search any of the words, AND if both required). The OR will give you both posts for thread 9.

SELECT DISTINCT ThreadId
FROM Posts 
WHERE CONTAINS (*,'"CAT OR HAT")

Better still you could , if it helps, use the brilliant irony (http://irony.codeplex.com/) which translates (parses) a search string into a Fulltext query. Might help for you.

Requires the use of google syntax for the original search which can only be a good thing as most people are used to typing in google searches.

Plus here is an article on how to use it. http://www.sqlservercentral.com/articles/Full-Text+Search+(2008)/64248/

Barry King
Perhaps my question wasn't clear; I didn't want to be too wordy. I've got CONTAINS() queries working fine right now. The user types "cat hat" and I translate that to "'cat AND hat'" for use in CONTAINS(). What I need, and don't get from a basic CONTAINS(), is the ability to search for the terms given across multiple rows (posts) that are treated as a single unit (thread). If, out of all posts in the table with a single thread ID, all of the search terms are present somewhere in those posts, even if not all in the same post, the query should return a hit for that thread ID.
Morbo
Do you mean CAT can be in one post and HAT in another? If thats the case then isnt your query "CAT OR HAT"?.
Barry King
CAT could be in one and HAT could be in the other, but all words must be present in a single thread for it to be a match.
Morbo