views:

45

answers:

2

Currently in the middle of building a knowledge base app and am a bit unsure on the best way to store and index the document information.

The user uploads the document and when doing so selects a number of options from dropdown lists (such as category,topic,area..., note these are not all mandatory) they also enter some keywords and a description of the document. At the moment the category (and others) selected is stored as foreign key in the documents table using the id from the categories table. What we want to be able to do is do a FREETEXTTABLE or CONTAINSTABLE on not only the information within the varchar(max) column where the document is located but also on the category name, topic name and area name etc.

I looked at the option of creating an indexed view but this wasn't possible due to the LEFT JOIN against the category column. So I'm not sure how to go about being able to do this any ideas would be most appreciated.

A: 

I assume that you want to AND the two searches together. For example find all documents containing the text "foo" AND in category the "Automotive Repair".

Perhaps you don't need to full text the additional data and can just use = or like? If the additional data is reasonably small it may not warrant the complication of full text.

However, if you want to use full text on both, use a stored procedure that pulls the results together for you. The trick here is to stage the results rather than trying to get a result set back straight away.

This is rough starting point.

-- a staging table variable for the document results
declare @documentResults table (
    Id int,       
    Rank int
)

insert into @documentResults
select d.Id, results.[rank]
from containstable (documents, (text), '"foo*"') results
inner join documents d on results.[key] = d.Id

-- now you have all of the primary keys that match the search criteria
-- whittle this list down to only include keys that are in the correct categories

-- a staging table variable for each the metadata results
declare @categories table (
    Id int        
)

insert into @categories
select results.[KEY]
from containstable (Categories, (Category), '"Automotive Repair*"') results

declare @topics table (
    Id int        
)

insert into @topics
select results.[KEY]
from containstable (Topics, (Topic), '"Automotive Repair*"') results

declare @areas table (
    Id int        
)

insert into @areas
select results.[KEY]
from containstable (Areas, (Area), '"Automotive Repair*"') results


select d.text, c.category, t.topic, a.area
from @results r
inner join documents d on d.Id = r.Id
inner join @categories c on c.Id = d.CategoryId
inner join @topics t on t.Id = d.TopicId
inner join @areas a on a.Id = d.AreaId
Michael Baker
A: 

You could create a new column for your full text index which would contain the original document plus the categories appended as metadata. Then a search on that column could search both the document and the categories simultaneously. You'd need to invent a tagging system that would keep them unique within your document yet the tags would not be likely to be used as search phrases themselves. Perhaps something like:

This is my regular document text. <FTCategory: Automotive Repair> <FTCategory: Transmissions>
Joe Stefanelli
As always, I don't mind a down vote if I'm wrong. However, I do dislike an anonymous down vote with no explanation. Can you explain your objection?
Joe Stefanelli
I don't storing the same data twice is an acceptable way to go as essentially doubling the size over every document in the database
Gazeth