I need to develop an IFilter for Microsoft Search Server 2008 that performs prolonged computations to extract text. Extracting text from one file can take anywhere from 5 seconds to 12 hours. How can I design such an IFilter so that the daemon doesn't reset it on timeout, while other IFilters can still be reset on timeout if they hang?

A: 

I have not actually developed any filters yet, so I'm basically just guessing, but my understanding has always been that IFilter is chunk-based for exactly this reason: it's up to the filter implementation to keep the returned chunks "small enough" that the calling search daemon can simply quit between two chunks if things are taking too long.

Apparently, my assumption is wrong, or you would not be asking this very question.
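To make the chunk idea concrete, here is a minimal Python model of it. It is not real IFilter code (the real interface is COM, with `IFilter::GetChunk` and `IFilter::GetText`), and the names `chunked_filter`, `crawl_with_deadline` and `budget_s` are hypothetical. It assumes the caller checks its time budget only between chunks, which also shows the limit of the approach: if producing a single chunk takes hours, the caller never gets control back in the middle of it.

```python
import time
from typing import Callable, Iterable, Iterator, List


def chunked_filter(pages: Iterable[str]) -> Iterator[str]:
    """Hypothetical filter: yields extracted text one small chunk at a time.

    Each yield stands in loosely for one GetChunk/GetText round trip in a
    real IFilter; the work done per chunk should stay short.
    """
    for page in pages:
        # Placeholder for the (possibly expensive) per-page extraction.
        yield page.strip()


def crawl_with_deadline(chunks: Iterator[str], budget_s: float,
                        clock: Callable[[], float] = time.monotonic) -> List[str]:
    """Hypothetical caller: pulls chunks until its time budget is used up.

    The deadline is only checked between chunks, so a single chunk that
    blocks for hours cannot be interrupted this way.
    """
    start = clock()
    collected: List[str] = []
    for chunk in chunks:
        collected.append(chunk)
        if clock() - start > budget_s:
            break  # give up between two chunks
    return collected
```

Because the generator is lazy, chunks after the deadline are never produced at all; the chunk that crosses the deadline is still kept.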

Paul-Jan
A:

12 hours, wow!

If it takes that long and there are many files, your best option would be to create a pre-processing application that extracts the text ahead of time and makes it available for the IFilter to read.
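A minimal sketch of that split, in Python for brevity (a real IFilter would be native COM code). The cache layout (one `.txt` file per document, named by a SHA-256 hash of its path) and the functions `preprocess`, `filter_text` and `extract` are all hypothetical; `extract` stands in for the slow, hours-long computation. The offline step does the heavy work, and the filter side only reads a finished result, so it stays fast.

```python
import hashlib
import os
from pathlib import Path
from typing import Callable, Optional


def cache_path(cache_dir: Path, doc: Path) -> Path:
    """Hypothetical layout: one .txt file per document, named by a path hash."""
    digest = hashlib.sha256(str(doc.resolve()).encode("utf-8")).hexdigest()
    return cache_dir / f"{digest}.txt"


def preprocess(doc: Path, cache_dir: Path, extract: Callable[[Path], str]) -> Path:
    """Offline step: run the slow extraction and store its result.

    This runs outside the search daemon, so it can take as long as it needs.
    """
    cache_dir.mkdir(parents=True, exist_ok=True)
    target = cache_path(cache_dir, doc)
    tmp = target.with_suffix(".tmp")
    tmp.write_text(extract(doc), encoding="utf-8")
    os.replace(tmp, target)  # swap in the finished file so readers never see a partial one
    return target


def filter_text(doc: Path, cache_dir: Path) -> Optional[str]:
    """What the fast filter would do: return cached text if it is up to date.

    Returns None when there is no cache entry yet, or when the document
    was modified after its text was extracted.
    """
    target = cache_path(cache_dir, doc)
    if not target.exists():
        return None
    if target.stat().st_mtime < doc.stat().st_mtime:
        return None  # stale: the document changed after extraction
    return target.read_text(encoding="utf-8")
```

With this split, the filter itself should always finish quickly, so the daemon's normal timeout can stay in place for every other filter.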

Another option would be to create HTML summaries of the documents and instruct the crawler to index those instead. The summary page could easily link to the document itself if necessary.
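For the summary-page option, a sketch of what one generated page might look like. The function name `summary_page`, the page layout and the idea that `summary` comes from an offline extraction step are assumptions; all that is shown is escaping the text and linking back to the original document.

```python
import html


def summary_page(title: str, summary: str, doc_url: str) -> str:
    """Hypothetical HTML summary for the crawler to index instead of the source file."""
    return (
        "<!DOCTYPE html>\n"
        '<html><head><meta charset="utf-8">'
        f"<title>{html.escape(title)}</title></head>\n"
        f"<body><h1>{html.escape(title)}</h1>\n"
        f"<p>{html.escape(summary)}</p>\n"
        # Link back so search results can lead users to the real document.
        f'<p><a href="{html.escape(doc_url, quote=True)}">Open the original document</a></p>\n'
        "</body></html>\n"
    )
```

Escaping matters here because the summary text is extracted from arbitrary documents and may contain `<` or `&`.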

Nat