I'm using SQL Server 2008 Full-Text Search for a project. I need to be able to search PDF files, and I have some questions about that:

  1. How do I enable PDF searching? I've heard of the Adobe IFilter, but couldn't find a clear guide on how to get started.

  2. Are the PDF files stored in the DB itself, or in the file system? I'm mainly concerned about space on shared hosting services like DiscountASP. Typically, we get only about 100 MB of space for the DB, but a lot more (several GB) for the file system. So, if these PDF files are going to be stored directly in the DB, it may get expensive, right?

  3. I would like to provide snippets of the search results (like Google). How can I achieve this with SQL Server 2008 FTS?

+1  A: 
  1. You need a PDF IFilter. Here's the one from Foxit Software.
  2. I believe you can only use SQL Server Full-Text Search if the PDF files are stored within the database.
  3. I haven't found a way to do this other than opening the file and searching for the context myself for each result.
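
As a rough sketch of that last step (this is not a built-in SQL Server feature): assuming the text of each matching PDF has already been extracted to a plain string by some other tool, a snippet can be cut around the first hit. The function name `make_snippet`, the `radius` parameter, and the `<b>` highlight markers are illustrative assumptions, not part of any library.

```python
import re


def make_snippet(text, query, radius=40, open_tag="<b>", close_tag="</b>"):
    """Return an excerpt of `text` around the first case-insensitive match
    of `query`, with the match wrapped in highlight tags.

    Returns None when `query` is empty or does not occur in `text`.
    """
    if not query:
        return None
    # Collapse runs of whitespace so line breaks from the extracted PDF text
    # do not end up inside the snippet.
    flat = " ".join(text.split())
    match = re.search(re.escape(query), flat, flags=re.IGNORECASE)
    if match is None:
        return None
    # Keep up to `radius` characters of context on each side of the match.
    start = max(0, match.start() - radius)
    end = min(len(flat), match.end() + radius)
    # Mark truncated ends with an ellipsis.
    prefix = "..." if start > 0 else ""
    suffix = "..." if end < len(flat) else ""
    return (
        prefix
        + flat[start:match.start()]
        + open_tag + match.group(0) + close_tag
        + flat[match.end():end]
        + suffix
    )


if __name__ == "__main__":
    sample = "SQL Server full-text search can index PDF documents through an IFilter."
    print(make_snippet(sample, "pdf", radius=10))
```

If the snippet is rendered as HTML, the surrounding text should be escaped before the highlight tags are added.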
David
+2  A: 

Sounds like you want to use Microsoft Indexing Service.

It indexes files on the file system so you can search their contents.

Here is an example of querying Indexing Service using ASP.NET.

kerchingo
Is Indexing Service used together with full-text search?
Prabhu
+1  A: 

Full-text search can only search database content; it will not index content outside the database. It is extensible through a programming API (the IFilter interface), and Adobe has a provider for PDF content, as you already know. SQL Server full-text search can use such providers.

However, there is another feature you may be interested in: the new SQL Server 2008 FILESTREAM storage, which is an attribute of varbinary(max) columns rather than a separate data type. FILESTREAM data is stored in the file system as files but is maintained as part of the database from the point of view of transaction consistency, backup, restore, and so on. Luckily, FILESTREAM and full-text search work together.

Remus Rusanu
So I've already set up my database for full-text search (not PDF, just plain text). Now, to add the FILESTREAM piece, how do I get started? Is there an example available? Thanks...
Prabhu
Follow the steps at http://msdn.microsoft.com/en-us/library/bb933995.aspx
Remus Rusanu