views:

3337

answers:

4

I've got a strange problem with indexing PDF files in SQL Server 2005, and hope someone can help. My database has a table called MediaFile with the following fields - MediaFileId int identity pk, FileContent image, and FileExtension varchar(5). I've got my web application storing file contents in this table with no problems, and am able to use full-text searching on doc, xls, etc with no problems - the only file extension not working is PDF. When performing full-text searches on this table for words which I know exist inside of PDF files saved in the table, these files are not returned in the search results.

The OS is Windows Server 2003 SP2, and I've installed Adobe iFilter 6.0. Following the instructions on this blog entry, I executed the following commands:

exec sp_fulltext_service 'load_os_resources', 1;
exec sp_fulltext_service 'verify_signature', 0;

After this, I restarted the SQL Server, and verified that the iFilter for the PDF extensions is installed correctly by executing the following command:

select document_type, path from sys.fulltext_document_types where document_type = '.pdf'

This returns the following information, which looks correct:

document_type: .pdf
path: C:\Program Files\Adobe\PDF IFilter 6.0\PDFFILT.dll

Then I (re)created the index on the MediaFile table, selecting FileContent as the column to index and the FileExtension as its type. The wizard creates the index and completes successfully. To test, I'm performing a search like this:

SELECT MediaFileId, FileExtension FROM MediaFile WHERE CONTAINS(*, '"house"');

This returns DOC files which contain this term, but not any PDF files, although I know that there are definitely PDF files in the table which contain the word house.

Incidentally, I got this working once for a few minutes, where the search above returned the correct PDF files, but then it just stopped working again for no apparent reason.

Any ideas as to what could be stopping SQL Server 2005 from indexing PDF's, even though Adobe iFilter is installed and appears to be loaded?

A: 

I've just struggled with it for an hour, but finally got it working. I did everything you did, so just try to simplify the query (I replaced * with field name and removed double quotes on term):

SELECT MediaFileId, FileExtension FROM MediaFile WHERE CONTAINS(FileContent, 'house')

Also when you create full text index make sure you specify the language. And the last thing is maybe you can try to change the field type from Image to varbinary(MAX).

+2  A: 

Thanks Ivan. Managed to eventually get this working by starting everything from scratch. It seems like the order in which things are done makes a big difference, and the advice given on the linked blog to to turn off the 'load_os_resources' setting after loading the iFilter probably isn't the best option, as this will cause the iFilter to not be loaded when the SQL Server is restarted.

If I recall correctly, the sequence of steps that eventually worked for me was as follows:

  1. Ensure that the table does not have an index already (and if so, delete it)
  2. Install Adobe iFilter
  3. Execute the command exec sp_fulltext_service 'load_os_resources', 1;
  4. Execute the command exec sp_fulltext_service 'verify_signature', 0;
  5. Restart SQL Server
  6. Verify PDF iFilter is installed
  7. Create full-text index on table
  8. Do full re-index

Although this did the trick, I'm quite sure I performed these steps a few times before it eventually started working properly.

Mun
A: 

Thanks i follow this steps and Works ¡¡¡

A: 

Thansk for this Mun. I've followed your steps and got it working better than it did, although not all PDF files are being returned. Odd!

Ian Devlin
Hmm, I had a dodgy PDF file. All works now!
Ian Devlin