I have an application that lets users upload files, mainly PDF and Word documents. These files are stored in a varbinary field in the database. The files need to be available regardless of how the user accesses the application, whether through the web, a Windows Forms client, or any other presentation layer.

Is there a way to search the raw text contents of these fields? For example, if I upload a resume, I would like the user to be able to search for "C#" and have the search look for that text in the contents of the varbinary field.

Also, if there is a better strategy for handling this, I am open to it.

+3  A: 

I would say SQL Server is the wrong tool for the job (search-wise), as it can't natively parse the text stored in a binary document.

I suggest looking into something like Lucene.NET (the .NET port of the Lucene search engine, originally written in Java), which will allow you to easily search through your documents after they've been uploaded.

You should be able to architect a solution that allows you to retain your document storage in SQL Server but use Lucene.NET to index and search the documents that you have stored there.
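To make the split between storage and search concrete, here is a minimal sketch in Python. It is not Lucene.NET: an in-memory SQLite table stands in for the SQL Server table, and a plain dictionary stands in for the search index. The `DocumentStore` class, its method names, and the `extracted_text` argument are all hypothetical, and the sketch assumes the text has already been pulled out of the file by some other step.

```python
import re
import sqlite3
from collections import defaultdict

# Tokens may contain '#' and '+' so that terms like "c#" and "c++" survive.
TOKEN = re.compile(r"[a-z0-9#+]+")


class DocumentStore:
    """Hypothetical store: blobs live in the database, terms live in an index."""

    def __init__(self):
        # In-memory SQLite stands in for SQL Server; BLOB stands in for varbinary.
        self.db = sqlite3.connect(":memory:")
        self.db.execute(
            "CREATE TABLE documents (id INTEGER PRIMARY KEY, name TEXT, content BLOB)"
        )
        # Inverted index: term -> set of document ids (stand-in for Lucene.NET).
        self.index = defaultdict(set)

    def add(self, name, content, extracted_text):
        """Store the original bytes once, then index only the extracted text."""
        cur = self.db.execute(
            "INSERT INTO documents (name, content) VALUES (?, ?)", (name, content)
        )
        doc_id = cur.lastrowid
        for term in TOKEN.findall(extracted_text.lower()):
            self.index[term].add(doc_id)
        return doc_id

    def search(self, query):
        """Return names of documents containing every term in the query."""
        terms = TOKEN.findall(query.lower())
        if not terms:
            return []
        ids = set.intersection(*(self.index.get(t, set()) for t in terms))
        names = []
        for doc_id in sorted(ids):
            row = self.db.execute(
                "SELECT name FROM documents WHERE id = ?", (doc_id,)
            ).fetchone()
            names.append(row[0])
        return names
```

The point of the design is that only one copy of the file exists (in the database); the index holds terms and document ids, not the file itself.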

Justin Niessner
So it sounds like I would need two copies of the file: one stored in the db, and another stored as a file that could be indexed. I wonder how SharePoint does its searching?
mattruma
You wouldn't need to have two copies of the file. Lucene.NET (with a little help) should be able to index the copies in the database. SharePoint does something similar with Windows Search Services.
Justin Niessner
+2  A: 

You need a layer of code that identifies the file type and understands its format well enough to extract the text. To SQL Server, it's just raw data.
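As a rough illustration of such a layer, the Python sketch below sniffs the format from the leading bytes and extracts text from a .docx file using only the standard library. The function names are hypothetical. It assumes a .docx is a ZIP archive whose main body lives in `word/document.xml`; PDF and legacy .doc files would need a dedicated parser, which is deliberately left out here.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Namespace used by WordprocessingML elements such as <w:p> and <w:t>.
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def detect_format(blob):
    """Guess the document type from its leading bytes (magic numbers)."""
    if blob.startswith(b"%PDF-"):
        return "pdf"
    if blob.startswith(b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"):
        return "doc"  # legacy OLE2 container used by old Word files
    if blob.startswith(b"PK\x03\x04"):
        # Any ZIP starts this way, so confirm it actually looks like a .docx.
        try:
            with zipfile.ZipFile(io.BytesIO(blob)) as archive:
                if "word/document.xml" in archive.namelist():
                    return "docx"
        except zipfile.BadZipFile:
            pass
    return "unknown"


def extract_docx_text(blob):
    """Return the paragraph text of a .docx, one paragraph per line."""
    with zipfile.ZipFile(io.BytesIO(blob)) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    paragraphs = []
    for paragraph in root.iter(W_NS + "p"):
        text = "".join(node.text or "" for node in paragraph.iter(W_NS + "t"))
        if text:
            paragraphs.append(text)
    return "\n".join(paragraphs)


def extract_text(blob):
    """Dispatch on the detected format; only .docx is handled in this sketch."""
    kind = detect_format(blob)
    if kind == "docx":
        return extract_docx_text(blob)
    raise NotImplementedError("no extractor for format: " + kind)
```

The extracted text is what you would hand to whatever search index you choose; the varbinary column itself stays untouched.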

gbn