I'm trying to create an online search for a particular set of literature, quotes, etc. from a spiritual organization. The number of items that can be searched (mostly PDF, HTML, or plain text) is limited, but I want to provide comprehensive search filters (Kayak.com style).

That means my data will need to be organized so that it's easy to filter by author name, source type (book, speech, quote, etc.), when, where, and other criteria. In other words, each literature item will need this "additional information" tied to it.

My question is, how do I go about building this search engine? I have heard of Lucene, and also recently discovered Searcharoo, a .NET library for searching, which will index all my PDF files located in a local directory.

What I'm wondering is if I should use Searcharoo, or if I should simply create my own database which stores the filepath, and query a column that contains the text of the PDF file. Or can I use Searcharoo, or something similar, and still be able to tag each indexed file with additional information stored in the DB? Or should I take a completely different approach?

I'd appreciate any input on this...

Thanks!

A: 

I've heard CouchDB was designed for this but honestly I've never used it before.

oykuo
A: 

I've used Lucene.NET for making full-text indexes that contain additional metadata. It's stable, quick, and reasonably well documented, if you don't mind using a .NET port of a Java library.
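To illustrate the general idea of a full-text index that carries metadata with each document, here is a minimal sketch in Python. It is not Lucene.NET's API; `TinyIndex`, its methods, the file paths, and the field names (`author`, `source_type`, `year`) are all made up for illustration, and it assumes the text has already been extracted from the PDF/HTML files.

```python
import re
from collections import defaultdict


class TinyIndex:
    """Toy inverted index: term -> set of document ids, plus per-document metadata."""

    def __init__(self):
        self.postings = defaultdict(set)  # term -> ids of documents containing it
        self.metadata = {}                # document id -> dict of filterable fields

    def add(self, doc_id, text, **fields):
        # Store the metadata next to the document and index every word of its text.
        self.metadata[doc_id] = fields
        for term in re.findall(r"\w+", text.lower()):
            self.postings[term].add(doc_id)

    def search(self, query, **filters):
        terms = re.findall(r"\w+", query.lower())
        if not terms:
            return []
        # A document must contain every query term (AND semantics).
        hits = set.intersection(*(self.postings.get(t, set()) for t in terms))
        # Keep only documents whose metadata matches every filter exactly.
        return sorted(
            d for d in hits
            if all(self.metadata[d].get(k) == v for k, v in filters.items())
        )


# Hypothetical sample data; paths and field values are placeholders.
index = TinyIndex()
index.add("talks/peace.pdf", "Inner peace begins with stillness",
          author="A. Teacher", source_type="speech", year=1975)
index.add("books/stillness.html", "Stillness and service in daily life",
          author="A. Teacher", source_type="book", year=1982)
index.add("quotes/service.txt", "Service is love made visible",
          author="B. Student", source_type="quote", year=1990)

print(index.search("stillness"))                      # text match only
print(index.search("stillness", source_type="book"))  # text match plus one filter
```

A real search library adds ranking, stemming, and on-disk storage, but the shape is the same: each indexed document has a searchable text field plus stored fields you can filter on.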

dthrasher