Can you recommend a full-text search engine? (Preferably open source)

I have a database of many (though relatively short) HTML documents. I want users to be able to search this database by entering one or more search words in my C++ desktop application. Hence, I’m looking for a fast full-text search solution to integrate with my app. Ideally, it should:

  • Skip common words, such as the, of, and, etc.
  • Support stemming, i.e. a search for run also finds documents containing runner, running and ran.
  • Be able to update its index in the background as new documents are added to the database.
  • Be able to provide search word suggestions (like Google Suggest)
  • Have a well-documented API

To illustrate, assume the database has just two documents:

Document 1: This is a test of text search.

Document 2: Testing is fun.

The following words should be in the index: fun, search, test, testing, text. If the user types t in the search box, I want the application to be able to suggest test, testing and text (Ideally, the application should be able to query the search engine for the 10 most common search words starting with t). A search for testing should return both documents.
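To make the expected behaviour concrete, here is a minimal in-memory sketch of what I have in mind. It is not taken from CLucene or Xapian. `TinyIndex`, `tokenize` and `naiveStem` are hypothetical names, the stop-word list is just an assumption, and `naiveStem` is a toy stand-in for a real stemmer (it only strips a trailing "ing"). Suggestions are ranked by how often a word occurs in the indexed documents, which is only one possible reading of "most common".

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Toy stand-in for a real stemmer such as Porter: strips a trailing "ing" only.
std::string naiveStem(const std::string& w) {
    if (w.size() > 5 && w.compare(w.size() - 3, 3, "ing") == 0)
        return w.substr(0, w.size() - 3);
    return w;
}

// Lowercases the text and splits it on anything that is not a letter.
std::vector<std::string> tokenize(const std::string& text) {
    std::vector<std::string> out;
    std::string cur;
    for (unsigned char c : text) {
        if (std::isalpha(c)) {
            cur += static_cast<char>(std::tolower(c));
        } else if (!cur.empty()) {
            out.push_back(cur);
            cur.clear();
        }
    }
    if (!cur.empty()) out.push_back(cur);
    return out;
}

class TinyIndex {
public:
    // Indexes one document: stop words are skipped, surface forms are counted
    // for suggestions, and stems are mapped to the documents containing them.
    void add(int docId, const std::string& text) {
        for (const std::string& w : tokenize(text)) {
            if (stopWords_.count(w)) continue;
            ++wordCounts_[w];
            postings_[naiveStem(w)].insert(docId);
        }
    }

    // Returns the ids of documents containing the stem of the (single) search word.
    std::vector<int> search(const std::string& word) const {
        std::vector<std::string> toks = tokenize(word);
        if (toks.empty()) return {};
        auto it = postings_.find(naiveStem(toks.front()));
        if (it == postings_.end()) return {};
        return std::vector<int>(it->second.begin(), it->second.end());
    }

    // Returns up to `limit` indexed words starting with `prefix`,
    // most frequent first, ties in alphabetical order.
    std::vector<std::string> suggest(const std::string& prefix,
                                     std::size_t limit = 10) const {
        std::vector<std::pair<std::string, int>> hits;
        for (auto it = wordCounts_.lower_bound(prefix);
             it != wordCounts_.end() &&
             it->first.compare(0, prefix.size(), prefix) == 0;
             ++it) {
            hits.emplace_back(it->first, it->second);
        }
        std::stable_sort(hits.begin(), hits.end(),
                         [](const std::pair<std::string, int>& a,
                            const std::pair<std::string, int>& b) {
                             return a.second > b.second;
                         });
        if (hits.size() > limit) hits.resize(limit);
        std::vector<std::string> out;
        for (const auto& h : hits) out.push_back(h.first);
        return out;
    }

private:
    std::set<std::string> stopWords_{"a", "and", "is", "of", "the", "this"};
    std::map<std::string, int> wordCounts_;          // surface word -> occurrences
    std::map<std::string, std::set<int>> postings_;  // stem -> document ids
};
```

After adding the two example documents with ids 1 and 2, `suggest("t")` returns test, testing and text, and `search("testing")` returns both ids. A real engine would of course persist the index and update it incrementally, which this sketch does not attempt.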

Other points:

  • I don't need multi-user support
  • I don't need support for complex queries
  • The database resides on the user's computer, so the indexing should be performed locally.

Can you suggest a C or C++ based solution? (I’ve briefly reviewed CLucene and Xapian, but I’m not sure if either will address my needs, especially querying the search word indexes for the suggest feature).

+2  A: 

I have used the dtSearch module with great success.

They provide a DLL that you can use from your application to index just about anything, and it does more than what you are asking for.

Note: It is not free.

I did not see in the question that you asked for a free one, so I suggested my favorite. dtSearch inspired me to write an indexer for my own language, Ellinika (Greek), for my sites, because I could not find what I was looking for in my language.

There are also modules that do only stemming, if all you need is to find suggestions for your words. I took my reference from here: http://tartarus.org/~martin/PorterStemmer/

For example, if you have a database like MS SQL that already does some basic indexing, and someone searches for a word and nothing is found, you can stem the word yourself and search again...
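A rough sketch of that retry idea, assuming you already have some way to look up a word and some stemmer to call. `searchWithStemFallback`, `Lookup` and `Stemmer` are hypothetical names for illustration only; the lookup would wrap your database query and the stemmer could be a Porter implementation.

```cpp
#include <functional>
#include <string>
#include <vector>

// Hypothetical hooks: the lookup wraps whatever query the database offers,
// the stemmer is any function that reduces a word to its stem.
using Lookup = std::function<std::vector<int>(const std::string&)>;
using Stemmer = std::function<std::string(const std::string&)>;

// Tries the word exactly as typed; if nothing is found, stems it and retries once.
std::vector<int> searchWithStemFallback(const std::string& word,
                                        const Lookup& lookup,
                                        const Stemmer& stem) {
    std::vector<int> hits = lookup(word);
    if (!hits.empty()) return hits;

    const std::string stemmed = stem(word);
    if (stemmed == word) return hits;  // stemming changed nothing, no point retrying
    return lookup(stemmed);
}
```

This only helps when the stored words are themselves stems (or happen to match the stem), so it is a fallback and not a replacement for stemming at index time.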

Aristos
Note: it's not free.
Computer Guru
A: 

You can use CLucene for C/C++ and Sphider for PHP. Both are free. They take some time to set up and use, but they are not difficult to understand.

gurpreet