Can you recommend a full-text search engine? (Preferably open source)
I have a database of many (though relatively short) HTML documents. I want users to be able to search this database by entering one or more search words in my C++ desktop application. Hence, I’m looking for a fast full-text search solution to integrate with my app. Ideally, it should:
- Skip common words, such as `the`, `of`, `and`, etc.
- Support stemming, i.e. a search for `run` also finds documents containing `runner`, `running` and `ran`.
- Be able to update its index in the background as new documents are added to the database.
- Be able to provide search word suggestions (like Google Suggest)
- Have a well-documented API
To illustrate, assume the database has just two documents:
Document 1:
This is a test of text search.
Document 2:
Testing is fun.
The following words should be in the index: `fun`, `search`, `test`, `testing`, `text`. If the user types `t` in the search box, I want the application to be able to suggest `test`, `testing` and `text`. (Ideally, the application should be able to query the search engine for the 10 most common search words starting with `t`.) A search for `testing` should return both documents.
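To make the suggest requirement concrete, here is a minimal sketch of the behaviour I have in mind. It uses a plain `std::map` and is not based on any search library. The names `TermCounts`, `addDocument` and `suggest` are hypothetical, the stop-word list is assumed to be supplied by the caller, and the sketch does no stemming, so it only illustrates the indexing and prefix lookup.

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical in-memory term index: term -> number of occurrences.
using TermCounts = std::map<std::string, int>;

// Lowercase the text, split it on non-letters, skip stop words,
// and count every remaining word. No stemming is applied here.
void addDocument(TermCounts& index, const std::string& text,
                 const std::set<std::string>& stopWords) {
    std::string word;
    auto flush = [&]() {
        if (!word.empty() && stopWords.count(word) == 0) {
            ++index[word];
        }
        word.clear();
    };
    for (unsigned char c : text) {
        if (std::isalpha(c)) {
            word += static_cast<char>(std::tolower(c));
        } else {
            flush();
        }
    }
    flush();  // handle a word that ends exactly at the end of the text
}

// Return up to `limit` indexed terms that start with `prefix`,
// most frequent first; ties keep alphabetical order.
std::vector<std::string> suggest(const TermCounts& index,
                                 const std::string& prefix,
                                 std::size_t limit) {
    std::vector<std::pair<std::string, int>> hits;
    // The map is sorted, so all terms with this prefix are contiguous.
    for (auto it = index.lower_bound(prefix);
         it != index.end() &&
         it->first.compare(0, prefix.size(), prefix) == 0;
         ++it) {
        hits.push_back(*it);
    }
    std::stable_sort(hits.begin(), hits.end(),
                     [](const std::pair<std::string, int>& a,
                        const std::pair<std::string, int>& b) {
                         return a.second > b.second;
                     });
    if (hits.size() > limit) {
        hits.resize(limit);
    }
    std::vector<std::string> terms;
    for (const auto& hit : hits) {
        terms.push_back(hit.first);
    }
    return terms;
}
```

With the two documents above and an assumed stop list of `this`, `is`, `a` and `of`, calling `suggest(index, "t", 10)` would return `test`, `testing` and `text`. What I am unsure about is whether CLucene or Xapian expose an equivalent term lookup over their own indexes.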
Other points:
- I don't need multi-user support
- I don't need support for complex queries
- The database resides on the user's computer, so the indexing should be performed locally.
Can you suggest a C or C++ based solution? (I’ve briefly reviewed CLucene and Xapian, but I’m not sure if either will address my needs, especially querying the search word indexes for the suggest feature).