Hi,

I'm looking to add full-text indexing to a Linux desktop application written in C++. I think the easiest way to do this would be to call an existing library or utility. This article reviews various open source utilities available for the GNOME and KDE desktops; MetaTracker, Recoll and Strigi are all written in C++, so each seems like a reasonable candidate. But I cannot find any substantial documentation on how to use them as libraries or through an API. I could instead use something like CLucene or Xapian, which are generic full-text indexing libraries. They seem more straightforward, but if I used them I'd have to implement my own indexing daemon, an unappealing prospect.

Also, Xesam seems to be the latest thing. Does anyone have any evidence that it works?

So, does anyone have experience using any of these applications or libraries? How did you use them, and what documentation was useful?

A: 

Create/Open/Close/Delete pipe to something like grep
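
For what it's worth, a minimal sketch of what that could look like in C++, assuming a POSIX system with `grep` on the PATH. The helper names `shell_quote` and `grep_files` are made up for illustration and are not part of any library. Note that this scans every file on every query, so it is a search rather than an index.

```cpp
#include <cstdio>     // popen, pclose, fgets (POSIX extensions)
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: wrap a string in single quotes so a POSIX shell
// passes it through literally; embedded single quotes become '\''.
std::string shell_quote(const std::string& s) {
    std::string out = "'";
    for (char c : s) {
        if (c == '\'') out += "'\\''";
        else out += c;
    }
    out += "'";
    return out;
}

// Hypothetical helper: return the paths of files under `path` that
// contain `needle`, by reading the output of grep through a pipe.
std::vector<std::string> grep_files(const std::string& needle,
                                    const std::string& path) {
    // -r: recurse, -l: print only names of matching files,
    // -F: treat the pattern as a fixed string, --: end of options.
    const std::string cmd = "grep -rlF -- " + shell_quote(needle) + " " +
                            shell_quote(path) + " 2>/dev/null";

    FILE* pipe = popen(cmd.c_str(), "r");   // open the pipe
    if (!pipe) throw std::runtime_error("popen failed");

    std::vector<std::string> hits;
    std::string line;
    char buf[4096];
    while (fgets(buf, sizeof buf, pipe)) {
        line += buf;
        // A long path may arrive in several chunks; emit it only
        // once the trailing newline has been seen.
        if (!line.empty() && line.back() == '\n') {
            line.pop_back();
            hits.push_back(line);
            line.clear();
        }
    }
    if (!line.empty()) hits.push_back(line);

    pclose(pipe);                           // close the pipe
    return hits;
}
```

Calling `grep_files("some phrase", "/home/me/Documents")` (the directory is just an example) would return one entry per matching file.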

RocketSurgeon
That's a pretty silly suggestion. grep is to indexing like toenail clippers are to a lawnmower.
Carl Smotricz
+2  A: 

I used CLucene, which you mentioned (and also Lucene.NET), and found it to be pretty good.

John Zwinck
What did you use it for?
Joe Soul-bringer
A commercial project involving autocompletion in text entry fields. Not a web application.
John Zwinck
A: 

There's also Strigi, which AFAIK works with Xesam and is the default in KDE.

Milliams
A: 

After looking around further, I found and worked with Recoll. I believe it has the best C++ interface to a full-text search engine, in this case Xapian.

It is important to realize that CLucene and Xapian are both highly complex libraries designed primarily for multi-user server applications. Cutting them down to a level appropriate for a client system is not easy. If I remember correctly, Strigi has a complex, pure C interface that isn't adapted to this kind of use.

CLucene also doesn't seem to be very actively maintained at the moment, whereas Xapian does. But the main point is that Recoll exists: it lets you index particular files without the massive setup that raw Xapian or CLucene requires (creating your own "stemming" set, for example, is not normally something you want to do).

Joe Soul-bringer